• Parent [none/use name]@hexbear.net
    6 months ago

    Well, addition is built into the instruction set of any CPU, so it only takes one operation. On the other hand, one evaluation of a neural net involves several repeated matrix-vector multiplies, each followed by the application of a nonlinear “activation function”. For a 20-dimensional input, a matrix-vector multiply with a square matrix takes 20×20 = 400 multiply operations and 20×19 = 380 addition operations. So we’ll say maybe on the order of 1,000-10,000 times more operations, depending on how many layers?
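A rough operation count for a stack of dense layers can be sketched as below. This is a minimal sketch assuming a plain fully connected network with no bias terms and one operation per output for the activation function; the helper names `matvec_ops` and `mlp_ops` are hypothetical.

```python
def matvec_ops(rows: int, cols: int) -> tuple[int, int]:
    """Multiplies and adds for a (rows x cols) matrix times a length-cols vector."""
    multiplies = rows * cols      # one multiply per matrix entry
    adds = rows * (cols - 1)      # summing cols products takes cols - 1 adds per row
    return multiplies, adds


def mlp_ops(widths: list[int]) -> int:
    """Total arithmetic ops for a bias-free MLP with the given layer widths.

    Assumption: the activation costs one op per output unit.
    """
    total = 0
    for n_in, n_out in zip(widths, widths[1:]):
        mul, add = matvec_ops(n_out, n_in)
        total += mul + add + n_out  # matvec plus one activation per output
    return total


print(matvec_ops(20, 20))         # (400, 380)
print(mlp_ops([20, 20, 20, 20]))  # 2400 for three 20x20 layers
```

With three 20-wide layers this already comes to 2,400 operations against a single hardware add, which sits inside the 1,000-10,000x range guessed in the comment.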

    • hexaflexagonbear [he/him]@hexbear.netOP
      6 months ago

      This is up to 200-digit numbers, so you’d actually need a custom integer representation and software addition, but even then a naive algorithm would still be like… 200 operations. Could probably drastically reduce that as well.
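The naive digit-by-digit approach might look like this minimal sketch. It stores numbers as decimal strings (an assumption for readability; real bignum libraries work on machine words) and counts one step per digit column. `add_decimal` is a hypothetical name.

```python
def add_decimal(a: str, b: str) -> tuple[str, int]:
    """Schoolbook addition of two non-negative decimal strings.

    Returns the sum as a string and the number of digit columns processed.
    """
    result = []
    carry = 0
    steps = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        s = da + db + carry
        result.append(str(s % 10))  # digit kept in this column
        carry = s // 10             # carry into the next column
        steps += 1
        i -= 1
        j -= 1
    if carry:
        result.append(str(carry))
    return "".join(reversed(result)), steps


x = "9" * 200
y = "1"
total, steps = add_decimal(x, y)
print(len(total), steps)  # 201 200
```

For two 200-digit inputs the loop runs exactly 200 times, matching the "like… 200 operations" estimate.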

      • invalidusernamelol [he/him]@hexbear.net
        6 months ago

        200-digit numbers are about 665 bits, so each one only requires like 11 64-bit registers. x86-64 has 16 general-purpose registers, so adding two 200-digit numbers should hypothetically only require about 22 loads and 11 add-with-carry instructions. So a well-written bit of code could do it in under 100 ops (probably under 50). Assuming this LLM implementation is running on a big server, it’s probably doing the same calculation, less accurately, with a vastly larger number of operations.
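A rough sketch of the register-level idea: split each number into 64-bit limbs and add them limb by limb while propagating a carry, which is what a chain of x86-64 `adc` instructions does in hardware. This is Python emulating the carry logic, not real assembly, and `to_limbs`, `add_limbs`, and `from_limbs` are hypothetical names.

```python
MASK = (1 << 64) - 1  # value range of one 64-bit register


def to_limbs(n: int) -> list[int]:
    """Split a non-negative int into 64-bit limbs, least significant first."""
    limbs = []
    while True:
        limbs.append(n & MASK)
        n >>= 64
        if n == 0:
            return limbs


def add_limbs(a: list[int], b: list[int]) -> list[int]:
    """Add limb by limb, carrying one bit between limbs like `adc`."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    out = []
    carry = 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)  # low 64 bits stay in the register
        carry = s >> 64       # carry flag: 0 or 1
    if carry:
        out.append(carry)
    return out


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an int from little-endian 64-bit limbs."""
    return sum(limb << (64 * k) for k, limb in enumerate(limbs))


big = 10**200 - 1          # largest 200-digit number
print(big.bit_length())    # 665
print(len(to_limbs(big)))  # 11
```

Since a 200-digit number fits in 11 limbs, the whole addition is 11 add-with-carry steps, which is where the "under 50 ops" figure comes from once loads and stores are included.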