By Evan Miller July 24, 2023 About which one cannot speak, one must pass over in silence. –Wittgenstein Do you see the off-by-one error in this formula? \[ \textrm{Attention}(Q, K, V) = \textrm{softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V \] The attention formula is the central equation of modern AI, but there’s a bug in it that has been driving me nuts the last week. I tried writing a serious-look