By Evan Miller

July 24, 2023

About which one cannot speak, one must pass over in silence.
–Wittgenstein

Do you see the off-by-one error in this formula?

\[ \textrm{Attention}(Q, K, V) = \textrm{softmax}\left(\frac{QK^T}{\sqrt{d}}\right)V \]

The attention formula is the central equation of modern AI, but there’s a bug in it that has been driving me nuts the last week. I tried writing a serious-look
![Attention Is Off By One](https://cdn-ak-scissors.b.st-hatena.com/image/square/6d3c2bbf03457825f4962fef9350d8014c44d4ec/height=288;version=1;width=512/https%3A%2F%2Fwww.evanmiller.org%2Fimages%2Fpreviews%2Fattention-is-off-by-one.png)
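For readers who want to poke at the formula directly, here is a minimal sketch of the attention equation as written above, in plain Python. The function names `softmax` and `attention`, the list-of-rows layout, and the toy inputs are illustrative assumptions, not anything from the original post.

```python
import math


def softmax(scores):
    """Standard softmax over a list of floats (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q is n_q x d, K is n_k x d, V is n_k x d_v, all as nested lists of floats.
    """
    d = len(K[0])
    out = []
    for q in Q:
        # One row of Q K^T / sqrt(d): this query scored against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted average of the rows of V.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


# Hypothetical toy inputs: one query, two keys, two value rows.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
```

With an all-zero query every score is zero, so the weights are uniform and the output is just the plain average of the rows of `V`. In every case the weights in a row sum to 1, because that is what the standard softmax produces.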