We currently don't understand how to make sense of the neural activity within language models. Today, we are sharing improved methods for finding a large number of "features": patterns of activity that we hope are human interpretable. Our methods scale better than existing work, and we use them to find 16 million features in GPT-4. We are sharing a paper, code, …
![Extracting Concepts from GPT-4](https://cdn-ak-scissors.b.st-hatena.com/image/square/d937db12044ffeb5e1e154f5f18c0ceff0797889/height=288;version=1;width=512/https%3A%2F%2Fimages.ctfassets.net%2Fkftzwdyauwt9%2F53G9eNsYjVuqZ885GWG8lB%2Fcba66af1ab4fa283d8686e66f20a9a1d%2Fsparse-autoencoders-cover.png%3Fw%3D1600%26h%3D900%26fit%3Dfill)