Let’s focus on the well-known MMLU (Measuring Massive Multitask Language Understanding) challenge, which was established as a test of the general knowledge and reasoning powers of large language models. The complete MMLU benchmark contains tens of thousands of challenge problems of different forms across 57 areas from basic mathematics to United States history, law, computer science,
![Steering at the Frontier: Extending the Power of Prompting - Microsoft Research](https://cdn-ak-scissors.b.st-hatena.com/image/square/af6e40c3299cea67cf6fea43becefc0065f14851/height=288;version=1;width=512/https%3A%2F%2Fwww.microsoft.com%2Fen-us%2Fresearch%2Fuploads%2Fprod%2F2023%2F12%2FSteeering-TWLIFB-1200x627-1.jpg)