In AI research, everyone seems to think that bigger is better. The idea is that more data, more computing power, and more parameters will yield more capable models. This thinking started with a landmark 2017 paper in which Google researchers introduced the transformer architecture underpinning today's language-model boom and helped embed the "scale is all you need" mindset into the field.