BERT, but in Italy — image by authorMany of my articles have been focused on BERT — the model that came and dominated the world of natural language processing (NLP) and marked a new age for language models. For those of you that may not have used transformers models (eg what BERT is) before, the process looks a little like this:
![How to Train a BERT Model From Scratch](https://cdn-ak-scissors.b.st-hatena.com/image/square/b70f470ed96b6c7f4151b2b754704c34418a4f9a/height=288;version=1;width=512/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1200%2F1%2A-txg2YRJbhQDd8Vrh-dH2g.png)