We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
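As a minimal sketch of that fine-tuning recipe, the snippet below adds a single task-specific output layer on top of a pre-trained BERT encoder. It uses the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, which are assumptions for illustration and not part of the paper itself; the two-class setup is likewise arbitrary.

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer


class BertClassifier(nn.Module):
    """Pre-trained BERT encoder plus one additional output layer."""

    def __init__(self, num_labels: int = 2):  # num_labels is an illustrative choice
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # The single task-specific layer added for fine-tuning.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # pooler_output is the [CLS] token representation after a dense+tanh layer.
        return self.classifier(outputs.pooler_output)


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertClassifier()
batch = tokenizer(["a sample sentence"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, num_labels)
```

During fine-tuning, all encoder parameters are updated jointly with the new output layer, so the whole model adapts to the downstream task rather than just the head.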