The model that taught computers to read both directions at once
orig. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” · Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
For a few years, this one model quietly powered a huge chunk of Google Search and a wave of language tools.
Earlier models read text left to right. BERT reads the whole sentence at once, looking both ways, which helps it understand how words depend on each other. It learns by playing fill-in-the-blank on huge amounts of text: hide a word, guess it from the context. After that general training, it can be quickly adapted to specific jobs like answering questions or judging sentiment.
BERT made the now-standard recipe popular: train one big model on lots of text, then fine-tune it for each task. That recipe is behind most modern language tools. It also went straight into real products, including search, soon after release.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language
We write original plain-language summaries and link to the source. We never republish the paper.
Paste it and we'll explain it even more simply.
Pass all three to earn the “read & understood” stamp (+10 pts).
Pass quizzes and leave notes to climb your chapter's board. No chapters are running yet, so this one is wide open.
Start a chapter to compete →