Marginalia daily

The model that taught computers to read both directions at once

orig. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” · Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Large Language Models · Intermediate · 4 min read · Written and reviewed by Marginalia Editorial
In the margin
Area: Large Language Models. Models trained on huge amounts of text that can read, write, summarise, and reason in natural language.

For a few years, this one model quietly powered a huge chunk of Google Search and a wave of language tools.

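Many earlier language models read text in one direction, usually left to right, so each word was interpreted using only the words before it. BERT reads the whole sentence at once, looking both ways, which helps it understand how words depend on each other. It learns mainly by playing fill-in-the-blank on huge amounts of text: hide roughly 15% of the words, then guess each one from the words on both sides. A second exercise asks it to judge whether one sentence really follows another. After that general training, it can be quickly adapted to specific jobs like answering questions or judging sentiment.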
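To make the fill-in-the-blank idea concrete, here is a minimal Python sketch of how one training example could be prepared. It follows the proportions described in the paper (about 15% of positions are chosen; of those, roughly 80% become a [MASK] placeholder, 10% become a random word, and 10% are left unchanged). The function names mask_tokens and visible_context, the whole-word tokens, the "always pick at least one position" rule, and the fixed seed are assumptions made for illustration, not the authors' code.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) for one fill-in-the-blank example.

    labels[i] holds the original token at each chosen position and None
    everywhere else, so a model is only scored on the chosen positions.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Assumption: pick a fixed count of positions, and never fewer than one.
    n_pick = max(1, round(len(tokens) * mask_prob))
    picked = set(rng.sample(range(len(tokens)), n_pick))

    corrupted, labels = [], []
    for i, tok in enumerate(tokens):
        if i not in picked:
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)
        roll = rng.random()
        if roll < 0.8:        # about 80%: hide the word
            corrupted.append(MASK)
        elif roll < 0.9:      # about 10%: swap in a random word
            corrupted.append(rng.choice(vocab))
        else:                 # about 10%: leave the word as it was
            corrupted.append(tok)
    return corrupted, labels


def visible_context(tokens, position, bidirectional=True):
    """Tokens a model may look at when guessing the token at `position`."""
    left = tokens[:position]
    right = tokens[position + 1:] if bidirectional else []
    return left + right


if __name__ == "__main__":
    sentence = "the bank raised interest rates again this year".split()
    corrupted, labels = mask_tokens(sentence, vocab=sorted(set(sentence)))
    print(corrupted)
    print(labels)
    # Guessing "bank" with left-to-right context only, then with both sides.
    print(visible_context(sentence, 1, bidirectional=False))
    print(visible_context(sentence, 1))
```

The last two calls show the difference the summary describes: a left-to-right reader guessing "bank" sees only "the", while a reader looking both ways also sees "raised interest rates", which is what points to the financial meaning.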

BERT made the now-standard recipe popular: pre-train one big model on lots of unlabelled text, then fine-tune it for each task. That recipe is behind most modern language tools. It also reached real products quickly: Google announced in October 2019, about a year after the paper appeared, that it was using BERT in Search.

Source

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Google AI Language

