The idea that taught machines to read: meet the Transformer

orig. “Attention Is All You Need” · Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

Large Language Models · Beginner · 4 min read · AI-assisted, reviewed by Marginalia Editorial

In the margin

Area

Large Language Models: models trained on huge amounts of text that can read, write, summarise, and reason in natural language.

We keep the jargon out here so the middle stays easy to read.

This could make your translation apps much better.

What's going on

Imagine you're trying to translate a sentence from English to French. You might look up each word in a dictionary. But that's not enough. You need to understand how the words fit together. That's what this paper is about.

The researchers wanted to make computers better at understanding sentences. They built a new kind of neural network (a computer program that learns from examples). This neural network is called a transformer (a kind of AI model that learns by weighing which words matter most to each other). The transformer looks at whole sentences at once. It figures out which words are most important to each other. This is called attention (a way for the AI to focus on the most important parts of the input).
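
To make "figuring out which words matter to each other" concrete, here is a minimal sketch of the scoring step the paper calls scaled dot-product attention. It is an illustration, not the authors' code: the three words and their two-number vectors are made up, and a real transformer learns separate query, key, and value vectors instead of reusing one set as this sketch does.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this word against every word, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Blend the value vectors using those weights.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-number vectors for three words (assumption for illustration).
words = ["the", "cat", "sat"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

Each printed row shows how much one word "looks at" every word in the sentence, including itself. The weights in a row always add up to 1.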

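Many earlier AI models read words one at a time and carry forward a memory of what they have already seen. Because each step has to wait for the one before it, training is slow. The transformer is different. It looks at all the words at once, so much of the work can run in parallel. This makes it much faster to train.

The researchers tested the transformer on two translation tasks. One was translating English to German. The other was translating English to French. Results are measured in BLEU (a standard score for how closely a machine translation matches human translations; higher is better). The transformer did better than other models. It got a BLEU score of 28.4 on the German task, more than 2 points above the previous best. On the French task, it got a BLEU score of 41.8. This was the best single-model score at the time.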
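
A rough way to picture the speed difference is to count how many steps have to happen one after another. The toy sketch below is only an illustration under that assumption: the function names are hypothetical, and neither function is a real model.

```python
def sequential_steps_one_at_a_time(words):
    """A one-word-at-a-time reader: each step needs the previous memory."""
    memory = []
    steps = 0
    for word in words:
        memory = memory + [word]  # depends on the memory from the step before
        steps += 1                # so these steps cannot overlap
    return steps

def word_pairs_all_at_once(words):
    """An all-at-once reader: every pair of words is compared independently."""
    return [(a, b) for a in words for b in words]

sentence = "the cat sat on the mat".split()
print(sequential_steps_one_at_a_time(sentence))  # one waiting step per word
print(len(word_pairs_all_at_once(sentence)))     # one comparison per word pair
```

The all-at-once reader does more comparisons in total, but none of them has to wait for another, so hardware can run them side by side.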

The transformer also works on other tasks. The researchers tried it on a task called English constituency parsing. This is like breaking down a sentence into its parts, such as the noun phrase and the verb phrase. The transformer did well on this task too, whether it had a lot of training data or only a little.
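
To show what "breaking down a sentence into its parts" can look like, here is a small sketch of a parse tree. The sentence, the tree, and the helper functions are made up for illustration. The labels follow common conventions (S = sentence, NP = noun phrase, VP = verb phrase).

```python
# A hypothetical parse of "The cat sat" as nested tuples: (label, children...).
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "cat")),
        ("VP", ("VBD", "sat")))

def is_word_node(children):
    """A node holding a single word has exactly one string child."""
    return len(children) == 1 and isinstance(children[0], str)

def leaves(node):
    """Collect the words at the bottom of the tree, left to right."""
    label, *children = node
    if is_word_node(children):
        return [children[0]]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def bracketed(node):
    """Write the tree as a bracketed string, a common way to show parses."""
    label, *children = node
    if is_word_node(children):
        return f"({label} {children[0]})"
    return f"({label} " + " ".join(bracketed(c) for c in children) + ")"

print(leaves(tree))
print(bracketed(tree))
```

The parsing task is to produce a structure like this from the plain sentence alone.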

Why it matters

Your phone's translation app could get much better. It could translate sentences faster and more accurately.
Doctors could use it to translate medical notes quickly (medical notes are full of specialized terms).
Students could use it to translate school books into their own language.
Travelers could use it to talk to people in other countries. It could make trips smoother and more enjoyable.
News websites could use it to translate articles into many languages quickly. This could help spread information faster.
It could make computers understand us better. This could help with voice assistants and chatbots.

Source

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, Google Brain / Google Research


We write original plain-language summaries and link to the source. We never republish the paper.


Member notes

Marginalia Editorial (team)

We read the full paper and rewrote it in plain language. Leave your own note below.

Priya D. (member)

The "it" example is what finally made attention click for me. The model just learns what each word is pointing at.

Marcus L. (member)

If you have ever used Google Translate, you have used this paper. Still wild that it is only from 2017.

Sam B. (member)

Quick heads up: "attention" here is not the human kind. It is the model scoring how much each word should look at the others.
