
Teaching a small model to copy a big one

orig. “Distilling the Knowledge in a Neural Network” · Geoffrey Hinton, Oriol Vinyals, Jeff Dean

Efficient AI · Intermediate · 3 min read · Written and reviewed by Marginalia Editorial
Area: Efficient AI. Making models smaller, faster, and cheaper to run so AI can work on phones and modest hardware.

Big models are accurate but slow. This paper shows how to pour most of that skill into a small, fast model.

The trick is to train a small student model to copy the outputs of a large teacher model: not just the right answers, but how confident the teacher is across all the options. Those soft signals carry extra hints, such as which wrong answers the teacher considers nearly plausible, and they help the student learn more than it could from the labels alone. The result is a smaller model that runs faster and cheaper while keeping much of the accuracy. This is called distillation.
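To make the idea concrete, here is a minimal sketch of a distillation loss in plain Python. It is an illustration under stated assumptions, not the authors' code: the function names, the example scores, and the default values for `temperature` and `alpha` are all hypothetical. It follows the paper's general recipe of softening both models' outputs with a temperature, scoring the student against the teacher's soft targets, and mixing that with the ordinary loss on the true label.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer spread."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy of the predictions measured against a target distribution."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target loss (copy the teacher) with a hard-label loss.

    temperature and alpha are illustrative defaults, not values from the paper.
    """
    # Soft part: compare student and teacher at the same raised temperature.
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    soft_loss = cross_entropy(soft_targets, soft_preds)

    # Hard part: ordinary cross-entropy against the correct label at temperature 1.
    hard_preds = softmax(student_logits)
    hard_loss = -math.log(hard_preds[true_label])

    # Scaling by temperature**2 keeps the soft term's gradients comparable in size.
    return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss

if __name__ == "__main__":
    teacher = [6.0, 2.0, 1.0]   # made-up scores for three classes
    student = [3.0, 2.5, 0.5]
    print("teacher at T=1:", softmax(teacher))
    print("teacher at T=4:", softmax(teacher, temperature=4.0))
    print("loss:", distillation_loss(student, teacher, true_label=0))
```

Raising the temperature flattens the teacher's probabilities, so the small scores it gives to wrong answers become visible to the student. In a real training loop this loss would be computed per batch with an automatic-differentiation library and minimized over the student's weights.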

Distillation is a big reason capable AI can run on phones and in browsers. As models get larger, shrinking them down without losing much skill becomes more valuable. It is one of the core tools for making AI practical and affordable.

Source

Geoffrey Hinton, Oriol Vinyals, Jeff Dean, Google

