Teaching a small model to copy a big one
orig. “Distilling the Knowledge in a Neural Network” · Geoffrey Hinton, Oriol Vinyals, Jeff Dean
Big models are accurate but slow. This paper shows how to pour most of that skill into a small, fast model.
The trick is to train a small student model to copy the outputs of a large teacher model: not just the right answers, but how confident the teacher is across all the options. Those soft signals carry extra hints, such as which wrong answers the teacher considers nearly right, so the student learns more than it could from the labels alone. The result is a smaller model that runs faster and cheaper while keeping much of the accuracy. This is called distillation.
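To make the idea concrete, here is a minimal sketch in plain Python of the kind of loss a student might be trained on. It is an illustration under stated assumptions, not the authors' code: the function names, the example logits, and the `temperature` and `alpha` values are all made up for this example. The "temperature" is the paper's knob for softening the teacher's confidence so that its second and third choices become visible to the student.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """How far the predicted distribution is from the target distribution."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend of 'copy the teacher' and 'get the label right'.

    temperature and alpha are illustrative values, not ones from the paper.
    """
    # Soft part: match the teacher's softened probabilities.
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    soft_loss = cross_entropy(soft_targets, soft_preds)

    # Hard part: ordinary loss against the true label at temperature 1.
    hard_preds = softmax(student_logits, 1.0)
    hard_loss = -math.log(hard_preds[true_label])

    # The soft term is scaled by temperature**2 so its gradients stay
    # comparable in size to the hard term when the temperature changes.
    return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss

# Hypothetical scores for three classes; class 0 is the correct one.
teacher = [5.0, 2.0, 0.1]
matching_student = [5.0, 2.0, 0.1]
confused_student = [0.1, 2.0, 5.0]

print(distillation_loss(matching_student, teacher, true_label=0))
print(distillation_loss(confused_student, teacher, true_label=0))
```

A student whose scores match the teacher's gets a lower loss than one that ranks the classes the other way round. In a real setup the logits would come from neural networks and the loss would be minimized with a training library; this sketch only shows the arithmetic.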
Distillation is a big reason capable AI can run on phones and in browsers. As models get larger, shrinking them down without losing much skill becomes more valuable. It is one of the core tools for making AI practical and affordable.
Geoffrey Hinton, Oriol Vinyals, Jeff Dean, Google