How to train very deep networks without them falling apart
orig. “Deep Residual Learning for Image Recognition” · Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Stacking more layers used to make image models worse, not better. This paper found a simple trick that fixed it.
Researchers wanted deeper networks because extra layers let a model build up more complex features from simpler ones. Past a point, though, adding layers made accuracy drop even on the training data. That ruled out overfitting as the cause: the deeper networks were just harder to optimize. The fix was the residual connection, a shortcut that adds a block's input straight to its output, so the layers in between only have to learn the small change on top. With these shortcuts the team trained networks up to 152 layers deep and won the 2015 ImageNet classification contest with a 3.57% top-5 error.
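To make the shortcut concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the paper's implementation: the real blocks use convolutions and batch normalization and apply another ReLU after the addition, while this toy version uses two tiny linear layers. The names `residual_block`, `plain_block`, `w1`, and `w2` are made up for this example.

```python
def relu(v):
    # Zero out negative values, keep positive ones.
    return [max(0.0, a) for a in v]

def linear(weights, v):
    # weights is a list of rows; computes the matrix-vector product.
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def plain_block(x, w1, w2):
    # Without a shortcut, the layers must produce the whole output themselves.
    return linear(w2, relu(linear(w1, x)))

def residual_block(x, w1, w2):
    # The layers only compute the change to apply to x...
    change = linear(w2, relu(linear(w1, x)))
    # ...and the shortcut adds the untouched input back in.
    return [a + c for a, c in zip(x, change)]

# Hypothetical weights: all zeros, i.e. layers that have learned nothing.
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.5, -2.0]

plain_out = plain_block(x, zeros, zeros)        # the input is lost
identity_out = residual_block(x, zeros, zeros)  # the input passes through
```

With all-zero weights the plain block wipes out its input, while the residual block hands it through unchanged. That is the intuition behind the fix: an extra residual block can fall back to doing nothing, so in principle adding one should not make the network worse.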
Residual connections are now in almost every large model, including the ones behind modern language and image tools. The idea is small and easy to add, which is part of why it spread so fast. It is a clean example of how one fix can unlock a whole direction of research.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Microsoft Research