How to train very deep networks without them falling apart

Original paper: “Deep Residual Learning for Image Recognition” by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Deep Learning · Beginner · 4 min read · Written and reviewed by Marginalia Editorial

Area: Deep learning. A family of methods that stack many simple layers so a model can learn rich patterns; it is the engine behind most modern AI.

Stacking more layers used to make image models worse, not better. This paper found a simple trick that fixed it.

What's going on

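Researchers wanted deeper networks because extra layers usually let a model pick up more complex patterns. Past a certain depth, though, adding layers made accuracy drop even on the training data. That was puzzling: overfitting would hurt accuracy on new images, not on the training set, and a deeper network could in principle match a shallower one by having its extra layers simply pass their input along unchanged. In practice, training failed to find that solution. The fix was the residual connection, a shortcut that carries the input of a small block of layers straight past it, so the layers only have to learn the change to add on top. If the extra layers are not needed, the block can push that change toward zero and leave its input as it was. With these shortcuts the team trained networks up to 152 layers deep on ImageNet, and their entry won first place in the 2015 ImageNet (ILSVRC) image classification contest with a 3.57% top-5 error rate, meaning the correct label was missing from the model's top five guesses only 3.57% of the time.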
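To make the shortcut idea concrete, here is a minimal pure-Python sketch. It is a simplification, not the paper's code: the paper's blocks use convolutions and batch normalization, while this sketch uses small matrix multiplications on a 4-number vector and leaves out biases. The function names (plain_block, residual_block, run) are made up for illustration, and every weight is set to zero to stand in for "these extra layers have nothing useful to add". Fifty two-layer blocks make a 100-layer stack.

```python
def relu(v):
    # Element-wise ReLU: negative entries become 0.0.
    return [max(0.0, a) for a in v]

def matvec(w, v):
    # Multiply a square weight matrix (a list of rows) by the vector v.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def plain_block(w1, w2, x):
    # Hypothetical plain block: two stacked layers, no shortcut.
    # Output = relu(W2 · relu(W1 · x)); nothing carries x forward by itself.
    return relu(matvec(w2, relu(matvec(w1, x))))

def residual_block(w1, w2, x):
    # Hypothetical residual block: the same two layers compute only the
    # change F(x) = W2 · relu(W1 · x), and the shortcut adds x back
    # before the final ReLU.
    change = matvec(w2, relu(matvec(w1, x)))
    return relu([f + xi for f, xi in zip(change, x)])

def run(block, depth, x):
    # Stack `depth` copies of a block whose weights are all zero
    # (an assumption standing in for "layers with nothing to add").
    dim = len(x)
    zero = [[0.0] * dim for _ in range(dim)]
    for _ in range(depth):
        x = block(zero, zero, x)
    return x

x = [1.0, 2.0, 0.5, 3.0]  # non-negative, as if it came out of an earlier ReLU
plain_out = run(plain_block, 50, x)
residual_out = run(residual_block, 50, x)
print(plain_out)     # [0.0, 0.0, 0.0, 0.0]
print(residual_out)  # [1.0, 2.0, 0.5, 3.0]
```

With all-zero weights, the plain stack wipes the input out, while the residual stack hands it through untouched. That is the sense in which a residual block finds "do nothing" easy: its layers only need their change to be zero.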

Why it matters

Residual connections now appear in almost every large model, including the Transformer models behind modern language and image tools. The idea is small and easy to add, which is part of why it spread so fast. It is a clean example of how one fix can unlock a whole direction of research.

Source

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Microsoft Research

We write original plain-language summaries and link to the source. We never republish the paper.

Note from Marginalia Editorial: we read the full paper and rewrote it in plain language.
