Marginalia daily

How to train very deep networks without them falling apart

orig. “Deep Residual Learning for Image Recognition” · Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Deep Learning · Beginner · 4 min read · Written and reviewed by Marginalia Editorial
In the margin
Area: Deep Learning, a family of methods that stack many simple layers so a model can learn rich patterns. It is the engine behind most modern AI.

Stacking more layers used to make image models worse, not better. This paper found a simple trick that fixed it.

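Researchers wanted deeper networks because depth usually helps a model pick up richer patterns. Past a certain depth, though, adding layers made accuracy drop even on the training data. That ruled out overfitting as the cause: the deeper network was simply harder to optimize, a problem the paper calls degradation. The fix was the residual connection, a shortcut that adds a block's input straight to its output, so the layers inside only have to learn the change on top of that input. If the layers compute F(x), the block outputs F(x) + x. With these shortcuts the team trained networks up to 152 layers deep on ImageNet, and an ensemble of them won first place in the ILSVRC 2015 classification task with a 3.57% top-5 error.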
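To make the shortcut concrete, here is a minimal sketch in plain Python. It is an illustration, not the paper's architecture: the real networks use convolutions, batch normalization, and a ReLU after the addition, all left out here. The names `residual_block`, `plain_block`, `stack`, and `zero_weights` are made up for this example.

```python
def relu(v):
    # Element-wise ReLU: negative values become zero.
    return [max(0.0, a) for a in v]


def linear(W, v):
    # Matrix-vector product; W is a list of rows.
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def plain_block(x, W1, W2):
    # Two small layers with no shortcut: the output is F(x).
    hidden = relu(linear(W1, x))
    return linear(W2, hidden)


def residual_block(x, W1, W2):
    # The same two layers, but the input is added back: F(x) + x.
    f = plain_block(x, W1, W2)
    return [fi + xi for fi, xi in zip(f, x)]


def stack(block, x, depth, W1, W2):
    # Apply the same block `depth` times (weights shared for simplicity).
    for _ in range(depth):
        x = block(x, W1, W2)
    return x


def zero_weights(n):
    # An n-by-n matrix of zeros, standing in for layers that learned nothing.
    return [[0.0] * n for _ in range(n)]


x = [1.0, -2.0, 3.0]
Z = zero_weights(3)

# With nothing learned, 100 residual blocks still pass the input through.
deep_residual = stack(residual_block, x, 100, Z, Z)

# The same stack without shortcuts wipes the signal out.
deep_plain = stack(plain_block, x, 100, Z, Z)
```

With all weights at zero, `deep_residual` equals the original `x`, while `deep_plain` is all zeros. That is the intuition behind the design: a residual block that learns nothing defaults to passing its input along, so extra blocks have an easy way to do no harm, whereas plain layers would have to learn how to copy their input.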

Residual connections are now in almost every large model, including the ones behind modern language and image tools. The idea is small and easy to add, which is part of why it spread so fast. It is a clean example of how one fix can unlock a whole direction of research.

Source

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Microsoft Research

