A practical checklist of ways AI can go wrong

orig. “Concrete Problems in AI Safety” · Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

Alignment · Beginner · 4 min read · Written and reviewed by Marginalia Editorial

In the margin

Area

Alignment: making sure AI systems actually do what people intend, even as they get more capable.

Instead of distant doom, this paper lists the small, concrete ways an AI can do the wrong thing while looking like it is working.

What's going on

The authors set aside science-fiction worries and focus on everyday failure modes. Examples include a cleaning robot that knocks things over to finish faster (a negative side effect), a system that games its own reward (reward hacking), and one that behaves well in testing but not in the real world (distributional shift). The paper names five such problems in all, adding scalable oversight and safe exploration to these three, and suggests research directions for each. The point is to make safety a normal engineering topic with clear, testable questions.
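
The cleaning-robot example can be made concrete with a toy calculation. The sketch below is illustrative only and does not come from the paper: the two plans, their step counts, the reward formula, and the `side_effect_penalty` parameter are all assumptions chosen to show how a reward that scores only speed can prefer the plan that breaks something.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    steps: int         # time the robot needs to finish cleaning
    vases_broken: int  # side effect the designer cares about but may forget to score


def reward(plan: Plan, side_effect_penalty: float = 0.0) -> float:
    """Hypothetical reward: +10 for finishing, -1 per step,
    minus an optional penalty for each broken vase."""
    return 10.0 - plan.steps - side_effect_penalty * plan.vases_broken


# Two made-up ways to clean the same room.
PLANS = [
    Plan("go around the vase", steps=6, vases_broken=0),
    Plan("cut through the vase", steps=4, vases_broken=1),
]


def best_plan(side_effect_penalty: float = 0.0) -> Plan:
    """Return the plan with the highest reward under the given penalty."""
    return max(PLANS, key=lambda p: reward(p, side_effect_penalty))


if __name__ == "__main__":
    # With no penalty, the reward only sees speed, so the shortcut wins.
    print(best_plan().name)
    # Once the side effect is scored, the careful plan wins.
    print(best_plan(side_effect_penalty=5.0).name)
```

With the penalty left at zero, the faster plan scores 6 against 4 and is chosen even though it breaks the vase. Setting the penalty to 5 drops the shortcut to 1 and the careful plan wins. A hand-written penalty like this is only a stand-in: in a real home the designer cannot list everything that might get broken in advance.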

Why it matters

This paper helped turn AI safety from a vague worry into a working research field. Many of the problems it named are now active research areas, especially as models are given more freedom to act. It is a good starting point for anyone who wants to understand what alignment means in practice.

Source

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané (Google Brain and OpenAI)
