A practical checklist of ways AI can go wrong
orig. “Concrete Problems in AI Safety” · Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mane
Instead of distant doom, this paper lists the small, concrete ways an AI can do the wrong thing while looking like it is working.
The authors set aside science-fiction worries and focus on everyday failure modes. Examples include a cleaning robot that knocks things over to finish faster, a system that games its own reward, or one that behaves well in testing but not in the real world. For each problem they suggest research directions to study it. The point is to make safety a normal engineering topic with clear, testable questions.
This paper helped turn AI safety from a vague worry into a working research field. Many of the problems it named are now active areas, especially as models are given more freedom to act. It is a good starting point for anyone who wants to understand what alignment actually means in practice.
Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mane, Google Brain and OpenAI
We write original plain-language summaries and link to the source. We never republish the paper.
Paste it and we'll explain it even more simply.
Pass all three to earn the “read & understood” stamp (+10 pts).
Pass quizzes and leave notes to climb your chapter's board. No chapters are running yet, so this one is wide open.
Start a chapter to compete →→