Marginalia daily

Datasets

Open data, built by the community.

Everything we make is openly licensed and documented. These datasets are a by-product of reading and annotating papers together, and they are free to download, study, and build on.

Text / NLP · CC BY 4.0

Plain-language paper explainers

2 entries

Original arXiv titles paired with a plain-language explainer at three depths (ELI15 · Student · Researcher), plus difficulty and AI-for-good labels.

How it's made
Drafted by Workers AI from the abstract, then reviewed and corrected by a human editor before publishing.
Browse the explainers · Hugging Face mirror (in progress)
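For readers planning to work with the explainers, here is a minimal sketch of what one entry might look like, based only on the description above. The field names, depth keys, and JSON Lines layout are assumptions made for illustration, not the published schema, and the values are placeholders.

```python
import json
from dataclasses import asdict, dataclass

# Depth keys mirror the three levels named above; the exact spelling is an assumption.
DEPTHS = ("eli15", "student", "researcher")


@dataclass
class ExplainerRecord:
    """Hypothetical shape of one explainer entry (not the published schema)."""

    arxiv_title: str
    explainers: dict  # depth name -> plain-language text
    difficulty: str  # label vocabulary is an assumption
    ai_for_good: bool

    def __post_init__(self):
        # Reject records that do not cover all three depths.
        missing = [d for d in DEPTHS if d not in self.explainers]
        if missing:
            raise ValueError(f"missing explainer depths: {missing}")


# Placeholder values for illustration only; not taken from the dataset.
record = ExplainerRecord(
    arxiv_title="<original arXiv title>",
    explainers={
        "eli15": "<explainer for a 15-year-old>",
        "student": "<explainer for a student>",
        "researcher": "<explainer for a researcher>",
    },
    difficulty="<difficulty label>",
    ai_for_good=False,
)

# One JSON object per line (JSON Lines) is a common layout for text datasets.
line = json.dumps(asdict(record))
print(line)
```

Validating the three depths at construction time means an incomplete entry fails early rather than surfacing later as a missing key.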
Annotations · CC BY 4.0

Community margin notes

7 approved notes

Plain-language annotations readers left on specific sentences, anchored to the paper they explain. The raw material of a 'how people explain research' dataset.

How it's made
Written by members, each one read and approved by an editor before it goes live.
See annotated papers · Hugging Face mirror (in progress)
Taxonomy · CC BY 4.0

AI research direction map

19 directions

A curated, hierarchical taxonomy of AI research directions with plain-language descriptions and momentum (recent-activity) scores.

How it's made
Hand-curated by editors and kept fresh by the nightly ingest job.
Open the map · Hugging Face mirror (in progress)
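A hierarchical taxonomy with per-direction scores can be modeled as a simple tree. The sketch below is one possible representation, not the map's actual format: the `Direction` class, its field names, and the momentum scale are all hypothetical, and the tree contents are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Direction:
    """Hypothetical node in the research-direction tree."""

    name: str
    description: str
    momentum: float  # recent-activity score; scale and formula are assumptions
    children: list = field(default_factory=list)


def walk(node, depth=0):
    """Yield (depth, node) pairs in depth-first, parent-before-child order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Placeholder tree for illustration only; names and scores are invented.
root = Direction("<root direction>", "<description>", 0.0, [
    Direction("<child A>", "<description>", 0.2),
    Direction("<child B>", "<description>", 0.7, [
        Direction("<grandchild>", "<description>", 0.4),
    ]),
])

# Print an indented outline, then pick the node with the highest momentum.
for depth, node in walk(root):
    print("  " * depth + f"{node.name} (momentum={node.momentum})")

most_active = max((n for _, n in walk(root)), key=lambda n: n.momentum)
```

A generator-based traversal keeps the outline and the ranking on the same code path, so the same `walk` can drive other summaries, such as counting directions.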
Text / NLP · CC BY 4.0

Plain-language AI glossary

8 terms

AI terms with short, jargon-free definitions: a human-approved glossary that grows as new papers are explained.

How it's made
Suggested by the pipeline and by members, approved by an editor.
Read the glossary · Hugging Face mirror (in progress)

We follow the "Datasheets for Datasets" documentation framework so anyone can judge whether a dataset is right for their work.

Why was it collected?
To make AI research legible to newcomers and to study the gap between academic and plain language.
How was it collected?
Papers are sourced from arXiv and Hugging Face Daily Papers; explainers are AI-drafted and human-reviewed; notes are member-written and editor-approved.
How is quality assured?
Every published item passes human editorial review. We never present unreviewed AI output as verified.
Who are the contributors?
Marginalia members and editors. Contributors are credited; no personal data beyond a chosen display name is included.
What are the limits?
The collection skews toward beginner-friendly and social-good work, and toward recent papers. It is not a representative sample of all AI research.
How can it be used?
Freely, under CC BY 4.0, with attribution to Marginalia. Read more in how we work.

Want to help build these? Contribute a note or a label →