An AI just wrote its first paper — and passed peer review
Sakana AI's The AI Scientist-v2 became the first AI system to independently produce a research paper that passed academic peer review, at an ICLR 2025 workshop.
For the First Time, a Machine Did Science — and Experts Approved It
Something happened in early 2025 that researchers have been debating ever since: an AI system wrote a research paper entirely on its own, submitted it to a scientific workshop, and received enough positive reviewer scores to be accepted. No human wrote the words. No human designed the experiments. The AI did it all.
The system is called The AI Scientist-v2, built by Sakana AI, a Tokyo-based research lab co-founded by David Ha, who previously led Google Brain's research team in Tokyo. What they built isn't just a writing assistant. It's a fully autonomous research machine.
The accepted paper was titled "Compositional Regularization: Unexpected Obstacles in Enhancing Neural Network Generalization." It explored regularization (a technique used to stop AI systems from memorizing training data instead of actually learning patterns) and reported negative findings — essentially, the AI discovered that some widely tried approaches don't work as expected.
How The AI Scientist-v2 Actually Works
Unlike a chatbot that answers questions, The AI Scientist-v2 operates in a cycle that closely mirrors how human researchers work — but runs automatically, around the clock, at a fraction of the cost.
The key innovation in this version is what engineers call agentic tree search (a method where the AI explores many different research directions at once, like branches on a tree, and picks the most promising ones to pursue further). Think of it like having hundreds of research assistants brainstorm ideas simultaneously, then selecting the best one to actually investigate.
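To make the idea concrete, here is a minimal sketch of what a tree search over research directions could look like. Everything in it is hypothetical: `propose_variations` and `score` are stand-ins for the LLM calls a real system would make, and the pruning strategy is a simple beam search, not Sakana AI's actual algorithm.

```python
import random

def propose_variations(idea, k=3):
    # Hypothetical stand-in for an LLM call that branches one idea
    # into k refined variants.
    return [f"{idea}/variant-{i}" for i in range(k)]

def score(idea):
    # Hypothetical stand-in for judging an idea (e.g. a reviewer
    # model or a quick pilot experiment). Seeded so the sketch is
    # deterministic for illustration.
    random.seed(idea)
    return random.random()

def agentic_tree_search(root_idea, depth=3, beam_width=2, branch=3):
    """Expand many candidate directions at once, then keep only the
    most promising branches at each level before expanding again."""
    frontier = [root_idea]
    for _ in range(depth):
        children = [v for idea in frontier
                    for v in propose_variations(idea, branch)]
        # Prune: carry forward only the top-scoring branches.
        frontier = sorted(children, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

best = agentic_tree_search("compositional regularization")
print(best)
```

The key design point is the prune step: without it, the tree grows exponentially, so the system can only afford to investigate a handful of the branches it brainstorms.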
Here is the step-by-step process the AI follows on every run:
- Step 1 — Idea generation: The AI browses recent academic papers on Semantic Scholar (a scientific paper database) and invents a new research question worth investigating.
- Step 2 — Experiment design and coding: It writes computer code to run experiments that test its hypothesis, handling its own debugging if something breaks.
- Step 3 — Running experiments: It executes the code, collects the results, and analyzes what they mean — all automatically.
- Step 4 — Creating charts and figures: A vision-based AI component reviews each generated chart and suggests improvements, iterating until the visuals meet academic quality standards.
- Step 5 — Writing the paper: The system writes a full academic paper in the standard scientific format, complete with an abstract, methodology, results, discussion, and references.
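The five steps above can be sketched as a single loop. Every function here is a hypothetical stub standing in for an LLM- or tool-backed component; none of these names comes from Sakana AI's actual codebase.

```python
def generate_idea():
    # Step 1: mine the literature, pick a research question (stub).
    return "does regularizer X hurt generalization?"

def write_experiment(idea):
    # Step 2: produce runnable experiment code for the hypothesis (stub).
    return f"code testing: {idea}"

def run_experiment(code):
    # Step 3: execute the code and collect results (stub).
    return {"metric": 0.42, "code": code}

def make_figures(results):
    # Step 4: plot, then critique-and-revise until acceptable (stub).
    return [f"chart of {key}" for key in results]

def write_paper(idea, results, figures):
    # Step 5: draft a manuscript in the standard scientific format (stub).
    return {"title": idea.capitalize(),
            "sections": ["abstract", "methods", "results", "discussion"],
            "figures": figures}

def research_run():
    idea = generate_idea()
    code = write_experiment(idea)
    results = run_experiment(code)
    figures = make_figures(results)
    return write_paper(idea, results, figures)

paper = research_run()
print(paper["title"])
```

In the real system each stub would be an expensive model call with its own retry and debugging logic, which is why a full run still takes hours rather than seconds.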
The entire process — from blank slate to submitted paper — costs roughly $20 to $30 in compute fees and takes several hours. For comparison, a typical PhD student spends months or years on a single paper.
The project is open-source and available on GitHub, where it has accumulated over 3,000 stars (a measure of how many developers have bookmarked or endorsed the project). The code, all three submitted papers, and the human reviewer feedback are all publicly available for anyone to inspect.
The Peer Review Results — and Why They're Contested
Sakana AI submitted three papers to an ICLR 2025 workshop titled "I Can't Believe It's Not Better". The workshop was genuinely peer-reviewed (other scientists evaluated each submission before it could be accepted) and was held as part of ICLR, the International Conference on Learning Representations, one of the world's most prestigious machine learning conferences. Only one of the three papers received scores high enough to be accepted.
The accepted paper received reviewer scores of 6, 7, and 6 — averaging 6.33. The typical acceptance threshold at this workshop was a score of 6, meaning the AI's paper cleared the bar by a small but meaningful margin.
Importantly, this experiment was conducted transparently. Sakana AI worked directly with ICLR leadership and workshop organizers, with full ethical review board clearance. The reviewers knew in advance that some papers in the pool might be AI-generated — though they did not know which specific papers came from the AI.
However, critics have raised important nuances:
- Workshop vs. main conference: This was a workshop track, which typically accepts 60–70% of submissions, compared to 20–30% for main conference papers. The bar is meaningfully lower.
- Paper quality concerns: Sakana AI's own internal reviewers concluded the papers "did not pass our internal bar" for a main-conference submission. One paper contained citation errors, for instance attributing an LSTM architecture to the wrong researcher.
- Voluntary withdrawal: All three papers were withdrawn after the review process, meaning none actually appeared in the published workshop proceedings.
- Human selection: While the AI wrote everything, human researchers chose which papers to submit — a meaningful form of editorial control.
TechCrunch noted that "while the claim isn't necessarily untrue, there are caveats to note" — a fair summary of the scientific community's mixed reaction.
Why This Milestone Matters Even With the Caveats
Strip away the marketing language, and what remains is still remarkable: an autonomous computer system completed the full scientific research cycle — hypothesis, experiment, analysis, writing — well enough to satisfy human expert reviewers in a real academic setting.
Previously, the best AI tools could assist with parts of this process: helping write abstracts, suggesting citations, or generating code. The AI Scientist-v2 is the first demonstration of an end-to-end autonomous system that does the entire thing without a human steering each step.
The original AI Scientist-v1 paper was later published in Nature, the peer-reviewed journal founded in 1869 that is among the world's most prestigious scientific venues, a sign that the broader research community takes this work seriously even if the workshop milestone comes with asterisks.
For non-researchers, the practical implications are beginning to take shape: drug discovery pipelines that can test thousands of hypotheses overnight, materials science experiments that run autonomously in simulation, or climate modeling research that no longer requires a team of PhD students to operate. The question is no longer whether AI can do science. It's how fast the quality improves — and who gets to decide what counts as real.