Notion AI Agents: 4 Rebuilds, 100+ Tools, 3-Month Free Trial
Notion rebuilt custom AI agents 4 times before shipping. Claude Sonnet 3.6 was the unlock. Now: 100+ tools, free 3-month trial, no coding needed.
Notion's AI team rebuilt their custom agents feature four times before it was good enough to ship. The problem wasn't engineering skill — it was waiting for AI models to mature enough that background task execution (running automatically without a human watching every step) actually worked reliably. That wait ended in early 2025. The result is now live: 100+ tools, enterprise-grade permissions, and a 3-month free trial. Here's what three years of failures actually teaches you about shipping AI products — and why the final version is worth testing.
Three Years of AI Agent Failures Nobody Talks About
Notion first attempted to build AI agents in 2022 — well before ChatGPT launched and made the category mainstream. The early attempts collapsed against the same four structural problems every time:
- No standardized tool-calling — models couldn't reliably invoke external functions or services, so agents couldn't take real-world actions
- Context windows too short — context windows (the amount of text an AI can hold in memory at once) were so limited that agents forgot earlier steps mid-task
- Models too unreliable — output quality varied enough that background execution without supervision was impossible to trust
- Excessive complexity exposed — early designs gave the model too many options at once, overwhelming it and producing unpredictable outputs
Simon Last, Notion's head of AI, describes the period plainly: "The models are just too dumb and the context thing was also way too short. We just kind of banged our head against it for a long time...there was always like sort of glimmers that it was working, but it never felt quite robust enough."
Most companies facing the same wall either shipped something fragile or shelved the feature quietly. Notion chose a third option: keep rebuilding, and wait for models to catch up. Between 2022 and early 2025, the team went through four to five complete rebuilds — scrapping months of work each time without treating it as organizational failure.
The Two Unlocks: Claude Sonnet 3.6 and an Unexpected Hire
The technical inflection point came in early 2025. Simon Last identifies it precisely: "The big unlock was probably like Sonnet 3.6 or seven, uh, early last year. And that's when we started working on our agent."
Claude Sonnet 3.6 (an AI model built by Anthropic, designed for complex multi-step tasks requiring reliable instruction-following across long, branching sequences) gave Notion's team three things that had been missing for three years: stable tool-calling, extended context handling, and output quality consistent enough to trust in background execution. When a model correctly completes a 20-step task unattended, the entire product design space opens up.
The human story running in parallel matters equally. Simon Last — the engineer who had been rebuilding agents since 2022 — needed a vacation. Notion hired Sarah Sachs to manage the AI engineering team during his absence. She didn't leave when he returned. Instead, she helped reshape how the team operates: low-ego culture where deleting your own work is standard practice, decisions made via working demos rather than slide decks, and objectives set collectively rather than handed down from above.
Sachs on the pattern: "We ship things slowly...it's quite nice to remind yourself all the work you did because we do have a habit of being two or three milestones ahead." The cultural willingness to scrap work — and to wait rather than ship something broken — is what made four rebuilds possible without the team fracturing.
How Notion's Custom AI Agents Actually Work
The shipped product is considerably more sophisticated than typical "agent builder" tools (visual drag-and-drop platforms that chain AI steps together in a linear flow without enterprise-grade access controls). Notion's agents operate inside a permission-aware environment — which turns out to be the hardest unsolved problem in enterprise agent design.
The concrete challenge: a Notion database shared company-wide contains data that an agent running in a private 5-person Slack channel shouldn't expose to everyone. Building an agent that automatically respects who can see what — across shared documents, databases, and channels — without requiring manual permission configuration each time, took multiple product iterations to get right.
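The core idea can be sketched in a few lines. This is a hypothetical illustration, not Notion's actual implementation (which is not public): before an agent surfaces a retrieved item to a group, it checks that every member of that audience is already allowed to see it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    title: str
    allowed: frozenset  # user ids permitted to view this document

def visible_to_all(results, audience):
    """Keep only documents every member of the audience may view,
    so an agent running in a shared channel never surfaces data
    that only some participants are permitted to see."""
    return [doc for doc in results if audience <= doc.allowed]

docs = [
    Document("Company handbook", frozenset({"ana", "bo", "cy"})),
    Document("Exec comp review", frozenset({"ana"})),
]

# An agent answering in a channel shared by ana and bo only sees
# the handbook; the private review is filtered out automatically.
shared_view = [d.title for d in visible_to_all(docs, {"ana", "bo"})]
```

The hard part in practice is that "allowed" is not a static set: it is derived from nested sharing rules across pages, databases, and teamspaces, which is why this took multiple iterations to get right.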
The current version includes:
- 100+ tools the agent can invoke — searching Notion databases, drafting pages, pulling meeting notes, running web searches, performing calculations
- Manager agent architecture — a top-level agent can direct specialized sub-agents, each focused on a specific task domain
- Self-inspection and self-editing — agents can review their own failure logs and rewrite their operating instructions within guardrails (set boundaries that prevent agents from taking unauthorized actions)
- Native memory via Notion pages and databases, rather than external vector stores (databases that convert text into searchable numbers AI can query)
- Credits-based pricing — a single abstracted unit covering token counts (the per-word cost of running AI), model selection, and web search fees, so users never need to understand the underlying pricing mechanics
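The credits abstraction in the last bullet can be pictured as a simple conversion layer. The rates below are invented for illustration; Notion has not published its conversion formula:

```python
# Hypothetical per-unit rates in credits -- Notion's real conversion
# between tokens, model choice, and search fees is not public.
RATES = {
    "input_token": 0.000001,
    "output_token": 0.000003,
    "web_search": 0.01,
}

def credits_used(input_tokens, output_tokens, web_searches):
    """Fold raw usage (token counts, search calls) into one abstract
    unit, so the user sees a single credits number instead of a
    per-token, per-model bill."""
    return (input_tokens * RATES["input_token"]
            + output_tokens * RATES["output_token"]
            + web_searches * RATES["web_search"])

spent = credits_used(10_000, 2_000, 3)
```

The design choice is the point: users reason about one declining balance, while the rates table absorbs every change in underlying model pricing.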
Access requires no coding: open Notion → Settings → Custom Agents. A 3-month free trial is available to new users at launch. If you're new to AI automation tools, explore our beginner guides to AI workflow automation before diving in.
The 30% Eval Rule — A Competitive Advantage Disguised as Humility
One engineering detail worth knowing: Notion deliberately designs its "frontier evals" (benchmark tests measuring what AI may be capable of in the future, not just what it can do today) to pass only about 30% of the time. The logic is counterintuitive: if your hardest tests are passing, your tests aren't hard enough. Because roughly 70% of these forward-looking tests fail at any given moment, a rising pass rate gives Notion early signal that a new model capability is emerging, before competitors detect it. A team willing to define success at 30% is a team designed to see around corners.
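The monitoring logic behind this rule is simple enough to sketch. A minimal, hypothetical version (Notion's actual eval tooling is not public):

```python
def frontier_signal(results, target=0.30, slack=0.10):
    """results: list of booleans, one per frontier test (True = passed).

    A pass rate well above the ~30% target means the suite is no
    longer 'frontier': the model has grown into it, which is itself
    the early capability signal -- and a cue to add harder tasks."""
    rate = sum(results) / len(results)
    if rate > target + slack:
        return rate, "suite too easy: capability emerging, raise difficulty"
    return rate, "suite still at the frontier"

# 3 of 10 hard tests passing: right where the rule wants it.
rate, verdict = frontier_signal([True] * 3 + [False] * 7)
```

The interesting property is that the alert fires on *improvement*, not regression, which is the inverse of a conventional CI suite.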
The "Software Factory" — Where AI Automation Is Actually Heading
Simon Last's most ambitious framing goes well beyond productivity features. He describes coding agents as "the kernel of AGI (artificial general intelligence — the theoretical threshold at which AI can perform any intellectual task a human can do)": "Everything is a coding agent...the exciting thing about that is sort of your agent can bootstrap its own software and capabilities and actually debug and maintain them."
The vision: once an agent can write code, test it, debug failures, and maintain a codebase autonomously, you have a software factory. A manager agent specs a feature. A coding agent builds it. A testing agent verifies it. A review agent audits quality. All of them share the codebase, generating new capabilities as they go — without human intervention at each handoff.
Notion isn't offering this as a user-facing product yet. But the philosophy shapes current decisions. One telling signal: Notion has explicitly decided not to train its own foundation model (the large base AI system that all other AI features run on top of). They're betting Claude and other third-party models will improve fast enough that proprietary training infrastructure would be wasted investment. So far, that bet has been validated — Claude Sonnet 3.6 was the unlock that made the whole product possible.
Notion AI Agents Free Trial: What to Test First
For non-technical Notion users, the practical takeaway is direct: you can now set up an AI agent inside Notion that runs tasks automatically — searching your workspace, summarizing meeting notes, pulling data from databases, drafting pages — without writing a single line of code or managing any AI infrastructure.
The 3-month free trial removes the cost barrier entirely. The credits model means no surprise bills tied to AI pricing you don't understand. The permission system means sharing an agent with your team won't accidentally expose data people shouldn't see.
If earlier AI agents felt fragile or unpredictable, this version reflects four rebuilds and three years of targeted fixes aimed at exactly those problems. Open Notion → Settings → Custom Agents. Run it against one repetitive weekly task — a search, a summary, a data pull, or a draft. That's enough to judge whether it saves real time. For a step-by-step walkthrough on setting up your first automated workflow, visit our automation guides.