This AI just beat humans at using a computer
Agent S3 from Simular scored 72.6% on real computer tasks; humans scored 72%. It clicks, types, and browses slightly better than the human baseline, and its code is free.
An AI just outperformed humans at the most mundane task imaginable: using a computer. Agent S3, built by startup Simular, scored 72.6% on OSWorld — a benchmark that tests whether software can open apps, fill in forms, navigate websites, and manage files the way a real person would. The human baseline? 72%.
It's the first time any AI has crossed that line. And unlike most AI breakthroughs locked behind paywalls, Agent S3 is open source with 10,400+ GitHub stars — anyone can install it right now.
What Agent S3 actually does
Think of it as a digital employee that sees your screen and controls your mouse and keyboard. You tell it what you want done — "book a flight to Tokyo for next Tuesday," "fill in this spreadsheet from that PDF," "find the cheapest option on three websites" — and it figures out the clicks, scrolls, and keystrokes to get there.
It works across three kinds of platform: desktop computers (Windows, macOS, and Linux), web browsers, and Android phones. No other AI agent tops benchmarks on all three at once.
• Desktop tasks (OSWorld): 72.6% — first to beat the ~72% human baseline
• Browser tasks (WebVoyager): 90.1% — near-perfect web navigation
• Smartphone tasks (AndroidWorld): 71.6% — handles mobile apps too
• Windows tasks (WindowsAgentArena): 56.6% — still improving
How it works — no PhD required
Agent S3 uses two AI models working together. A "grounding" model (think of it as the AI's eyes) looks at your screen and identifies every button, menu, and text field. A main reasoning model (the brain) decides what to click and type to complete your task.
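To make the division of labour concrete, here is a toy sketch in Python. It is not the gui-agents API: the names `UIElement`, `Action`, `reasoning_model`, `grounding_model`, and `run` are hypothetical, and both "models" are hard-coded stubs. The point is the split. The brain says what to do in words, and the eyes turn those words into a spot on the screen.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical types for illustration only; these are NOT the gui-agents API.
@dataclass
class UIElement:
    label: str  # visible text of a button, menu, or field
    x: int      # centre of the element on screen, in pixels
    y: int

@dataclass
class Action:
    kind: str        # "click" or "type"
    target: str      # plain-language description of the element to act on
    text: str = ""   # text to type, if any

def reasoning_model(task: str, step: int) -> Optional[Action]:
    """The 'brain': decides WHAT to do next, described in words.
    This stub ignores `task` and replays a fixed plan; a real agent
    would ask a large language model."""
    plan = [
        Action("click", "Search box"),
        Action("type", "Search box", "flights to Tokyo"),
        Action("click", "Search button"),
    ]
    return plan[step] if step < len(plan) else None

def grounding_model(description: str, screen: List[UIElement]) -> UIElement:
    """The 'eyes': finds WHERE the described element is on screen.
    A real grounding model reads a screenshot; this stub matches labels."""
    for element in screen:
        if element.label.lower() == description.lower():
            return element
    raise LookupError(f"nothing on screen matches {description!r}")

def run(task: str, screen: List[UIElement]) -> List[str]:
    """Loop: the brain picks an action, the eyes locate its target, repeat."""
    log = []
    step = 0
    action = reasoning_model(task, step)
    while action is not None:
        element = grounding_model(action.target, screen)
        where = f"({element.x}, {element.y})"
        if action.kind == "click":
            log.append(f"click at {where}")
        else:
            log.append(f"type {action.text!r} at {where}")
        step += 1
        action = reasoning_model(task, step)
    return log

screen = [UIElement("Search box", 400, 120), UIElement("Search button", 620, 120)]
for line in run("find flights to Tokyo", screen):
    print(line)
```

Notice that the brain never deals in pixel coordinates. That is the job the grounding model takes off its hands.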
The secret sauce is a technique called "Behavior Best-of-N": the AI makes several attempts at a task and picks the one that worked best. It's like having an assistant who drafts three emails and sends the best one, except each "draft" is a complete run through the task, and a judge model compares short summaries of what each run did before choosing a winner.
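Here is the best-of-N idea reduced to a few lines of Python, as a rough sketch only. It assumes each attempt can be described by a one-line summary plus a progress count. `Attempt`, `best_of_n`, and `simple_judge` are hypothetical names, and the judge here just picks the attempt that got furthest, whereas Simular describes using an AI judge to compare the summaries.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical names for illustration; not Simular's code.
@dataclass
class Attempt:
    narrative: str        # plain-language summary of what this attempt did
    steps_completed: int  # how far it got (a stand-in for task success)

def best_of_n(run_attempt: Callable[[int], Attempt],
              judge: Callable[[List[Attempt]], int],
              n: int = 3) -> Attempt:
    """Run the whole task n times, then let a judge pick one attempt."""
    attempts = [run_attempt(i) for i in range(n)]
    return attempts[judge(attempts)]

def simple_judge(attempts: List[Attempt]) -> int:
    """Toy judge: index of the attempt that got furthest (first one wins ties).
    A real system would have an AI model compare the narratives instead."""
    return max(range(len(attempts)), key=lambda i: attempts[i].steps_completed)

# Three pretend runs of "book a flight to Tokyo for next Tuesday".
fake_runs = [
    Attempt("opened the airline site, then got stuck on a cookie pop-up", 1),
    Attempt("searched Tuesday flights, picked the cheapest, booked it", 4),
    Attempt("searched flights but chose Wednesday by mistake", 3),
]
best = best_of_n(lambda i: fake_runs[i], simple_judge, n=len(fake_runs))
print(best.narrative)  # searched Tuesday flights, picked the cheapest, booked it
```

The trade-off is cost: running three attempts means roughly three times the model calls, in exchange for a better shot at one attempt succeeding.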
The $21.5M bet behind it
Simular raised $21.5 million in December 2025 to build what they call "the autonomous computer company." Their vision: AI should handle the repetitive screen work so humans don't have to.
The company already has 12,600+ users on its commercial product, Sai, a cloud-based AI co-worker that runs on a virtual desktop. The open-source Agent S3 is the research engine underneath it.
Simular's work has been featured in Wired and MIT Technology Review, and it won Best Paper at ICLR 2025 (one of the top AI research conferences).
Who should care — and who should worry
Try it yourself
Agent S3 runs on any computer with Python installed:
pip install gui-agents
You'll also need Tesseract (a free text-recognition tool) and an API key from OpenAI or Anthropic for the reasoning model. The recommended setup pairs GPT-5 with a smaller grounding model called UI-TARS that identifies screen elements.
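Before running anything, it can help to confirm the prerequisites are in place. Below is a small, unofficial pre-flight check in Python. It assumes the package imports as `gui_agents` and that your key lives in the standard `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variable; the Agent-S GitHub page is the authority on what the tool actually expects. It checks nothing about the UI-TARS grounding model.

```python
import importlib.util
import os
import shutil
from typing import Callable, List, Mapping, Optional

def check_setup(env: Optional[Mapping[str, str]] = None,
                which: Callable[[str], Optional[str]] = shutil.which,
                find_spec: Callable[[str], object] = importlib.util.find_spec) -> List[str]:
    """Unofficial pre-flight check. Returns a list of missing prerequisites;
    an empty list means everything this script knows about is in place."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: the pip package "gui-agents" is imported as "gui_agents".
    if find_spec("gui_agents") is None:
        problems.append("gui-agents is not installed (pip install gui-agents)")
    # Tesseract is a separate program, so look for its executable on PATH.
    if which("tesseract") is None:
        problems.append("Tesseract was not found on PATH")
    # Assumption: the key sits in the usual OpenAI or Anthropic variable.
    if not (env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY")):
        problems.append("set OPENAI_API_KEY or ANTHROPIC_API_KEY")
    return problems

if __name__ == "__main__":
    issues = check_setup()
    print("Looks ready" if not issues else "\n".join(issues))
```

The `which`, `find_spec`, and `env` parameters exist only so the check can be tried out with fake values; in normal use you call `check_setup()` with no arguments.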
Full setup instructions are on the Agent-S GitHub page.
The bigger picture
Twelve months ago, AI agents could barely navigate a website without getting lost. Now one has passed the human bar for general computer use. The gap between "AI that writes text" and "AI that does your job" just narrowed significantly.
Simular's commercial product, Sai, already runs on a private cloud desktop with security guardrails, so businesses can deploy it without giving the AI direct access to their own machines. Expect enterprise adoption to accelerate now that the research has shown human-level performance on a standard benchmark.
The question is no longer whether AI can use a computer like a human. It's how fast the 72.6% climbs to 90%.