2026-03-21 | AI agent, computer use, Agent S3, Simular, automation, open source

This AI just beat humans at using a computer

Agent S3 from Simular scored 72.6% on real computer tasks; humans scored 72%. It clicks, types, and browses at least as well as a typical person, and its code is free and open source.

An AI just outperformed humans at the most mundane task imaginable: using a computer. Agent S3, built by startup Simular, scored 72.6% on OSWorld — a benchmark that tests whether software can open apps, fill in forms, navigate websites, and manage files the way a real person would. The human baseline? 72%.

It's the first time any AI has crossed that line. And unlike most AI breakthroughs locked behind paywalls, Agent S3 is open source with 10,400+ GitHub stars — anyone can install it right now.

[Image: Simular Agent S3 AI controlling a computer desktop autonomously]

What Agent S3 actually does

Think of it as a digital employee that sees your screen and controls your mouse and keyboard. You tell it what you want done — "book a flight to Tokyo for next Tuesday," "fill in this spreadsheet from that PDF," "find the cheapest option on three websites" — and it figures out the clicks, scrolls, and keystrokes to get there.

It works across three kinds of environments: desktop computers (Windows, macOS, and Linux), web browsers, and Android phones. No other AI agent tops benchmarks across all three simultaneously.

Agent S3's scorecard across four benchmarks:

• Desktop tasks (OSWorld): 72.6%, the first score above the ~72% human baseline
• Browser tasks (WebVoyager): 90.1%, strong web navigation
• Smartphone tasks (AndroidWorld): 71.6%, so it handles mobile apps too
• Windows tasks (WindowsAgentArena): 56.6%, its lowest of the four and still improving

[Image: Agent S3 benchmark results showing human-level performance on OSWorld]

How it works — no PhD required

Agent S3 uses two AI models working together. A "grounding" model (think of it as the AI's eyes) looks at your screen and identifies every button, menu, and text field. A main reasoning model (the brain) decides what to click and type to complete your task.
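To make that division of labor concrete, here is a toy Python sketch of the loop. Both "models" are hypothetical rule-based stand-ins: `reason` plays the brain and `ground` plays the eyes. They work on a made-up list of `UIElement`s instead of a real screenshot. None of these names come from the gui-agents package; this is an illustration under those assumptions, not Agent S3's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    label: str  # visible text of a button, menu, or field
    x: int      # screen coordinates of the element's center
    y: int


def ground(screen: List[UIElement], description: str) -> Optional[Tuple[int, int]]:
    """Hypothetical grounding model ("eyes"): turn a description into coordinates."""
    for element in screen:
        if element.label.lower() == description.lower():
            return (element.x, element.y)
    return None  # the element is not visible on this screen


def reason(task: str, step: int) -> Optional[Tuple[str, str, str]]:
    """Hypothetical reasoning model ("brain"): pick the next high-level action.

    Returns (action, target description, text to type), or None when done.
    A real planner would look at the screen; this one follows a fixed script.
    """
    plan = [
        ("click", "Search", ""),
        ("type", "Search", task),
        ("click", "Go", ""),
    ]
    return plan[step] if step < len(plan) else None


def run_agent(task: str, screen: List[UIElement]) -> List[str]:
    """The brain decides WHAT to do; the eyes decide WHERE to do it."""
    log: List[str] = []
    step = 0
    while (decision := reason(task, step)) is not None:
        action, target, text = decision
        coords = ground(screen, target)
        if coords is None:
            log.append(f"could not find {target!r}")
            break
        if action == "click":
            log.append(f"click at {coords}")
        else:
            log.append(f"type {text!r} at {coords}")
        step += 1
    return log


screen = [UIElement("Search", 400, 120), UIElement("Go", 620, 120)]
print(run_agent("flights to Tokyo", screen))
# prints: ['click at (400, 120)', "type 'flights to Tokyo' at (400, 120)", 'click at (620, 120)']
```

Keeping the two roles separate means the brain can talk in plain descriptions ("the Search box"), and only the grounding step has to deal with pixel positions.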

The secret sauce is something called "Behavior Best-of-N": the AI makes several complete attempts at a task and keeps the one that worked best. It's like having an assistant who drafts three versions of an email and sends the best one, except each "draft" is an entire run through the task.
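In Simular's published description, as we understand it, Behavior Best-of-N runs several full attempts. Each attempt is summarized as a short "behavior narrative," and a judge model compares those narratives to pick the winner. The Python sketch below only mimics that shape. The agent (`run_attempt`), the fixed `STEPS` script, and the judge criterion (`progress_score`, a simple count of completed steps standing in for an LLM judge) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical script of steps a perfect attempt would complete.
STEPS = ["open browser", "search flights", "pick cheapest", "confirm booking"]


@dataclass
class Rollout:
    actions: List[str]  # every action taken during one complete attempt
    narrative: str      # plain-language summary a judge can read


def run_attempt(task: str, rng: random.Random) -> Rollout:
    """Hypothetical agent: one full, randomized attempt at the whole task."""
    # Simulate an imperfect agent that sometimes stalls partway through.
    stop = rng.randint(1, len(STEPS))
    actions = STEPS[:stop]
    narrative = f"{task}: " + "; ".join(actions)
    return Rollout(actions, narrative)


def progress_score(narrative: str) -> float:
    """Hypothetical judge criterion: how many planned steps the summary reports."""
    _, _, summary = narrative.partition(": ")
    return float(sum(step in summary for step in STEPS))


def judge(narratives: List[str], score: Callable[[str], float]) -> int:
    """Return the index of the highest-scoring narrative (first one wins ties)."""
    return max(range(len(narratives)), key=lambda i: score(narratives[i]))


def best_of_n(task: str, n: int, seed: int = 0) -> Rollout:
    """Run n independent attempts, then keep the one the judge prefers."""
    rng = random.Random(seed)
    rollouts = [run_attempt(task, rng) for _ in range(n)]
    winner = judge([r.narrative for r in rollouts], progress_score)
    return rollouts[winner]


best = best_of_n("Book a flight to Tokyo", n=5, seed=42)
print(best.narrative)
```

The judge only ever sees the narratives, never the raw clicks. That mirrors the idea of comparing readable summaries of whole attempts rather than scoring each action in isolation.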

The $21.5M bet behind it

Simular raised $21.5 million in December 2025 to build what they call "the autonomous computer company." Their vision: AI should handle the repetitive screen work so humans don't have to.

The company already has 12,600+ users on their commercial product, Sai — a cloud-based AI co-worker that runs on a virtual desktop. The open-source Agent S3 is the research engine underneath it.

Simular's work has been featured in Wired and MIT Technology Review, and it won Best Paper at ICLR 2025 (one of the top AI research conferences).

Who should care — and who should worry

Office workers and managers: Repetitive computer tasks — data entry, form filling, web research across multiple tabs — are exactly what Agent S3 excels at. If your job involves clicking through the same 15 screens every day, this is the AI that could do it for you.

Developers and automation builders: Agent S3 installs with one command and supports multiple LLM providers (OpenAI, Anthropic, Google). It's MIT-licensed, so you can build commercial products on top of it.

Everyone watching AI replace jobs: When AI can use a computer better than a human, every task that involves a screen is on the table. Data entry, customer support workflows, QA testing, appointment scheduling — the automation wave just got a lot bigger.

Try it yourself

Agent S3 runs on any computer with Python installed:

pip install gui-agents

You'll also need Tesseract (a free text-recognition tool) and an API key from OpenAI or Anthropic for the reasoning model. The recommended setup pairs GPT-5 with a smaller grounding model called UI-TARS that identifies screen elements.

Full setup instructions are on the Agent-S GitHub page.

The bigger picture

Twelve months ago, AI agents could barely navigate a website without getting lost. Now one has passed the human bar on a leading benchmark for general computer use. The gap between "AI that writes text" and "AI that does your job" just narrowed considerably.

Simular's commercial product, Sai, already runs on a private cloud desktop with security guardrails, meaning businesses can deploy it without giving the AI direct access to their own machines. Expect enterprise adoption to accelerate now that the research has shown human-level performance on a major benchmark.

The question is no longer whether AI can use a computer like a human. It's how fast the 72.6% climbs to 90%.
