MolmoWeb actually browses the web for you — and it's free
Ai2's open-source web agent reads screenshots like a human, clicks buttons, fills forms, and works through multi-step tasks like finding vacation rentals. The 8B model outperforms GPT-4o-based agents on navigation tasks.
Imagine telling your computer "Find me a cabin in Tahoe for next weekend" and watching it do the legwork: opening Airbnb, picking dates, and filtering the results. That's what MolmoWeb does, and the entire system is free and open source.
Built by the Allen Institute for AI (Ai2), a nonprofit research lab in Seattle, MolmoWeb is an AI agent that controls your web browser by looking at screenshots — the same way you do. It doesn't read the website's underlying code. It literally sees what's on screen, decides what to click, and acts.
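The observe, decide, act cycle can be sketched as a simple loop. This is a hypothetical illustration, not MolmoWeb's code: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are made-up stand-ins, and in the real system the policy would be the MolmoWeb model reading an actual screenshot of the page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action format: what the model decides to do next."""
    kind: str       # "click", "type", or "done"
    x: int = 0      # pixel coordinates on the screenshot (used by "click")
    y: int = 0
    text: str = ""  # text to enter (used by "type")


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real agent would capture the rendered page as an image here.
        return f"frame-{len(self.log)}".encode()

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")


# A policy maps (task, screenshot, action history) to the next action.
Policy = Callable[[str, bytes, List[str]], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy, max_steps: int = 10) -> List[str]:
    """Observe, decide, act until the policy returns "done" or max_steps is hit."""
    for _ in range(max_steps):
        image = browser.screenshot()               # observe: pixels only, no HTML
        action = policy(task, image, browser.log)  # decide: pick the next action
        if action.kind == "done":
            break
        browser.execute(action)                    # act
    return browser.log


def scripted_policy(task: str, image: bytes, history: List[str]) -> Action:
    """Toy policy that replays a fixed script; the real one would be the model."""
    script = [Action("click", 412, 88), Action("type", text="Lake Tahoe"), Action("done")]
    return script[len(history)]


print(run_agent("Find a cabin in Tahoe", FakeBrowser(), scripted_policy))
# prints ['click(412, 88)', "type('Lake Tahoe')"]
```

The key design point the sketch tries to capture is that the policy only ever receives pixels and its own action history, never the page's HTML.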
Small model, big results
The numbers are striking. MolmoWeb comes in two sizes, 4 billion and 8 billion parameters, yet the 8B version matches or comes close to AI agents built on models many times its size:
WebVoyager benchmark (navigation tasks across 15 popular websites):
• MolmoWeb-8B: 78.2%
• OpenAI o3: 79.3%
• GPT-4o-based agents with structured data access: lower than MolmoWeb-8B
Against the previous best open-source web agent, Fara-7B, MolmoWeb comes out ahead on all 4 benchmarks tested.
Pass@4 (success rate when the agent gets 4 attempts per task): 94.7%
In plain English: give MolmoWeb 4 tries at a web task, and at least one try succeeds on nearly 95% of tasks. It even beats Claude 3.7 on UI element detection (finding and clicking the right button on screen).
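Pass@k is a standard metric, and a short calculation shows what it measures. The sketch below uses the unbiased estimator popularized by OpenAI's 2021 Codex paper; we don't know exactly how Ai2 computed its 94.7% figure, and the attempt results here are invented. With exactly 4 attempts per task, pass@4 reduces to "did any of the 4 attempts succeed?"

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k attempts, drawn from n recorded
    attempts of which c succeeded, is a success."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every possible draw of k attempts contains a success
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Average pass@k over tasks; each inner sequence is one task's attempts."""
    scores = [pass_at_k(len(r), sum(r), k) for r in results]
    return sum(scores) / len(scores)


# Invented outcomes for 4 tasks, 4 attempts each.
results = [
    [True, True, False, True],
    [False, False, True, False],
    [False, False, False, False],
    [True, False, False, False],
]
print(benchmark_pass_at_k(results, 1), benchmark_pass_at_k(results, 4))  # prints 0.3125 0.75
```

Note how the same runs score 31% at pass@1 but 75% at pass@4: a single-attempt success rate and a best-of-4 rate can differ a lot, which is why the 78.2% and 94.7% figures measure different things.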
Why screenshots instead of code?
Most web automation tools work by reading a website's HTML code — the invisible blueprint behind every page. MolmoWeb takes a completely different approach: it takes a screenshot, understands what it sees, and acts.
This matters for three reasons:
• It is less likely to break when websites update. A site's underlying code can change often while its visual layout stays largely the same.
• It may be harder to flag as a bot. Because it works with the rendered page rather than scraping its code, some sites may have a harder time telling it apart from a human.
• A screenshot can be a more compact input than a page's full HTML, which can run to thousands of lines.
Trained on real human browsing
Ai2 released MolmoWebMix, the largest publicly available collection of human web browsing data ever assembled:
• 36,000 complete human browsing sessions across 1,100+ websites
• 2.2 million screenshot-question-answer pairs
• 7+ million UI element examples (buttons, links, fields labeled with precise coordinates)
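Labeled coordinates like these make UI element detection easy to score. One common approach is to check whether the predicted click lands inside the labeled element's bounding box. The record layout below (`UIElement`, a left/top/right/bottom box) is a hypothetical format for illustration, not MolmoWebMix's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """Hypothetical record: one labeled element in a screenshot."""
    label: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def hit(element: UIElement, point: Tuple[int, int]) -> bool:
    """True if a predicted click point falls inside the element's box."""
    left, top, right, bottom = element.box
    x, y = point
    return left <= x <= right and top <= y <= bottom


def click_accuracy(elements: List[UIElement], clicks: List[Tuple[int, int]]) -> float:
    """Fraction of predicted clicks that land on the intended element."""
    if len(elements) != len(clicks) or not elements:
        raise ValueError("need one predicted click per labeled element")
    return sum(hit(e, c) for e, c in zip(elements, clicks)) / len(elements)


elements = [
    UIElement("Search button", (400, 70, 460, 100)),
    UIElement("Check-in date field", (120, 200, 300, 230)),
]
clicks = [(412, 88), (310, 215)]  # the second click misses to the right
print(click_accuracy(elements, clicks))  # prints 0.5
```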
To generate synthetic training data, the team used a three-model system: one model plans, one executes, and one verifies. The resulting browsing runs were more consistent than those of human volunteers, and in one surprising finding, synthetic training data outperformed human demonstrations on identical tasks.
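The plan, execute, verify idea works like a filter: only runs the verifier accepts become training data. Here is a minimal sketch under that assumption. In the real system each role is an AI model; `generate_runs`, `toy_plan`, `toy_execute`, and `toy_verify` are hypothetical stand-ins written as plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    task: str
    steps: List[str] = field(default_factory=list)


def generate_runs(
    tasks: List[str],
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    verify: Callable[[Trajectory], bool],
) -> List[Trajectory]:
    """Plan each task, execute its subgoals, and keep only verified runs."""
    kept: List[Trajectory] = []
    for task in tasks:
        run = Trajectory(task)
        for subgoal in plan(task):              # planner: break the task into subgoals
            run.steps.append(execute(subgoal))  # executor: carry out one subgoal
        if verify(run):                         # verifier: discard failed runs
            kept.append(run)
    return kept


# Toy stand-ins for the three models (all hypothetical).
def toy_plan(task: str) -> List[str]:
    return ["open the site", task]


def toy_execute(subgoal: str) -> str:
    return "error" if "checkout" in subgoal else f"did: {subgoal}"


def toy_verify(run: Trajectory) -> bool:
    return "error" not in run.steps


runs = generate_runs(["find a Tahoe cabin", "finish checkout"], toy_plan, toy_execute, toy_verify)
print([r.task for r in runs])  # prints ['find a Tahoe cabin']
```

Filtering with a verifier is one plausible reason synthetic runs could be more consistent than human ones: failed or messy attempts simply never make it into the dataset.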
What it can actually do
In demos, MolmoWeb successfully:
• Searched Wikipedia and extracted specific information
• Browsed TechCrunch articles by topic
• Found vacation rentals on Airbnb with specific dates and guest counts
• Switched between browser tabs and navigated complex multi-step flows
Current limitations: by design, it won't log into accounts or handle payments; these are deliberate safety guardrails. It also struggles with very small text in screenshots and with vague instructions like "find something interesting."
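The article doesn't say how this guardrail is implemented. A deliberately simple, hypothetical version is a check that runs before every action and refuses anything that touches a login or payment step. The `PendingAction` record and keyword list below are assumptions for illustration; a keyword filter like this is easy to bypass and is not a robust safeguard.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list for pages or elements the agent must not touch.
BLOCKED = re.compile(
    r"\b(log ?in|sign ?in|passwords?|checkout|payments?|credit card|cvv)\b",
    re.IGNORECASE,
)


@dataclass
class PendingAction:
    url: str           # page the agent is on
    target_label: str  # visible label of the element it wants to use


def allowed(action: PendingAction) -> bool:
    """Refuse actions whose URL or target label mentions login or payment."""
    return not (BLOCKED.search(action.url) or BLOCKED.search(action.target_label))


print(allowed(PendingAction("https://www.airbnb.com/s/Lake-Tahoe/homes", "Search")))  # prints True
print(allowed(PendingAction("https://example.com/login", "Continue")))                # prints False
```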
Try the live demo
You can test MolmoWeb right now — no installation needed:
Live demo: molmoweb.allen.ai — type a task and watch it browse
Models on Hugging Face: allenai/MolmoWeb-8B and allenai/MolmoWeb-4B
Source code: github.com/allenai/molmoweb (Apache 2.0 license)
For local installation:
git clone https://github.com/allenai/molmoweb.git
cd molmoweb
uv venv && uv sync
uv run playwright install --with-deps chromium
bash scripts/download_weights.sh
bash scripts/start_server.sh ./checkpoints/MolmoWeb-8B
Why this matters beyond tech
Most powerful AI agents are locked behind corporate walls: OpenAI's Operator, Google's Project Mariner, Anthropic's Computer Use. MolmoWeb is an open-source alternative whose benchmark results come close to theirs.
That means researchers, startups, and anyone with capable hardware can build web automation without paying API fees or handing their browsing data to a corporation. The code, model weights, and training data are all publicly released, with the source code under the Apache 2.0 license.