Control Any Website With Just Words — Alibaba's Page Agent Automates Clicks, Inputs, and Searches Without Code
Add one line of code to any webpage and AI handles button clicks and form fills for you. Alibaba's Page Agent works with text alone — no screenshots needed — making it fast and lightweight. Now at 10,700 GitHub stars.
"Click the login button," "Type 'today's weather' in the search bar," "Add the third product to my cart" — say something like this, and AI actually performs that action on a live webpage. The open-source project Page Agent, built by Alibaba, is exactly that tool. It gained 7,000 GitHub stars in just one week, pushing its total past 10,700.
• An AI automation tool that runs directly inside web pages — no separate app or browser extension required
• Instead of taking screenshots, it reads the text structure of the page to act — making it fast and lightweight
• Control websites with plain-language commands like "click the button" or "fill out the form"
• Free and open-source under the MIT license, with 749 commits in its repository
How Is This Different From Existing Web Automation Tools?
Tools for automating websites have been around for a while. Options like Selenium and Playwright require developers to write automation code by hand. More recent AI-powered tools (such as Browser Use) often work by taking screenshots and having AI analyze the images, which tends to be slower and requires advanced vision-capable AI models (multimodal models).
Page Agent takes a different approach. It doesn't take screenshots. Instead, it reads the webpage's HTML structure (the DOM, or Document Object Model — the underlying code that defines a webpage's layout) as text and sends that text to an AI model. The AI then reasons, "The login button is right here, so I should click this," and executes the action.
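To make the idea concrete, here is a minimal TypeScript sketch of turning a page structure into text for a language model. It is not Page Agent's actual implementation: the `PageNode` type, the `INTERACTIVE_TAGS` set, and the `describeInteractive` function are hypothetical names used for illustration, and a plain object tree stands in for the real DOM so the snippet runs outside a browser.

```typescript
// Hypothetical, simplified stand-in for a DOM element.
interface PageNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: PageNode[];
}

// Tags treated as interactive in this sketch (an assumption, not Page Agent's list).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Walk the tree depth-first and emit one indexed text line per interactive element.
function describeInteractive(root: PageNode): string[] {
  const lines: string[] = [];
  const visit = (node: PageNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const label = node.text ?? node.attrs?.placeholder ?? node.attrs?.name ?? "";
      lines.push(`[${lines.length}] <${node.tag}> ${label}`.trim());
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return lines;
}

// A tiny example page: a heading, a search box, and a login button.
const page: PageNode = {
  tag: "body",
  children: [
    { tag: "h1", text: "Welcome" },
    { tag: "input", attrs: { placeholder: "Search" } },
    { tag: "button", text: "Log in" },
  ],
};

// This compact text, rather than a screenshot, is what would be sent to the model.
console.log(describeInteractive(page).join("\n"));
```

In this sketch the model would see two short lines, one for the search box and one for the login button, and could answer with the index of the element to act on.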
This approach has three key advantages:
- It's fast — there's no time spent capturing and analyzing screenshots
- It's lightweight — no image-recognition AI (multimodal model) is needed, which cuts costs
- It's accurate — since it's text-based, it pinpoints button locations precisely
How to Use It
The simplest way is to add a single line of code to any webpage:
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.5.9/dist/iife/page-agent.demo.js"></script>
This instantly adds AI automation capabilities to the webpage. For more advanced usage, you can install it via npm:
# Install via npm
npm install page-agent
// Use in your code
import { PageAgent } from 'page-agent'
const agent = new PageAgent({
  model: 'qwen3.5-plus', // AI model to use
  apiKey: 'YOUR_API_KEY', // AI service API key
  language: 'en-US' // Language support
})
// Give commands in natural language
await agent.execute('Click the login button')
// Double quotes here so the apostrophe in "today's" does not end the string
await agent.execute("Type 'today's weather' in the search bar and hit search")
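When chaining several commands, it can help to stop at the first failure rather than continue on a page in an unknown state. The sketch below assumes only what the example above shows, namely an object with an async `execute(command)` method. The `CommandRunner` interface, the `StepResult` shape, and the `runSteps` helper are hypothetical and not part of Page Agent.

```typescript
// Minimal shape assumed from the example above: anything with an async execute().
interface CommandRunner {
  execute(command: string): Promise<unknown>;
}

// Outcome of one command in the sequence (hypothetical shape).
interface StepResult {
  command: string;
  ok: boolean;
  error?: string;
}

// Run commands in order and stop at the first one that throws.
async function runSteps(
  agent: CommandRunner,
  commands: string[],
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const command of commands) {
    try {
      await agent.execute(command);
      results.push({ command, ok: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ command, ok: false, error });
      break; // later steps likely depend on this one having succeeded
    }
  }
  return results;
}
```

With a real agent this might be called as `await runSteps(agent, ['Click the login button', 'Open the account menu'])`, and the returned list shows which step, if any, failed.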
A Chrome extension is also available, which lets you automate complex tasks across multiple tabs.
Real-World Use Cases
Workplace Automation
Automate the repetitive web tasks you do every day. Data entry in ERP (Enterprise Resource Planning) systems, customer lookups in CRM (Customer Relationship Management) platforms, processing approvals in internal tools — it's especially effective for filling out complex forms.
Embedding an AI Assistant in Your Product
If you're a SaaS (Software as a Service) developer, you can embed an AI helper directly into your product. When a user says "I don't know how to use this," the AI can guide them by pointing to buttons and walking them through steps. It runs entirely on the frontend (the user-facing interface) — no separate backend (server) development required.
Accessibility
Paired with speech-to-text input, it can let people with visual impairments navigate websites by voice. A spoken command like "go to the next article" or "open the menu" is transcribed and then carried out by Page Agent.
Limitations and Security
Page Agent only works inside the browser. It cannot perform OS-level actions like opening files or launching other programs. It also requires an AI model API key. A key embedded in frontend code is visible to anyone who views the page source or network requests, so take care not to expose it when deploying on public-facing websites.
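One common way to keep a key out of the browser is to route model requests through a small server-side proxy that attaches the key itself. The sketch below shows only the header-handling step, written as a pure function. The name `buildUpstreamHeaders` and its parameters are hypothetical, and whether Page Agent can be pointed at such a proxy endpoint is an assumption to check against its documentation.

```typescript
// Hypothetical helper for a server-side proxy: takes the headers the browser sent
// and returns the headers to forward to the AI provider.
function buildUpstreamHeaders(
  clientHeaders: Record<string, string>,
  serverApiKey: string,
): Record<string, string> {
  const upstream: Record<string, string> = {};
  for (const [name, value] of Object.entries(clientHeaders)) {
    const lower = name.toLowerCase();
    // Drop credentials and cookies supplied by the browser; they must not reach the provider.
    if (lower === "authorization" || lower === "cookie") continue;
    upstream[lower] = value;
  }
  // The real key lives only on the server (for example, in an environment variable).
  // Bearer is the scheme used by OpenAI-compatible APIs; other providers may differ.
  upstream["authorization"] = `Bearer ${serverApiKey}`;
  return upstream;
}
```

With this arrangement the browser can send a placeholder or no key at all, and the secret never appears in the page source.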
It's compatible with OpenAI, Anthropic, and other AI services beyond Alibaba's Qwen models, so you can use whichever AI provider you already have set up.
The project is available under the MIT license on GitHub, and you can try it out firsthand on the online demo. It's written in TypeScript, which makes up 81.3% of the codebase.