AI for Automation
Back to AI News
2026-03-23XiaomiMiMoAI agentmultimodal AItext-to-speechautonomous browsing

Xiaomi's AI just bought something online — it even haggled

Xiaomi's MiMo V2-Omni can browse the web, compare prices across sites, negotiate with sellers via chat, and complete purchases — all without any human help.


Xiaomi just launched three new AI models — and the wildest one doesn't just answer questions. MiMo V2-Omni can see your screen, hear what's happening around it, browse the internet, and take action on your behalf. In a live demo, it opened a browser, searched for a product on one platform, compared prices on another, negotiated a discount with a seller via chat, and completed the purchase. No human touched the keyboard.

Xiaomi MiMo V2 Omni multimodal AI launch

An AI with eyes, ears, and hands

Most AI chatbots just read and write text. MiMo V2-Omni is different — it processes images, video, and audio simultaneously through a single system, and it can interact with software the way a human does: clicking buttons, scrolling pages, and typing into forms.

In Xiaomi's demos, Omni performed tasks that normally eat up your afternoon:

🛒 Online shopping: Browsed products on Xiaohongshu (China's Instagram), jumped to JD.com to compare prices, opened a chat with the seller, negotiated a discount, and placed the order — all autonomously.

🚗 Dashcam analysis: Watched live dashcam footage, identified pedestrians, cyclists, and vehicles, and flagged potential hazards in real time.

📱 Content creation: Created a multimedia post, debugged code, and published it to TikTok — without human intervention.

The audio processing is equally impressive: Omni can listen continuously for over 10 hours, making it potentially useful for meeting recording, real-time translation, or monitoring.

How it stacks up against the competition

On web navigation benchmarks (tests that measure how well AI can browse and interact with websites), Omni outperformed both Google's Gemini 3 Pro and OpenAI's GPT-5.2. On image understanding tests, it scored 76.8 — beating Claude Opus 4.6's 73.9.

MiMo V2-Omni benchmark results showing it outperforms Gemini and GPT on web navigation

That said, it's still behind the top coding-focused models. On ClawEval (a coding benchmark), Omni scored 54.8 — well behind Claude Opus 4.6's 66.3. This isn't a replacement for a coding assistant; it's built for real-world tasks that involve seeing, hearing, and interacting with software.

The AI voice that actually sings

The third model in Xiaomi's launch, MiMo V2-TTS, is the only commercial voice AI that can both speak and sing natively. Trained on over 100 million hours of speech data, it doesn't just read text aloud — it interprets emotional cues.

Tell it to sound "sleepy, just woken up, slightly hoarse" and it will. Tell it to sound "angry, but trying to stay calm" and it adjusts. It even generates natural sounds like coughs, hesitations, sighs, and laughter — without being explicitly told to. Write a word in ALL CAPS? It emphasizes it. Repeat a letterrrrr? It stretches the sound.

Why this matters for creators: If you make podcasts, audiobooks, voiceovers, or any audio content, a TTS (text-to-speech) model that understands emotion and can sing could dramatically cut production time. No more robotic AI voices reading your script in a monotone.

Pricing that undercuts everyone

Xiaomi also launched MiMo V2-Pro, a trillion-parameter language model (the AI's total brain size, with 42 billion neurons active per request). Its pricing tells the real story:

MiMo V2-Pro: $1 per million input tokens / $3 per million output tokens

Claude Opus 4.6: $5 / $25

Claude Sonnet 4.6: $3 / $15

That makes MiMo V2-Pro 5–8x cheaper than the leading Western models, while scoring competitively on coding and reasoning benchmarks. Before its official launch, it ran anonymously on OpenRouter under the codename "Hunter Alpha" — and topped the daily rankings for several days. Many users assumed it was DeepSeek V4.

Who should pay attention

If you run an e-commerce business or do a lot of online research: Omni's autonomous browsing could eventually handle product comparison, competitive analysis, and purchasing workflows.

If you create audio content: V2-TTS's emotional range and singing capability could replace expensive voice actors for certain projects.

If you're a developer using AI APIs: V2-Pro's pricing makes it worth testing as a cheaper alternative to Claude or GPT for coding tasks. It's available now through partners like OpenRouter, OpenClaw, and OpenCode — with free access for one week at launch.

Xiaomi's team summed up their vision: "A model that only reads text lives in a library. A model that sees, hears, reasons, and acts lives in the world."

MiMo V2-Pro benchmark comparison chart showing competitive performance against Claude and GPT

Related ContentGet Started with Easy Claude Code | Free Learning Guides | More AI News

Stay updated on AI news

Simple explanations of the latest AI developments