Google Gemini Agents Hit 96% Success Rate With Live Docs
Gemini AI agents hit 96.6% success with live docs — up from 28.2%. Google also shipped ADK Java 1.0, FunctionGemma offline AI, and 6 new protocols.
Google just did something remarkable: by giving its Gemini AI access to live documentation — real-time instructions the model can look up while it works — the success rate on developer tasks jumped from 28.2% to 96.6%. Same model. Same task. Just better information access, delivering a roughly 3.4× improvement. That single finding, buried in a wave of 10+ developer releases this week, reshapes how we think about why AI agents fail in the first place.
The implication is blunt: most AI agent failures today are an architecture problem, not an intelligence problem. Here's what Google shipped — and why it should change how everyone builds.
The 96.6% Breakthrough: What Actually Changed
AI agents (software programs that use AI to take actions on your behalf, like booking meetings or writing code) have a well-known weakness: they confidently use outdated information. Gemini is trained on data with a knowledge cutoff, so when developers ask it to generate code using the latest Google SDK (Software Development Kit — a pre-packaged set of tools for building apps), it often gives wrong answers based on older library versions.
Google's fix was elegant: they built a live "agent skill" — a lookup tool the model uses mid-task to retrieve current documentation directly from the web. The results were unambiguous:
- Gemini without the skill: 28.2% success rate on developer coding tasks
- Gemini with live docs skill: 96.6% success rate — same model, same prompts
- Net gain: +68.4 percentage points, roughly 3.4× better performance
Google's blog stated it plainly: "Evaluation results show a massive performance boost, with the gemini-3.1-pro-preview model jumping from a 28.2% to a 96.6% success rate when equipped with the skill." The lesson: before you blame the model, check what information it has access to. This is now an empirically proven principle, not a hunch.
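The pattern behind the jump is simple to sketch: retrieve current documentation first, then hand it to the model alongside the task. The following is a minimal, self-contained illustration of that retrieve-then-prompt loop; the class, method names, and the stubbed docs lookup are all hypothetical stand-ins, not the actual Gemini agent-skill API.

```java
import java.util.Map;

// Minimal sketch of the "live docs skill" pattern: before generating,
// the agent looks up current documentation and injects it into the prompt.
// Everything here is illustrative, not the real Gemini API.
public class LiveDocsSkill {

    // Stand-in for a real documentation fetch (e.g. an HTTP GET to docs pages).
    static final Map<String, String> DOCS = Map.of(
        "example-sdk", "v1.2: use Client client = Client.builder().build();"
    );

    // Retrieve current docs for a library; empty string if none are found.
    public static String lookupDocs(String library) {
        return DOCS.getOrDefault(library, "");
    }

    // Augment the user task with retrieved docs so the model works from
    // current information instead of its training-time snapshot.
    public static String buildPrompt(String task, String library) {
        String docs = lookupDocs(library);
        if (docs.isEmpty()) {
            return task;
        }
        return "Current documentation for " + library + ":\n" + docs
             + "\n\nTask: " + task;
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt("Write client setup code", "example-sdk"));
    }
}
```

The point is architectural, not clever: the model itself is unchanged, only the information placed in front of it at generation time.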
ADK for Java 1.0: Enterprise AI Gets a Production-Grade Framework
Google released version 1.0.0 of its Agent Development Kit (ADK) for Java — a framework (a pre-built structure developers use to speed up application development) that brings AI agent capabilities to the language powering most of the world's banking, healthcare, and logistics software. While Python dominates AI research labs, enterprise systems overwhelmingly run on Java. This release targets that gap directly.
ADK for Java 1.0.0 ships with:
- Google Maps grounding — agents can query real-world locations and routes mid-task
- URL fetching — agents can retrieve and read live web pages during execution
- A2A protocol support — standardized Agent-to-Agent communication (enabling different AI tools to collaborate the way email lets different apps exchange messages)
- Third-party integrations — GitHub, Notion, Hugging Face, and other platforms built in out of the box
<!-- Add ADK for Java 1.0.0 via Maven -->
<dependency>
    <groupId>com.google.adk</groupId>
    <artifactId>google-adk</artifactId>
    <version>1.0.0</version>
</dependency>
If your team maintains Java services and has been watching AI agent tooling from the sidelines, this is the entry point. Check out our integration guides to evaluate whether an agent framework belongs in your stack.
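The capabilities listed above (URL fetching, Maps grounding, third-party integrations) all reduce to one mechanism: an agent dispatching named tools at runtime. Here is a pure-Java sketch of that tool-registry shape, using no ADK classes at all; the interface and names are assumptions for illustration, not the ADK's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch (not the ADK API): an agent exposing named tools,
// the general pattern behind capabilities like URL fetching or Maps grounding.
public class ToolRegistry {
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    // Register a tool under a stable name the agent can invoke.
    public void register(String name, Function<String, String> tool) {
        tools.put(name, tool);
    }

    // Dispatch a tool call by name; unknown tools return an error string.
    public String call(String name, String arg) {
        Function<String, String> tool = tools.get(name);
        return tool == null ? "error: unknown tool " + name : tool.apply(arg);
    }

    public static void main(String[] args) {
        ToolRegistry agent = new ToolRegistry();
        // Stand-in bodies; a real agent would perform an HTTP GET or Maps query.
        agent.register("fetch_url", url -> "fetched: " + url);
        agent.register("maps_route", q -> "route for: " + q);
        System.out.println(agent.call("fetch_url", "https://example.com"));
    }
}
```

A framework like the ADK supplies hardened versions of these tools plus the model loop around them, which is exactly the glue code Java teams would otherwise write by hand.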
Six Protocols That Could End the "AI Tower of Babel"
Right now, building AI automation agents that span multiple platforms requires custom glue code for every connection point. Every service, every model, every API (Application Programming Interface — the connection mechanism that lets different software systems exchange data) has its own bespoke integration requirements. Google introduced six standardized protocols this week to eliminate that overhead:
- MCP (Model Context Protocol) — connects AI models to external data sources like databases or live documents
- A2A (Agent-to-Agent) — enables AI agents from different vendors to coordinate on shared tasks
- UCP (Universal Control Protocol) — standardizes how agents receive and interpret commands
- AP2 (Agent Payments Protocol) — lets agents make secure, authorized payments on a user's behalf
- A2UI — allows AI agents to generate interactive dashboards directly for end users
- AG-UI — streams agent output and state to user-facing interfaces in real time
Think of it like the invention of USB. Before USB, every device had a different plug. After USB, one connector worked everywhere. These six protocols aim to do the same for AI — and the practical payoff is direct: teams adopting them stop writing custom integration code and start shipping faster.
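What "standardized" means in practice is a shared message shape every participant can produce and parse. The snippet below sketches a minimal agent-to-agent envelope in plain Java; the field names and JSON layout are hypothetical, chosen only to illustrate the idea, and are not taken from the A2A specification.

```java
// Illustrative only: a minimal shared message envelope, the kind of common
// shape protocols like A2A define so agents from different vendors can
// interoperate without bespoke glue code. Field names are hypothetical.
public class AgentMessage {
    final String fromAgent;
    final String toAgent;
    final String task;

    public AgentMessage(String fromAgent, String toAgent, String task) {
        this.fromAgent = fromAgent;
        this.toAgent = toAgent;
        this.task = task;
    }

    // Serialize to a simple JSON string any compliant agent could parse.
    public String toJson() {
        return String.format(
            "{\"from\":\"%s\",\"to\":\"%s\",\"task\":\"%s\"}",
            fromAgent, toAgent, task);
    }

    public static void main(String[] args) {
        System.out.println(new AgentMessage("planner", "coder", "refactor module").toJson());
    }
}
```

Once both sides agree on an envelope like this, adding a new agent to the system means implementing one format, not N point-to-point integrations.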
FunctionGemma: 270 Million Parameters, No Wi-Fi Required
FunctionGemma is a 270-million-parameter on-device model that operates entirely without an internet connection (parameter count is a measure of model size and capability — more parameters generally mean broader reasoning, at higher compute cost). Google designed it for environments where cloud connectivity is unavailable, expensive, or prohibited by regulation.
FunctionGemma handles three categories of tasks entirely on-device:
- Calendar management — scheduling and reminder logic processed on the device chip, no server needed
- Hardware control — adjusting settings or triggering connected hardware without a round-trip to the cloud
- Function calling (directing an AI to execute specific pre-programmed actions in a defined sequence) — fully local, no data leaves the device
The tradeoff is real: at 270M parameters, FunctionGemma handles structured, well-defined tasks reliably but lacks the complex multi-step reasoning of cloud-scale models. For regulated industries — healthcare, finance, defense — where data privacy laws often prohibit sending information to external servers, this is an acceptable tradeoff for compliance. A 270M-parameter offline model that works is worth far more than a trillion-parameter cloud model you legally cannot use.
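On-device function calling boils down to this: the model emits a small structured call, and a local dispatcher runs the matching handler without any network round-trip. The sketch below illustrates that dispatch step in plain Java; the call syntax and function names are assumptions for illustration, not FunctionGemma's actual output format.

```java
import java.util.Map;
import java.util.function.Function;

// Sketch of on-device function calling: the model emits a structured call
// such as "set_volume(30)", and a local dispatcher executes it with no
// network call. The call format and handlers here are illustrative.
public class LocalDispatcher {
    static final Map<String, Function<String, String>> FUNCTIONS = Map.of(
        "set_volume", arg -> "volume set to " + arg,
        "add_reminder", arg -> "reminder added: " + arg
    );

    // Parse "name(arg)" and run the matching local handler.
    public static String dispatch(String call) {
        int open = call.indexOf('(');
        int close = call.lastIndexOf(')');
        if (open < 0 || close < open) return "error: malformed call";
        String name = call.substring(0, open);
        String arg = call.substring(open + 1, close);
        Function<String, String> fn = FUNCTIONS.get(name);
        return fn == null ? "error: unknown function " + name : fn.apply(arg);
    }

    public static void main(String[] args) {
        System.out.println(dispatch("set_volume(30)"));
    }
}
```

Because both the parse and the handler run locally, no user data ever leaves the device, which is the property regulated industries are buying.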
Three New Gemini Code Assist Features Worth Testing This Week
All three shipped to VS Code and IntelliJ IDEs (Integrated Development Environments — the applications developers use to write, test, and debug code) this week. None require configuration changes if you already have Gemini Code Assist installed:
Finish Changes: AI That Completes Your Pattern
Gemini Code Assist now monitors your edits in real time. Start a refactor (restructuring code to improve its organization without changing what it does), write the first two changed lines, and it predicts and completes the same pattern across your entire file. It also converts pseudocode (informal plain-English descriptions of intended behavior) into working code and applies consistent refactoring patterns automatically. For large-scale cleanups, this alone can compress hours of mechanical work into minutes.
Outlines: Auto-Generated Chapter Titles for Your Codebase
Outlines generates interactive English summaries interleaved throughout your source code files — acting as auto-generated navigation labels for large codebases. Click any summary to jump directly to that section. Particularly useful for legacy systems where understanding what a function does currently requires reading through hundreds of lines of unfamiliar code.
Conductor: Code Review That Runs Before the Human Does
Conductor sits between AI code generation and human review. It validates AI output against the original implementation plan, enforces style guides (naming conventions and formatting rules your team follows), and flags security risks — automatically, before any human reviewer opens the pull request (a request to merge code changes into the main codebase). For teams where 30–50% of code now originates from AI tools, this layer catches the subtle errors and policy violations that developers miss when reviewing AI output at volume.
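The value of a gate like this is that cheap mechanical checks run on every diff before a human spends attention on it. Below is a deliberately simple pure-Java sketch of the idea, not Conductor's implementation; the two checks are placeholder examples of the style and security rules such a layer might enforce.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative pre-review gate (not Conductor's implementation): run cheap
// mechanical checks on AI-generated code before a human opens the PR.
public class PreReviewGate {

    // Return a list of findings; an empty list means the diff passes the gate.
    public static List<String> check(String code) {
        List<String> findings = new ArrayList<>();
        // Placeholder style-guide check: flag unresolved TODOs.
        if (code.contains("TODO")) {
            findings.add("style: unresolved TODO");
        }
        // Placeholder security check: flag an obvious hardcoded credential.
        if (code.contains("password =")) {
            findings.add("security: possible hardcoded credential");
        }
        return findings;
    }

    public static void main(String[] args) {
        System.out.println(check("String password = \"hunter2\"; // TODO remove"));
    }
}
```

A real gate would validate against the implementation plan and a full linter and scanner suite, but the placement is the point: automated findings arrive before, not during, human review.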
The Race for the AI Developer Infrastructure Layer
Ten-plus developer tools shipped in a single coordinated wave — unusual even by Google's aggressive release cadence. The pattern is clear: Google is no longer competing purely on model capability. It's racing to own the infrastructure layer — the frameworks, protocols, and tooling that developers build everything else on top of. Whoever controls that layer captures compounding platform value for years, regardless of which underlying model wins.
OpenAI won early developer mindshare through ChatGPT's simplicity and a frictionless API. Google's counter is systematic: standardize the protocols (6 new ones this week), target enterprise-grade Java developers, make on-device AI practical for regulated industries, and solve the core agent failure mode that the 28.2% → 96.6% jump exposes. The message from this week's releases is direct: AI agents that fail today are mostly fixable with better information architecture — not bigger models, not more training data.
You can start today: ADK for Java 1.0.0 is available via the Maven repository, Gemini CLI Plan Mode activates with gemini --mode plan if you have the CLI installed, and Finish Changes plus Outlines are live in VS Code now. If your team runs any TFLite-based Android features, start the LiteRT migration assessment immediately — the longer you wait, the harder the migration becomes. And if you're building any AI agent that depends on current, accurate information, the live-docs skill pattern Google just validated at 96.6% is worth implementing right away. Explore our setup resources to get started.
Sources
- Google Developers Blog: ADK for Java 1.0.0
- Google Developers Blog: Closing the Knowledge Gap with Agent Skills
- Google Developers Blog: Developer's Guide to AI Agent Protocols
- Google Developers Blog: On-Device Function Calling with FunctionGemma
- Google Developers Blog: Finish Changes and Outlines in Gemini Code Assist