Claude Opus 4.6: 1M Context Window Free & 2.5x Faster
Anthropic's Claude API just went GA with 1M-token context on Opus 4.6 — no beta required. 2.5x fast mode, managed agents, and 30+ updates are live now.
Anthropic just pushed over 30 updates to the Claude API — and two of them change what every developer building with Claude can do today: the 1-million-token context window (the ability to process roughly 750,000 words in a single conversation) is now generally available at standard pricing, and a new fast mode for Claude Opus 4.6 cuts response times by up to 2.5×. No beta headers. No special access flags. Just available — right now.
If you're building anything that reads long documents, analyzes large codebases, or runs multi-step automated workflows, these changes alone reshape what's practical to ship today.
The 1-Million-Token Context Window — No Longer Experimental
Until this release, Anthropic's 1-million-token context window required a special beta header in every API request, signaling it wasn't production-ready. That restriction is now gone.
A token is roughly ¾ of a word in English — so 1 million tokens translates to approximately 750,000 words, or about 10 full-length novels. For context: the entire U.S. Tax Code runs roughly 4 million words. Claude can now process a meaningful chunk of it in one shot.
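As a back-of-the-envelope check, that word-to-token arithmetic can be sketched in a few lines (the 0.75 words-per-token ratio is a rough English-text heuristic, not an exact tokenizer count):

```python
# Rough capacity check using the ~0.75 words-per-token heuristic.
WORDS_PER_TOKEN = 0.75
CONTEXT_TOKENS = 1_000_000

def fits_in_context(word_count: int) -> bool:
    """Estimate whether a document of word_count words fits in the 1M window."""
    estimated_tokens = word_count / WORDS_PER_TOKEN
    return estimated_tokens <= CONTEXT_TOKENS

print(fits_in_context(750_000))    # True: the approximate ceiling
print(fits_in_context(4_000_000))  # False: the full U.S. Tax Code would not fit
```

For anything near the limit, count tokens with the provider's tokenizer rather than this estimate before committing to a single-request design.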
What changed with GA (generally available — meaning officially stable for production use):
- No beta header required — any standard API call works immediately
- Media limit raised to 600 images or PDF pages per request, up from 100 — a 6× increase
- Long context pricing still applies past 200,000 input tokens, so factor that into cost projections
- Available on Claude Opus 4.6 and Sonnet 4.6
In practice: you can now hand Claude an entire legal contract with every exhibit, a full software repository, or a year of meeting transcripts — and it will reason across everything in one pass without any workarounds.
Fast Mode: 2.5× Speed for Opus 4.6 — the Full Picture
The new fast mode for Claude Opus 4.6 delivers up to 2.5× faster output token generation (word-by-word text streaming speed). For real-time AI automation applications — live coding assistants like Claude Code, customer-facing chatbots, interactive document tools — that's a perceptible difference in responsiveness.
The trade-offs worth knowing: fast mode runs at premium pricing (the exact multiplier hasn't been published), and it's currently in research preview, meaning you need to apply for access rather than flipping a switch today.
Opus 4.6 also introduces adaptive thinking — a mode where Claude dynamically decides how much of its reasoning process to show based on task complexity. Developers can now suppress visible reasoning with a new thinking.display: "omitted" setting, letting Claude use its reasoning internally without showing it to end users — with zero impact on billing or conversation continuity.
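As a sketch, a request that suppresses visible reasoning might be shaped like the payload below. The exact nesting of the thinking object is an assumption inferred from the documented setting name (`thinking.display: "omitted"`), so verify the shape against the official docs before shipping:

```python
# Hypothetical request payload: suppress visible reasoning on Opus 4.6.
# The shape of the "thinking" object is an assumption inferred from the
# documented setting name thinking.display = "omitted".
request = {
    "model": "claude-opus-4-6-20251101",
    "max_tokens": 4096,
    "thinking": {"display": "omitted"},  # reason internally, show nothing
    "messages": [
        {"role": "user", "content": "Summarize the attached contract."}
    ],
}

print(request["thinking"]["display"])  # omitted
```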
One hard constraint to know before upgrading: Claude Opus 4.6 does not support prefilling assistant messages — a technique some developers use to steer outputs at the start of a turn. If your workflow depends on that, keep an earlier model in that slot for now.
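Prefilling looks like this in practice: the messages array ends with an "assistant" turn whose text Claude is asked to continue. A quick check like the one below (a minimal sketch, not an official utility) can flag workflows that still rely on the pattern:

```python
# Prefilling: the last message is an "assistant" turn whose text Claude
# continues. Opus 4.6 rejects this pattern.
messages = [
    {"role": "user", "content": "List three risks in this contract as JSON."},
    {"role": "assistant", "content": '{"risks": ['},  # prefill steers the output
]

def uses_prefill(messages: list) -> bool:
    """Detect the prefill pattern before upgrading a workflow to Opus 4.6."""
    return bool(messages) and messages[-1]["role"] == "assistant"

print(uses_prefill(messages))  # True
```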
Which Model Fits Which Job
- Claude Opus 4.6 — "most intelligent model" for complex agentic tasks (multi-step workflows where AI takes actions); 1M context now GA; fast mode in research preview
- Claude Sonnet 4.5 — "best model for complex agents and coding"; recommended for most production deployments
- Claude Haiku 4.5 — "near-frontier performance" (capability close to top-tier models) at lowest cost and fastest latency
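One way to encode that guidance is a small routing helper. This is an illustrative sketch: the mapping reflects the article's recommendations, and the model ID strings are placeholders to check against current documentation:

```python
# Illustrative model router based on the guidance above.
# Model ID strings are placeholders; confirm current IDs in the docs.
MODEL_FOR_JOB = {
    "complex_agent": "claude-opus-4-6",        # deepest reasoning, agentic tasks
    "production_coding": "claude-sonnet-4-5",  # recommended production default
    "high_volume": "claude-haiku-4-5",         # lowest cost, fastest latency
}

def pick_model(job: str) -> str:
    """Return a model ID for a job class, defaulting to the Sonnet workhorse."""
    return MODEL_FOR_JOB.get(job, "claude-sonnet-4-5")

print(pick_model("high_volume"))  # claude-haiku-4-5
```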
Claude Managed Agents: Anthropic Takes Over the Hard Part
The biggest architectural change in this batch is Claude Managed Agents, now in public beta. Traditionally, developers had to build scaffolding — the surrounding code that lets Claude retry failed steps, manage its context window, and call tools in the right order. Managed Agents moves that entirely to Anthropic's infrastructure.
Plain terms: you define a goal; Claude executes it inside a secure sandbox (an isolated container where code runs without touching your production systems), with built-in tools and live progress streaming included.
```python
# 1M context window — no beta header needed anymore
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6-20251101",
    max_tokens=8096,
    messages=[{
        "role": "user",
        # No special header — 1M context is now GA
        "content": "Analyze this 500-page contract and flag every liability clause.",
    }],
)
print(response.content[0].text)
```
Agent Skills (also in beta) add pre-built capabilities for Microsoft Office files — PowerPoint, Excel, Word, and PDF — with Anthropic handling the parsing complexity. Custom Skills can be defined via API too. For teams automating document workflows, this eliminates a significant chunk of boilerplate.
Infinite Conversations, Automatic Caching, and a Memory Tool
Two longstanding pain points in AI development — conversations that hit a hard wall, and the cost of re-sending the same context repeatedly — are both addressed in this release:
- Compaction API (beta): Server-side context summarization (automatically condensing old conversation history before the window fills). Anthropic's claim: "effectively infinite conversations" — no artificial stopping points in long-running workflows.
- Memory tool (beta): Stores and retrieves specific information across separate conversations. Claude can now "know" your project's history in a future session, not just the current one.
- Automatic caching: A single `cache_control` field now handles prompt caching (storing repeated prompts so they load faster and cost less on repeat calls). Cache lifetimes: 5 minutes or 1 hour. No manual breakpoints required.
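A minimal sketch of what marking a reusable system prompt as cacheable might look like; the `ephemeral` type and `ttl` values follow Anthropic's published prompt-caching conventions, but verify field names against the current docs:

```python
# Prompt-caching sketch: mark a large, reused system prompt as cacheable.
# The ephemeral cache_control block with a "ttl" of "5m" or "1h" follows
# Anthropic's prompt-caching conventions; the prompt text is illustrative.
system = [
    {
        "type": "text",
        "text": "You are a contract analyst. [large shared context goes here]",
        "cache_control": {"type": "ephemeral", "ttl": "1h"},  # cache for 1 hour
    }
]

print(system[0]["cache_control"]["ttl"])  # 1h
```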
Additional improvements worth flagging:
- 300,000 maximum output tokens on Message Batches API (the tool for running thousands of AI requests in parallel) for Opus 4.6 and Sonnet 4.6
- Web fetch tool (beta): Claude can now retrieve full content from live web pages and PDFs mid-conversation
- Structured outputs (now generally available): Guaranteed schema-formatted responses (JSON with a guaranteed shape) on Sonnet 4.5, Opus 4.5, and Haiku 4.5
- C# and PHP SDKs now in beta — first official Anthropic libraries for .NET and PHP ecosystems
- Data residency controls: A new `inference_geo` parameter routes all processing to US-only servers for compliance-sensitive deployments, at 1.1× standard pricing
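As an illustration, a US-only request might carry the new parameter like this; the `inference_geo` name comes from the release notes, while its top-level placement and the "us" value are assumptions to confirm in the docs:

```python
# Data-residency sketch: route processing to US-only servers.
# The inference_geo parameter name comes from the release notes; its
# top-level placement and the "us" value are assumptions.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "inference_geo": "us",  # US-only processing at 1.1x standard pricing
    "messages": [{"role": "user", "content": "Classify this record."}],
}

print(request["inference_geo"])  # us
```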
Cloud platform coverage expanded significantly: Claude now runs on Microsoft Azure (Foundry) with full extended thinking and prompt caching support, and on Google Vertex AI with the 1M token context window. For teams building automated workflows across cloud platforms, the practical guides on AI automation cover integration patterns for each environment.
Mark Your Calendar: Claude Haiku 3 Retires in 9 Days
Anthropic's model deprecation cycle is accelerating. If any of these model names appear in your code — environment variables, config files, API calls — you are on a live countdown:
- 🔴 Claude Haiku 3 — retiring April 19, 2026 — that is 9 days from today
- 🔴 1M context beta endpoint (Sonnet 4.5 and 4) — retiring April 30, 2026
- ⚫ Claude Sonnet 3.7, Haiku 3.5, Opus 3 — already retired; calls return errors now
The migration path: Claude Haiku 4.5 is the direct replacement for Haiku 3, delivering "near-frontier performance" at comparable speed and cost. For the 1M context window, switching from the beta to the GA endpoint removes the dependency entirely.
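A quick audit script can surface lingering references before the deadline. This is a minimal sketch; the substrings are illustrative and should be matched against the exact model IDs your code actually uses:

```python
# Sketch: flag deprecated model references in config or env text.
# Substrings are illustrative; match the exact model IDs your code uses.
DEPRECATED_SUBSTRINGS = [
    "claude-3-haiku",     # Haiku 3: retiring soonest
    "claude-3-5-haiku",   # Haiku 3.5: already retired
    "claude-3-opus",      # Opus 3: already retired
    "claude-3-7-sonnet",  # Sonnet 3.7: already retired
]

def find_deprecated(config_text: str) -> list:
    """Return the deprecated model-ID substrings found in a config blob."""
    return [s for s in DEPRECATED_SUBSTRINGS if s in config_text]

config = 'MODEL = "claude-3-haiku-20240307"  # TODO: migrate'
print(find_deprecated(config))  # ['claude-3-haiku']
```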
Pull back and this release tells a clear story. In roughly six months, Anthropic shipped 3 major model releases, a complete managed agent infrastructure, cloud integrations across AWS, Azure, and Google Cloud, 2 new SDK languages, and enterprise compliance tooling. This is not a research organization tossing model checkpoints over a wall — it is a platform company racing to capture enterprise AI infrastructure budgets before OpenAI and Google lock them in. The 2.5× fast mode, the infinite-conversation Compaction API, and the managed agent harness all point at one goal: making Claude the reliable, debuggable, cost-predictable option for teams that need AI to work in production, not just demos.
Explore all the new features at platform.claude.com/docs. If Claude Haiku 3 is anywhere in your stack, use the Claude API setup and migration guide to move today — 9 days goes faster than it looks.