Claude Sonnet 4.6: 1M Token Context Window Now Standard
Claude Sonnet 4.6 now offers a 1M token context window at standard pricing. Sonnet 4 retires June 15, 2026 — migrate your API now or face errors.
Claude Sonnet 4.6 — Anthropic's biggest AI automation API upgrade in months — just landed, and the headline feature costs nothing extra. The 1M token context window (a measure of how much text Claude can process at once — roughly 750,000 words, longer than the entire Lord of the Rings trilogy) is now available to all API users at standard pricing. No special beta access. No extra charge. Just an update to your model string.
But buried in the same release is a firm deadline: Claude Sonnet 4 retires June 15, 2026. For the teams that built production systems on Sonnet 4 last year, the clock is running. This is the most consequential Claude update of 2026 — a major capability upgrade and a mandatory migration, delivered in the same announcement.
From Beta to Everyone: What 1 Million Tokens Actually Means
Until now, accessing a 1M token context window required enrolling in a beta program and adding a special request header to every single API call. That friction is gone. Any team on a paid Claude API plan can send prompts up to 1 million tokens — roughly 750,000 words of dense text — without any special setup or surcharge.
The previous standard limit was 200,000 tokens. The new limit is 5 times larger. In practical terms, this unlocks tasks that previously required complex workarounds:
- Read an entire software codebase in one API call instead of chopping it into pieces
- Analyze hundreds of legal contracts simultaneously without merge-and-split pipelines
- Process up to 600 images or PDF pages per request — up from the old cap of 100
- Handle full transcripts of multi-hour meetings without lossy chunking (splitting text and losing connections between sections)
The media limit jump — from 100 to 600 files per request — is especially significant for teams doing document intelligence (automated analysis of large document sets), compliance review, or multimodal (combined text + image) processing at scale. What previously required six separate API calls can now happen in one.
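To make the single-call pattern concrete, here is a minimal sketch of how a stack of PDFs could be packed into one request as document content blocks. It assumes the Anthropic Python SDK's base64 document-block shape; the helper names and the cap check are illustrative, not part of the API.

```python
import base64

def pdf_block(data: bytes) -> dict:
    # One "document" content block per file, base64-encoded,
    # in the shape the Messages API expects for PDF input.
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.b64encode(data).decode("ascii"),
        },
    }

def build_request_content(pdf_files: list[bytes], question: str) -> list[dict]:
    # Up to 600 files can now share a single request (previously 100).
    if len(pdf_files) > 600:
        raise ValueError("exceeds the 600-file-per-request cap")
    blocks = [pdf_block(data) for data in pdf_files]
    blocks.append({"type": "text", "text": question})
    return blocks
```

The resulting list would be passed as the content of a single user message, e.g. messages=[{"role": "user", "content": build_request_content(files, "Summarize the obligations in these contracts.")}].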
Claude Sonnet 4.6, Opus 4.6 & Haiku 4.5: Which Model Is Right for You
The update ships three distinct model tiers. Here's what each one does in plain language — no benchmarks, just use cases:
Claude Sonnet 4.6 — The New Default Workhorse
Sonnet 4.6 is where most teams should land after migration. It carries the full 1M token context window, excels at complex coding and multi-step agentic tasks (automated chains of actions where Claude plans, decides, and executes — not just answers questions), and is priced at standard rates, not premium. It's the direct, drop-in successor to the retiring Sonnet 4.
Claude Opus 4.6 — The Heavy Thinker, Now With Speed
Opus 4.6 targets the hardest problems: synthesizing hundreds of research papers, running long-horizon autonomous workflows (tasks that run for hours or days without human checkpoints), analyzing sprawling legal cases. The key addition is Fast Mode — up to 2.5× faster output token generation at premium pricing. Teams that previously ruled out Opus because of latency now have a viable interactive option. The Message Batches API (a feature for sending many requests efficiently in bulk) raises its max tokens cap to 300,000 for Opus 4.6 and Sonnet 4.6 — six times the previous Batches limit.
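As a rough sketch of how the raised Batches cap might be used, the snippet below builds the request list a Message Batches call expects: one entry per prompt, with a custom_id to tie each result back to its input. The model string and exact payload shape are assumptions based on the existing Batches API.

```python
def batch_requests(prompts: list[str]) -> list[dict]:
    # One batch entry per prompt; custom_id maps results back to inputs.
    return [
        {
            "custom_id": f"task-{i}",
            "params": {
                "model": "claude-opus-4-6",
                "max_tokens": 300000,  # the raised Batches cap described above
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]
```

The list would then be submitted in bulk, e.g. via client.messages.batches.create(requests=batch_requests(prompts)).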
Claude Haiku 4.5 — Speed Without Sacrifice
Haiku 4.5 is the right choice for real-time, high-volume, cost-sensitive applications: customer support bots, inline autocomplete, live search suggestions. It doesn't carry the 1M context window, but it remains the fastest and most affordable tier, and Anthropic describes it as delivering "near-frontier performance" — meaning it punches well above its price point for most routine tasks.
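The three tiers above translate naturally into a small routing table in code. This is a sketch of one possible setup, not an official pattern; the Haiku and Opus model strings are assumptions.

```python
MODEL_BY_WORKLOAD = {
    "realtime": "claude-haiku-4-5",   # support bots, autocomplete, live search
    "default": "claude-sonnet-4-6",   # coding, multi-step agentic tasks
    "deep": "claude-opus-4-6",        # research synthesis, long-horizon agents
}

def pick_model(workload: str) -> str:
    # Anything unclassified falls back to the Sonnet workhorse tier.
    return MODEL_BY_WORKLOAD.get(workload, MODEL_BY_WORKLOAD["default"])
```

Routing this way keeps the tier decision in one place, so repricing or a new model release changes a dictionary, not every call site.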
Eight New Claude API Features Shipping Alongside the Models
The model lineup isn't the whole story. Eight additional capabilities landed in this release, several of which could meaningfully change how teams build Claude-powered products:
- Agent Skills (beta): Pre-built integrations for PowerPoint, Excel, Word, and PDF. Claude can now directly read and write Office files without you needing to set up custom parsing libraries or data pipelines.
- Claude Managed Agents (public beta): Secure, pre-configured agentic environments — think sandboxed containers (isolated computing spaces where Claude can run code without touching your actual systems) with built-in safety controls. Dramatically cuts the engineering overhead of shipping safe AI agents.
- Advisor Tool (public beta): Pairs a fast "executor" model with a smarter "advisor" model that provides mid-generation guidance. Designed for reasoning tasks too complex for a single model pass — the advisor catches errors before they compound.
- Automatic Caching: A single cache_control field now activates prompt caching (reusing expensive computations when the same content appears across many requests). Automatic cost and latency savings for any system that sends the same system prompt on every API call.
- Compaction API (beta): Server-side context summarization — Claude compresses old conversation history rather than simply dropping it when the context limit approaches. Enables conversations that never hit a hard ceiling.
- Structured Outputs (now GA): Claude's JSON schema mode — which forces responses into a specific machine-readable format your application can parse reliably — is now generally available. No more beta headers required to use it.
- Data Residency Controls: US-only inference is now available via the
inference_geoparameter at a 1.1× pricing premium. No capability difference — just a compliance option for regulated industries. - Web Fetch Tool (beta): Claude can retrieve and analyze content from live web URLs and PDF links during a conversation. No more manual copy-pasting of web content into your prompts.
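As a sketch of the caching item above, the helper below builds keyword arguments for a Messages call with a large, stable system prompt marked via cache_control. The placement of the field follows the existing prompt-caching API; whether any extra marking is needed once caching is automatic is an assumption here.

```python
LONG_SYSTEM_PROMPT = "You are a contract-review assistant. " * 200  # stand-in for a large, stable prompt

def cached_request(user_prompt: str) -> dict:
    # kwargs for client.messages.create(**cached_request(...)).
    # The cache_control field marks the system prompt as reusable,
    # so repeated calls skip reprocessing the same prefix.
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 2048,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }
```

Only the system prompt is cached; the per-call user message still varies freely.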
The June 2026 Cliff: What Breaks If You Don't Act
Claude Sonnet 4 — technically claude-sonnet-4-20250514 — retires on June 15, 2026. That's 13 months after its May 2025 launch. Any application calling the old model ID after that date will receive API errors and stop working entirely. No graceful fallback, no extension — just failures.
The timeline is tighter for some features. The 1M token beta on Sonnet 4.5 and Sonnet 4 ends April 30, 2026 — meaning requests exceeding 200k tokens on those older models will begin erroring in under three weeks. And Claude Haiku 3 reaches its sunset on April 19, 2026 — four days from today.
For teams on Claude Sonnet 4, the minimum change is a model string update. Here's what that looks like in Python:
from anthropic import Anthropic

client = Anthropic()

# BEFORE — retires June 15, 2026 (API errors after that date)
# model="claude-sonnet-4-20250514"

# AFTER — Sonnet 4.6 with 1M context at standard pricing
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,  # per-call output cap; the 300k limit applies to the Batches API
    messages=[
        {"role": "user", "content": "Your prompt here"}
    ],
)
print(response.content[0].text)
If your integration uses beta request headers for structured outputs or context caching, review those carefully. Several beta features have graduated to GA — meaning the headers are now redundant and may produce unexpected behavior if left in place. The official release notes list every change in the Anthropic API documentation.
A 13-Month Model Lifespan: The New Normal for Enterprise AI Automation
Step back from the features for a moment. Sonnet 4 launched May 2025. It retires June 2026. That's a 13-month production lifespan for a model that enterprises embedded into customer-facing products, internal tools, and automated pipelines. For comparison, AWS typically provides 12–24 months of deprecation notice for services — with optional extended support paths. Anthropic's current cadence offers no extension.
The trend shows no sign of slowing: Haiku 3 lasted roughly 14 months; Claude 3.7 Sonnet lasted under a year in some configurations. If you're building on the Claude API at any meaningful scale, treating model version strings as long-lived stable dependencies is a reliability risk. The practical fix is straightforward: abstract your model string into a configuration variable or environment variable today. When the next deprecation notice arrives, your response becomes a one-line config change instead of an unplanned engineering sprint.
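A minimal version of that abstraction looks like this: resolve the model string from the environment once, with a safe default, so a future deprecation is a deployment-config change rather than a code change. The environment variable name is an arbitrary choice.

```python
import os

def resolve_model(default: str = "claude-sonnet-4-6") -> str:
    # Read the model string from config (here, an environment variable)
    # instead of hardcoding it at every call site.
    return os.environ.get("CLAUDE_MODEL", default)
```

Call sites then use client.messages.create(model=resolve_model(), ...), and the next migration is a one-line change to the deployment environment.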
You can monitor the full retirement schedule and upcoming feature sunset dates at the official Claude release notes page. If you're newer to AI automation and want to understand what "context windows" and "tokens" actually mean for your day-to-day work without the engineering jargon, start with the beginner guides at AI for Automation — they explain the concepts in plain English.