DeepSeek V4: 1T params, 30x cheaper than GPT-5.4
DeepSeek V4 previewed with 1 trillion parameters, 1M token context, and projected $0.10–0.30/M pricing — 33–100x cheaper than GPT-5.4. Apache 2.0 open-source.
China's DeepSeek soft-launched DeepSeek V4 on March 9, 2026 — a model with 1 trillion total parameters that is projected to cost $0.10–$0.30 per million input tokens. That's roughly 33 to 100 times cheaper than OpenAI's GPT-5.4 ($10/M) and 50 to 150 times cheaper than Claude Opus 4.6 ($15/M), while targeting comparable — and on some benchmarks superior — output quality. It's currently available as a "V4 Lite" preview on DeepSeek's website, with the full public release and official benchmark disclosures still pending.
The full model is expected to launch under the Apache 2.0 open-source license — meaning anyone can download the weights, fine-tune it, and run it locally or commercially for free. If the leaked benchmarks hold up at full release, this is the most disruptive cost-performance announcement in AI since DeepSeek V3 arrived in early 2025.
One Trillion Parameters — but Only 37 Billion Working at a Time
The headline "1 trillion parameters" sounds intimidating, but the way DeepSeek V4 works is more nuanced — and more efficient. It uses a Mixture of Experts (MoE) architecture (think of it like a panel of specialists: instead of consulting all 1,000 experts every time, you route each question to just the 37 most relevant ones). At any given moment, only 37 billion parameters activate per token (each word or word-fragment processed). The practical result: inference costs the same as running a 37B dense model, not a 1-trillion-parameter one.
For comparison, DeepSeek V3 had 671 billion total parameters. V4 is 49% larger in total scale but routes computation far more efficiently — running on similar hardware at similar speed.
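The routing idea behind MoE can be sketched in a few lines of NumPy. Everything below is illustrative (16 toy experts, top-2 routing, tiny dimensions), not DeepSeek's actual configuration:

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Toy Mixture-of-Experts layer: route one token to its top-k experts.

    x:       (dim,) hidden vector for a single token
    experts: list of (dim, dim) weight matrices, one per expert
    gate_w:  (num_experts, dim) router weights
    k:       number of experts activated per token
    """
    scores = gate_w @ x                    # router score for each expert
    top_k = np.argsort(scores)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()               # softmax over only the chosen experts
    # Only k expert matrices are multiplied; the rest stay idle. This is why
    # a huge MoE can run at the cost of a much smaller dense model.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
dim, num_experts = 8, 16
experts = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]
gate_w = rng.normal(size=(num_experts, dim))
out = moe_forward(rng.normal(size=dim), experts, gate_w, k=2)
print(out.shape)  # (8,)
```

Scaling this picture up, "1T total / 37B active" just means the expert pool is enormous while the per-token slice of it stays small.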
Context Window: 8x Larger Than V3, With 97% Recall
DeepSeek V4's context window (the maximum amount of text it can read and process in one go — like working memory) is 1 million tokens, versus DeepSeek V3's 128K. That's roughly 750,000 words — enough to hold an entire legal case file, a year of email threads, or a full software codebase in a single prompt.
What makes this more than a spec sheet number: internal benchmarks show 97% recall accuracy at full 1M context using a new technique called Engram Conditional Memory (a retrieval method that compresses older context without losing critical information — similar to how human long-term memory stores summaries rather than verbatim records). This addresses the well-known problem where most AI models become inaccurate or confused when given very long documents.
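DeepSeek hasn't published how Engram Conditional Memory actually works. The generic "compress older context, keep recent context verbatim" idea it describes looks roughly like this stand-in sketch (not the real method):

```python
# Generic compressive-memory sketch: keep the most recent chunks verbatim and
# replace older ones with short summaries so the total context stays bounded.
# This illustrates the idea only; DeepSeek's Engram technique is unpublished.
def compress_context(chunks, keep_recent=2, max_summary_chars=60):
    def summarize(text):                 # stand-in for a learned summarizer
        return text[:max_summary_chars].rstrip() + " ..."
    older, recent = chunks[:-keep_recent], chunks[-keep_recent:]
    return [summarize(c) for c in older] + list(recent)

history = [f"chapter {i}: " + "lorem ipsum " * 50 for i in range(10)]
window = compress_context(history, keep_recent=2)
print(len(window))  # 10 entries, but the old ones are now short summaries
```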
Long-context processing also costs dramatically less thanks to Dynamic Sparse Attention (DSA) — an architectural improvement that reduces processing complexity from quadratic to linear (meaning doubling the context length doesn't quadruple the compute cost, it roughly doubles it).
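The quadratic-versus-linear claim is easy to see with a little arithmetic. The per-token attention budget below is a made-up number purely for illustration:

```python
# Dense attention compares every token with every other token: cost ~ n^2.
# A linear sparse scheme caps the comparisons each token makes: cost ~ n * budget.
def dense_cost(n):
    return n * n

def sparse_cost(n, budget=2048):  # hypothetical fixed per-token budget
    return n * budget

n = 500_000
# Doubling the context quadruples dense cost but only doubles sparse cost.
print(dense_cost(2 * n) / dense_cost(n))    # 4.0
print(sparse_cost(2 * n) / sparse_cost(n))  # 2.0
```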
Natively Multimodal — Text, Image, Video, Audio From the Ground Up
Unlike most multimodal AI models (where image or audio capabilities are add-ons bolted onto a text-focused base), DeepSeek V4 was trained from the start on text, image, video, and audio together. This matters because multimodal understanding tends to be stronger and more consistent when modalities are learned jointly rather than patched in later.
What this looks like in practice: you can ask V4 to analyze a video clip, describe what's happening in an image, transcribe and summarize audio, or mix all four in a single prompt — without switching between specialized models.
Built Without Nvidia or AMD — A Geopolitical Hardware Statement
DeepSeek trained V4 entirely on Huawei Ascend 910B and Cambricon MLU chips — Chinese-made AI accelerators. This is significant beyond the technical: US export controls have blocked Nvidia's top-tier AI chips from reaching Chinese companies since 2022, and DeepSeek's ability to train a 1-trillion-parameter model without them demonstrates that the Chinese AI industry is no longer hardware-dependent on Western supply chains.
DeepSeek V4 vs the Competition — Cost Comparison
- DeepSeek V4 (projected): $0.10–$0.30/M input tokens
- GPT-5.4 Standard: $10/M input tokens — 33–100x more expensive
- Claude Opus 4.6: $15/M input tokens — 50–150x more expensive
- DeepSeek V3 (current): ~$0.27/M input — comparable pricing baseline
Note: Projected pricing based on V3 cost structure and analyst estimates. Official V4 pricing will be confirmed at full release.
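To make the gap concrete, here's the input-token bill for a single full 1M-token prompt at each listed rate. V4's figure uses the projected high end ($0.30/M), and all prices come from this article rather than official price sheets:

```python
# Input cost for one 1,000,000-token prompt at each listed per-million rate.
rates = {
    "DeepSeek V4 (projected high end)": 0.30,
    "GPT-5.4 Standard": 10.00,
    "Claude Opus 4.6": 15.00,
    "DeepSeek V3": 0.27,
}
prompt_tokens = 1_000_000
costs = {name: prompt_tokens / 1_000_000 * rate for name, rate in rates.items()}
for name, cost in costs.items():
    print(f"{name:<32} ${cost:,.2f}")
```

At these rates a maxed-out prompt costs cents on DeepSeek V4 versus $10–15 on the frontier closed models.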
What the Leaked Benchmarks Show (Unverified)
Official benchmark results haven't been published yet, but leaked internal test data suggests:
- HumanEval (coding): ~90% — vs GPT-4o at ~88.5%
- SWE-bench Verified (real-world bug fixing): 49.2% — vs GPT-4o at 38.8%
These numbers are unverified until DeepSeek publishes its official technical report. Treat them as directional, not definitive. Full benchmark disclosures are expected alongside the official public release.
Can You Run It Locally?
Minimum hardware requirements for local self-hosting (once weights are released):
- INT8 precision (high quality): 2× RTX 4090 GPUs (48GB VRAM total)
- INT4 precision (compressed, slightly lower quality): 1× RTX 5090 (32GB VRAM)
These figures are far more modest than what GPT-5.4-class dense models would demand locally, because the MoE architecture keeps the active parameter set at 37 billion. Note that the full trillion-parameter weight set still has to live somewhere; setups like these presumably offload inactive experts to system RAM or disk.
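The quoted GPU figures are consistent with back-of-envelope math on the 37B active parameters, ignoring KV-cache and activation overhead and assuming inactive experts live outside VRAM:

```python
# Rough VRAM needed just to hold 37B active parameters at each precision.
# Real serving adds KV-cache and activation memory on top, and the inactive
# experts (the rest of the 1T weights) must sit in system RAM or on disk.
active_params = 37_000_000_000

for precision, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gb = active_params * bytes_per_param / 1e9
    print(f"{precision}: ~{gb:.1f} GB")
```

INT8 comes to ~37 GB (fits in 2× RTX 4090 = 48 GB) and INT4 to ~18.5 GB (fits in one 32 GB RTX 5090), matching the list above.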
How to Access DeepSeek V4 Right Now
The V4 Lite preview (200B parameter variant) is live at chat.deepseek.com. The full 1T model and official API access are expected at the full public release. Once available, you can access it via the DeepSeek API using an OpenAI-compatible format:
```shell
pip install openai
```

```python
from openai import OpenAI

# DeepSeek uses an OpenAI-compatible API
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4",
    messages=[{"role": "user", "content": "Analyze this contract clause..."}],
)
print(response.choices[0].message.content)
```
Get an API key at platform.deepseek.com when the full release lands.