DeepSeek V4 Runs on Huawei Chips — Fraction of R1's Cost
DeepSeek V4 preview runs on Huawei Ascend NPUs — not NVIDIA — slashing AI inference costs below R1. Open-weights model rivals GPT-4o. Free to self-host.
On April 24, 2026, DeepSeek released V4 in preview — and the headline is not the benchmark numbers. It is where V4 runs. For the first time, DeepSeek explicitly supports Huawei's Ascend NPUs (Neural Processing Units — specialized chips built specifically for AI calculations, distinct from the general-purpose GPUs most AI models require). Running costs drop to a fraction of what DeepSeek's previous flagship, R1, required. For any AI automation or engineering team paying monthly cloud bills to run AI, that is the number that matters.
DeepSeek V4 AI Inference Cost: From R1 to a Fraction
DeepSeek's R1 model — released in January 2025 — briefly forced the entire AI industry to recalibrate. An open-weights model (one whose trained parameters are publicly downloadable and deployable by anyone) had matched GPT-4-class performance at a fraction of proprietary costs. V4 pushes that cost advantage further still.
According to reporting by Tobias Mann at The Register, DeepSeek V4 "cuts inference costs to a fraction of R1's" — inference (the process of actually running the AI to generate responses, separate from the far more expensive step of training it) being the primary operational expense for production AI deployments. Context matters: R1 was already running at roughly 5–10x lower cost per token (a "token" is approximately one word or word-fragment processed by the AI) compared to equivalent-tier models from OpenAI and Anthropic. If V4 meaningfully undercuts that again, enterprise teams processing millions of queries per day are looking at a material shift in their monthly AI infrastructure spend.
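To make that scale concrete, here is a back-of-envelope monthly cost model. The per-token prices and query volumes below are illustrative placeholders, not published V4 or competitor pricing; only the rough 5-10x ratio comes from the reporting above:

```python
# Back-of-envelope monthly inference cost model.
# All prices and volumes are illustrative assumptions, not published rates.

def monthly_cost(queries_per_day: float, tokens_per_query: float,
                 price_per_million_tokens: float) -> float:
    """Monthly spend in dollars at a given per-million-token price."""
    tokens_per_month = queries_per_day * tokens_per_query * 30
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Hypothetical tiers: a proprietary API vs. an R1-class open model
# priced roughly 5-10x lower, per the reporting above.
proprietary = monthly_cost(2_000_000, 800, 10.00)  # $10.00 / 1M tokens
r1_class    = monthly_cost(2_000_000, 800, 1.50)   # $1.50 / 1M tokens

print(f"Proprietary API: ${proprietary:,.0f}/month")  # $480,000/month
print(f"R1-class model:  ${r1_class:,.0f}/month")     # $72,000/month
```

At millions of queries per day, even a modest further cut below the R1-class price line moves five-figure sums per month, which is why the cost claim, not the benchmark claim, leads this story.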
V4 remains in preview as of April 25, 2026 — exact benchmark numbers and cost reduction percentages have not yet been published. But the direction is clear:
- Lower inference cost is the primary differentiator, not just raw capability
- Huawei Ascend NPU optimization is the mechanism enabling that cost reduction
- Open-weights distribution means teams can self-host and eliminate per-query fees entirely
The Huawei Angle: China's Answer to the NVIDIA Embargo
Since October 2022 — and tightened again in October 2023 — U.S. export controls have blocked NVIDIA from selling its most powerful AI accelerators (the A100, H100, and subsequent generations) to Chinese companies. The policy was designed to slow Chinese AI development by cutting off access to the compute hardware that trains and runs frontier models.
DeepSeek V4, optimized for Huawei's Ascend 910-series NPUs, is a direct technical response to that policy. The Huawei Ascend 910C — the current flagship in the series — delivers approximately 256–280 TFLOPS (tera floating-point operations per second, a standard measure of AI processing speed) of AI compute per chip. For comparison, NVIDIA's H100 delivers around 312 TFLOPS. The performance gap is real but narrowing. More importantly: the Ascend chip is manufactured domestically within China and is not subject to U.S. export restrictions.
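The per-chip gap described above works out as follows, using only the TFLOPS figures cited in this section:

```python
# Relative throughput of the Ascend 910C vs. NVIDIA's H100,
# computed from the per-chip TFLOPS figures cited above.
ASCEND_910C_TFLOPS_LOW, ASCEND_910C_TFLOPS_HIGH = 256, 280
H100_TFLOPS = 312

low = ASCEND_910C_TFLOPS_LOW / H100_TFLOPS
high = ASCEND_910C_TFLOPS_HIGH / H100_TFLOPS
print(f"Ascend 910C: roughly {low:.0%}-{high:.0%} of H100 throughput per chip")
```

Per chip, that is roughly 82-90% of H100 throughput, which is what "real but narrowing" means in practice.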
By explicitly supporting Ascend hardware, DeepSeek signals three things simultaneously:
- Validation of China's domestic chip strategy — frontier-class AI models now run on domestically manufactured hardware at competitive performance levels
- Structural decoupling from U.S. semiconductor supply chains — a geopolitical signal as much as an engineering decision
- A new deployment path for non-Western enterprises — Huawei Cloud regions operate across Asia, the Middle East, and Africa, where Ascend infrastructure is accessible regardless of chip export rules
This is the story that makes V4 more than another model release. It is evidence that China's domestic AI hardware strategy — built in direct response to U.S. export controls — is producing results capable enough to run competitive models at production scale. The "toaster framing" of DeepSeek (building efficient models that run on constrained hardware) has evolved from a curiosity into a strategic pattern.
Open-Weights AI: Self-Hosting DeepSeek V4 vs Proprietary Models
DeepSeek V4 is an open-weights model — meaning the trained model weights (the billions of numerical parameters that encode the AI's knowledge and reasoning patterns) are publicly available for download and deployment. This is fundamentally different from GPT-4o or Claude Sonnet, which are proprietary black boxes accessible only through paid API connections (API — Application Programming Interface — the remote software bridge you pay per query to access).
In practical terms, open weights enables four things proprietary models do not:
- No per-query fees: Run millions of queries on your own hardware with no incremental cost after initial setup
- Data stays local: Regulated industries — healthcare, finance, legal — can self-host without routing sensitive data through a third-party server
- Fine-tuning: Re-train the model on your own proprietary data to specialize it for specific tasks — your company's documentation, product catalog, or legal corpus
- No vendor lock-in: You control the deployment, the version, the hardware, and the upgrade timeline
The constraint: V4 requires serious hardware to run at production speeds — multi-GPU server configurations or cloud instances with Huawei Ascend NPU access. But for companies spending $20,000–$50,000 per month on OpenAI or Anthropic API calls, the economics of self-hosting start to look compelling fast. See our AI automation guides for a practical framework on evaluating open vs. proprietary AI for your stack.
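A rough break-even sketch illustrates those economics. The hardware and operating figures below are assumptions for illustration; only the $20,000-$50,000 API-spend band comes from the text above:

```python
# Rough self-hosting break-even estimate.
# Hardware cost and monthly operating cost are illustrative assumptions.

def breakeven_months(hardware_cost: float, monthly_ops: float,
                     monthly_api_spend: float) -> float:
    """Months until the self-hosting outlay is recouped vs. API fees."""
    monthly_savings = monthly_api_spend - monthly_ops
    if monthly_savings <= 0:
        return float("inf")  # self-hosting never pays off at these numbers
    return hardware_cost / monthly_savings

# Example: a $150k server cluster with $5k/month power and maintenance,
# replacing a $35k/month API bill (mid-range of the band cited above).
months = breakeven_months(150_000, 5_000, 35_000)
print(f"Break-even in {months:.1f} months")  # Break-even in 5.0 months
```

Under these assumed numbers the hardware pays for itself in well under a year; teams with smaller API bills or pricier hardware should rerun the arithmetic with their own figures.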
Getting Started with DeepSeek V4 Today
DeepSeek V4 is available via DeepSeek's official API in preview, with Hugging Face (the primary open-source AI model repository) hosting expected at full production release. Critically, DeepSeek's API deliberately mirrors the OpenAI SDK format — migrating an existing OpenAI integration requires changing only the base_url (plus swapping in a DeepSeek API key):
from openai import OpenAI

# Switching from OpenAI to DeepSeek V4: point the same client
# at DeepSeek's endpoint via base_url.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4",
    messages=[
        {"role": "user", "content": "Summarize the key AI hardware trends in 2026."}
    ],
)

print(response.choices[0].message.content)
The base_url swap is intentional product design on DeepSeek's part — reducing migration friction is a competitive strategy. Teams already integrated with OpenAI's Python library can run a V4 test in under 5 minutes.
The Uncomfortable Question for Western Enterprises
DeepSeek V4's claims are pointed: performance rivaling the best American LLMs (Large Language Models — AI systems trained on massive text datasets to understand and generate human language), at dramatically lower running costs, on hardware that operates entirely outside the U.S. semiconductor supply chain. For most of the world, that is an attractive proposition on its face.
For Western enterprises — particularly those with U.S. government contract exposure — the calculus is more complicated:
- Regulatory exposure: Several U.S. federal agencies already prohibit use of Chinese-origin AI tools. That list is likely to expand as adoption scales.
- Training data provenance: Open-weights models do not provide full transparency into their training data. Auditing for data contamination (unintentional training on proprietary or restricted datasets) is technically difficult and time-consuming.
- Supply chain risk: Production deployments on Huawei Ascend infrastructure create a dependency on Chinese hardware — a risk profile that security-conscious boards will reject regardless of cost savings.
For companies in Asia, the Middle East, Africa, or those without U.S. government contract exposure, those constraints largely do not apply. V4 represents a high-quality, dramatically cheaper, fully self-hostable alternative to paying OpenAI or Anthropic by the token — and it runs on hardware that does not require navigating U.S. export policy.
Full production release and independent benchmark results are expected within 4–6 weeks of the April 24 preview launch. If those results confirm the preview claims, DeepSeek V4 will stand as the most significant cost-reduction event in enterprise AI since R1 reset pricing expectations in early 2025. Watch the AI news feed for benchmark updates as they publish.