2026-03-29 · AI · Nvidia · Vera Rubin · GPU · data center · hardware

Nvidia just dropped Vera Rubin — 10× cheaper AI, 2,300W

Nvidia Vera Rubin delivers 10× cheaper AI inference than Blackwell — but each GPU burns 2,300W and demands 100% liquid cooling.


Nvidia’s CEO Jensen Huang just unveiled Vera Rubin — the company’s most ambitious AI platform ever. Named after the astronomer who discovered evidence of dark matter (invisible forces holding galaxies together), this 7-chip system promises to cut AI inference costs to 1/10th of current prices. The catch? Each GPU burns 2,300 watts and demands 100% liquid cooling. Every major AI lab and cloud provider signed on simultaneously.

[Image: Nvidia Vera Rubin platform revealed at presentation]

One Rack, 72 GPUs, 3.6 ExaFLOPS

The headline numbers are staggering. A single NVL72 rack (a server cabinet packed with 72 Rubin GPUs and 36 Vera CPUs) delivers 3.6 exaFLOPS of inference performance — that’s 3.6 quintillion calculations per second using NVFP4 precision (a number format optimized for AI workloads that trades tiny accuracy losses for massive speed gains).

Each individual Rubin GPU hits 50 PFLOPS (petaFLOPS, or 50 quadrillion operations per second) for inference — a 5× jump over Nvidia’s current Blackwell B200 chip. Training performance reaches 35 PFLOPS per GPU, a 3.5× improvement. For context, mixture-of-experts models (a popular AI architecture where only parts of the model activate for each query, saving compute) now need just 1/4 the GPUs compared to Blackwell.
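
As a quick sanity check, the per-GPU and per-rack figures are consistent with each other. Here is a minimal Python sketch using only the numbers quoted in this article (none independently verified):

```python
# Back-of-the-envelope check of the rack-level math quoted above.
# All figures come from this article; none are independently verified.
GPUS_PER_RACK = 72                 # Rubin GPUs in one NVL72 rack
INFERENCE_PFLOPS_PER_GPU = 50      # NVFP4 inference per Rubin GPU (claimed)
TRAINING_PFLOPS_PER_GPU = 35       # training per Rubin GPU (claimed)

# 72 GPUs x 50 PFLOPS = 3,600 PFLOPS = 3.6 exaFLOPS, matching the headline.
rack_exaflops = GPUS_PER_RACK * INFERENCE_PFLOPS_PER_GPU / 1_000
print(f"Rack inference: {rack_exaflops:.1f} exaFLOPS")

# Implied Blackwell B200 baseline from the quoted 5x and 3.5x speedups.
print(f"Implied B200 inference: {INFERENCE_PFLOPS_PER_GPU / 5:.0f} PFLOPS")
print(f"Implied B200 training:  {TRAINING_PFLOPS_PER_GPU / 3.5:.0f} PFLOPS")
```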

Memory got a generational upgrade too. Each GPU carries 288 GB of HBM4 memory (the latest high-bandwidth memory stacked directly on the chip) with 22 TB/s bandwidth — 2.75× faster than Blackwell’s 8 TB/s. The NVLink 6 interconnect (the high-speed data highway between GPUs) doubles to 3.6 TB/s per GPU, giving the full rack 260 TB/s of total scale-up bandwidth.
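
The bandwidth claims can be checked the same way; again, a sketch built only from the article's figures:

```python
# Verify the memory and interconnect ratios quoted above (article figures only).
HBM4_TBPS = 22            # per-GPU HBM4 bandwidth (claimed)
BLACKWELL_HBM_TBPS = 8    # Blackwell HBM bandwidth, per the article
NVLINK6_TBPS_PER_GPU = 3.6
GPUS_PER_RACK = 72

print(f"HBM speedup: {HBM4_TBPS / BLACKWELL_HBM_TBPS:.2f}x")  # 2.75x
rack_tbps = NVLINK6_TBPS_PER_GPU * GPUS_PER_RACK
print(f"Rack scale-up bandwidth: {rack_tbps:.0f} TB/s")       # 259.2, i.e. ~260 TB/s
```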

Seven Chips Designed as One System

What makes Vera Rubin different from a typical GPU upgrade is the vertical integration. Nvidia designed all seven chips — Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC (a smart network card that handles data traffic), BlueField-4 DPU (a dedicated processor for data processing and security), Spectrum-6 Ethernet Switch, and Rubin CPX (a companion GPU specialized for long-context inference) — to work as a single unified system rather than individual components bolted together.

The Vera CPU alone packs 88 custom “Olympus” cores (based on Arm architecture) with 176 threads, 162 MB of L3 cache, and 1.2 TB/s memory bandwidth. That’s a significant step up from the current Grace CPU’s 72 cores and 512 GB/s bandwidth: 22% more cores and more than double the memory throughput.

The BlueField-4 DPU doubles its network bandwidth to 800 Gb/s with 64 Arm cores — up from 16 cores in the previous generation. And the Spectrum-6 switch pushes 102.4 Tb/s (terabits per second) using silicon photonics (light-based data transfer instead of electrical signals), achieving 5× better power efficiency than traditional designs.
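
Put side by side, the generational deltas for the support chips are straightforward to quantify. A small sketch from the numbers above:

```python
# Generational comparisons for the Vera CPU and BlueField-4 DPU (article figures).
vera_cores, grace_cores = 88, 72
vera_bw_gbps, grace_bw_gbps = 1200, 512   # memory bandwidth, GB/s

print(f"CPU cores: {vera_cores / grace_cores:.2f}x")                 # ~1.22x
print(f"CPU memory bandwidth: {vera_bw_gbps / grace_bw_gbps:.2f}x")  # ~2.34x

bf4_gbps, bf3_gbps = 800, 400   # BlueField-4 doubles the prior generation's bandwidth
bf4_cores, bf3_cores = 64, 16
print(f"DPU bandwidth: {bf4_gbps / bf3_gbps:.0f}x, Arm cores: {bf4_cores / bf3_cores:.0f}x")
```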

One surprisingly practical innovation: the cableless compute tray design. Using specialized Amphenol PaladinHD2 connectors, rack assembly time drops from 2 hours to just 5 minutes. When each rack contains 1.3 million components, that bottleneck matters.

[Image: Nvidia Vera Rubin chip architecture and rack design]

The Power Problem Nobody Can Ignore

Here’s where the trade-offs get real. Each Rubin GPU has a TDP (thermal design power — the maximum heat it generates under full load) of 2,300 watts in Max-P mode, or 1,800W in the more efficient Max-Q configuration. For comparison, Blackwell chips draw 1,000–1,400W.

A full NVL72 rack demands 180–220 kilowatts, roughly the continuous draw of 120–150 average US homes. Blackwell racks need 120–140 kW. And it gets wilder: the upcoming NVL576 variant (scheduled for late 2027 with Rubin Ultra chips) will consume 600 kW per rack, enough to power roughly 400 homes on the same basis.
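
Here is the arithmetic behind those comparisons. The household equivalence assumes roughly 1.5 kW of average continuous draw per US home, which is our illustrative assumption, not an Nvidia figure:

```python
# Rack power envelopes and household equivalents.
# ASSUMPTION: ~1.5 kW average continuous draw per US home (ours, not Nvidia's).
AVG_HOME_KW = 1.5

GPU_TDP_KW = 2.3     # Rubin Max-P mode
GPUS_PER_RACK = 72
gpu_only_kw = GPU_TDP_KW * GPUS_PER_RACK
print(f"GPUs alone: {gpu_only_kw:.0f} kW")  # ~166 kW before CPUs, switches, cooling

for name, rack_kw in [("NVL72 low", 180), ("NVL72 high", 220), ("NVL576", 600)]:
    print(f"{name}: {rack_kw} kW, roughly {rack_kw / AVG_HOME_KW:.0f} homes")
```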

100% liquid cooling is now mandatory. There is no air-cooling option. Most existing data centers are designed for 40 kW racks at most, meaning facility-level redesigns are unavoidable. Early adopters will likely cluster near abundant, cheap power — think Scandinavia, Quebec, and the UAE — while older data centers face potential obsolescence.

Nvidia argues the math still works: despite the raw power increase, Vera Rubin delivers 10× more inference throughput per watt than Blackwell. If you’re running the same AI workload, you’d need far fewer racks. The DGX Max-Q configuration even promises 30% more AI compute within fixed power budgets.
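
That efficiency argument is easy to make concrete. Taking Nvidia's 10× perf-per-watt claim at face value and using mid-range rack power figures from above:

```python
# The perf-per-watt argument, taking Nvidia's claim at face value.
PERF_PER_WATT_GAIN = 10       # claimed vs. Blackwell
BLACKWELL_RACK_KW = 130       # mid-range of 120-140 kW
RUBIN_RACK_KW = 200           # mid-range of 180-220 kW

# Per-rack throughput scales with (perf/W gain) x (rack power ratio).
rack_gain = PERF_PER_WATT_GAIN * RUBIN_RACK_KW / BLACKWELL_RACK_KW
print(f"Per-rack throughput gain: {rack_gain:.1f}x")   # ~15x
print(f"Racks needed for the same workload: 1/{rack_gain:.1f} of a Blackwell fleet")
```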

Who’s Already Signed On — and Why It Matters

The partner list reads like a who’s-who of AI. On the AI lab side: OpenAI, Anthropic, Meta, and Mistral AI are all confirmed. Sam Altman (OpenAI’s CEO) said the platform will let them “run more powerful models and agents at massive scale.” Dario Amodei (Anthropic’s CEO) pointed to the platform’s compute capacity and system design as key to advancing safety.

Cloud availability spans AWS, Google Cloud, Microsoft Azure, and Oracle Cloud, plus CoreWeave, Crusoe, Lambda, Nebius, Nscale, and Together AI. System manufacturers include Cisco, Dell, HPE, Lenovo, Supermicro, and 9 additional ODMs (original design manufacturers — companies that build hardware to Nvidia’s specifications). Over 200 data center infrastructure partners are involved.

This unanimous industry support signals something critical: despite the staggering costs, there is currently no viable alternative at this scale. AMD’s MI450X offers partial competition but lacks Nvidia’s integrated networking and security ecosystem. Google’s TPUs achieve similar rack-scale integration but remain internally deployed, not commercially available to customers.

What You’ll Pay — and What to Watch For

Estimated rack pricing sits around $4 million, according to Baltimore Chronicle reporting. Nvidia claims a 30×–50× ROI, suggesting that at the top end of that range, $100 million in hardware could generate $5 billion in revenue — an aggressive projection that assumes sustained high utilization.
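
Writing out the multiplication shows how aggressive the projection is:

```python
# Nvidia's claimed ROI, taken at face value (article figures).
RACK_PRICE_USD = 4_000_000
hardware_spend = 100_000_000
racks = hardware_spend / RACK_PRICE_USD   # 25 racks for $100M
for roi in (30, 50):
    print(f"{racks:.0f} racks at {roi}x ROI -> ${hardware_spend * roi / 1e9:.0f}B revenue")
```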

Several caveats worth noting: all performance claims are Nvidia’s own — no independent MLPerf benchmarks (the industry’s standard performance test) exist yet. ML engineers have expressed skepticism about whether the claimed 50 PFLOPS with adaptive sparsity (a technique that skips unnecessary calculations) will match real-world workloads. The massive gains concentrate in FP4/FP8 precision (lower-accuracy number formats); FP16 performance only scales 1.6×.

HBM4 memory supply is also a concern. Reports suggest Micron is “effectively out of the picture” for Rubin’s memory production, and initial bandwidth may land closer to 20 TB/s than the claimed 22 TB/s.

Availability through partners begins in the second half of 2026. And Nvidia’s Feynman architecture is already planned for 2028 — signaling this is an ongoing infrastructure treadmill, not a one-time upgrade.

For most people who use AI tools daily, the practical takeaway is clear: services like ChatGPT, Claude, and other AI assistants should become faster and cheaper to run over the next 12–18 months as cloud providers deploy this hardware. If you’re evaluating AI infrastructure investments, the power and cooling requirements deserve as much attention as the performance numbers.

