2026-03-28 · Mistral · open source AI · AI models · developer tools · LLM

Mistral just dropped a free model that beats GPT-4o-mini

Mistral Small 4 is free (Apache 2.0) and beats GPT-4o-mini by 77% on grad-level science. 119B params, 256K context, reasoning+vision+code in one.


On March 16, 2026, Mistral AI released a model that does something none of its competitors have achieved at this price point: it combines high-level reasoning, image analysis, and code generation in a single package — and beats GPT-4o-mini on every major benchmark by a wide margin. It's called Mistral Small 4, it has 119 billion parameters, and it's completely free to use commercially.

The Apache 2.0 license means you can download it, modify it, build products with it, and sell those products — no usage fees, no API costs, no clauses that restrict commercial use. This puts it in the same legal category as Linux or the Android operating system.

At a glance:

  • 119B total parameters (a parameter is one adjustable weight inside an AI model — more parameters generally means more capability)
  • Only ~6.5B active parameters per token — runs at the speed and cost of a much smaller model
  • 256,000-token context window — roughly 190,000 words, or a full novel, in a single session
  • GPQA Diamond: 71.2% vs GPT-4o-mini's 40.2% — a 77% relative improvement
  • MMLU-Pro: 78.0% vs GPT-4o-mini's 64.8%
  • 40% faster end-to-end responses vs Mistral Small 3
  • 3× higher throughput (more simultaneous requests handled) vs Small 3
  • Release date: March 16, 2026
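The "77% relative improvement" headline follows directly from the two GPQA Diamond scores quoted above. A quick arithmetic check:

```python
# Relative improvement: (new - old) / old, using the GPQA Diamond scores above
mistral_small_4 = 71.2  # % correct
gpt_4o_mini = 40.2      # % correct

relative_gain = (mistral_small_4 - gpt_4o_mini) / gpt_4o_mini
print(f"{relative_gain:.0%}")  # 77%
```

Note that this is a relative gain over GPT-4o-mini's score, not a 77-point jump — the absolute gap is 31 percentage points.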

Why 119 Billion Parameters Doesn't Mean You Need a Supercomputer

Mistral Small 4 uses a Mixture of Experts (MoE) architecture — a design where the model is divided into 128 specialized "expert" sub-networks. For any given piece of text, only 4 of those 128 experts activate. The result: 119 billion total parameters, but only about 6.5 billion active at any one time during inference (the moment the AI is actually processing your request).

Think of it like a hospital with 128 specialists on staff. When a patient arrives, only the 4 most relevant specialists show up — you get expert-level care without mobilizing the entire hospital. The quality equivalent is a 119B model; the compute cost equivalent is a 6.5B model.
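The routing idea can be sketched in a few lines. This is an illustrative toy, not Mistral's actual router: a learned gating layer scores all 128 experts for each token, and only the 4 highest-scoring experts run.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128   # specialist sub-networks (per the article)
TOP_K = 4           # experts activated per token

def route_token(token_vec, gate_weights):
    """Pick the top-k experts for one token via a learned gating layer."""
    scores = gate_weights @ token_vec        # one score per expert
    top_k = np.argsort(scores)[-TOP_K:]      # indices of the 4 best experts
    # Softmax over the selected scores -> mixing weights for their outputs
    w = np.exp(scores[top_k] - scores[top_k].max())
    return top_k, w / w.sum()

d_model = 64  # toy hidden size for the sketch
gate = rng.normal(size=(NUM_EXPERTS, d_model))
experts, weights = route_token(rng.normal(size=d_model), gate)
print(experts)        # only 4 of 128 experts compute anything
print(weights.sum())  # mixing weights sum to 1.0
```

Because only 4 of 128 experts are touched per token, the active-parameter count (~6.5B) rather than the total (119B) is what drives inference cost.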

In practice, you can self-host Mistral Small 4 on 4× NVIDIA H100 80GB GPUs — hardware that's standard in cloud deployments but expensive for an individual. Via the Mistral API or NVIDIA NIM (a free prototyping platform), you can access it immediately with no hardware required.

Four Models Merged Into One — Reasoning, Vision, and Code

Mistral Small 4 is the first Mistral model to unify three previously separate products into a single deployment:

  • Magistral — Mistral's dedicated reasoning model for step-by-step logic, planning, and analysis
  • Pixtral — Mistral's vision model for understanding and describing images
  • Devstral — Mistral's coding-focused model for writing, reviewing, and debugging code

All three capabilities are now in one model. Instead of deciding which Mistral product to call and maintaining three different integrations, you call one endpoint and use the reasoning_effort parameter to control behavior: set it to "none" for instant answers, or "high" for deep step-by-step analysis. No separate reasoning model required — no extra cost, no extra configuration.
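Assuming an OpenAI-compatible chat endpoint (which is what a vLLM deployment exposes), the per-request switch might look like the sketch below. The model name and the reasoning_effort spelling come from the article; everything else is a hypothetical illustration, not a documented client API:

```python
import json

def build_request(prompt: str, effort: str = "none") -> dict:
    """Build a chat-completion payload; `reasoning_effort` toggles deep reasoning."""
    return {
        "model": "mistralai/Mistral-Small-4-119B-2603",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "none" = instant answer, "high" = step-by-step
    }

quick = build_request("What is 2 + 2?")                      # fast path
deep = build_request("Plan a 3-service DB migration", "high")  # reasoning path
print(json.dumps(deep, indent=2))
```

The same endpoint and payload shape serve all three capabilities; only the flag changes between a quick answer and a deliberate reasoning pass.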

Notably, responses are 3.5–4× shorter than Qwen 3.5-122B (a comparable open-source model from Alibaba) while maintaining similar quality. Shorter responses mean lower token costs when using a pay-per-use API, and faster iteration in a development loop.
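The cost impact of shorter responses is easy to quantify. With an illustrative output-token price (only the 3.5–4× length ratio comes from the article; the price is made up for the example):

```python
# Hypothetical price to illustrate the savings from 3.5x shorter responses.
price_per_1m_output_tokens = 0.60  # $ (illustrative, not a real quote)
qwen_response_tokens = 2_000
mistral_response_tokens = qwen_response_tokens / 3.5  # same answer, ~571 tokens

cost = lambda tokens: tokens / 1_000_000 * price_per_1m_output_tokens
saving = 1 - cost(mistral_response_tokens) / cost(qwen_response_tokens)
print(f"{saving:.0%} lower output-token cost per response")
```

At a 3.5× length ratio the output-token bill drops by roughly 71% per response, regardless of the actual per-token price.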

[Chart: Mistral Small 4 benchmark comparison — GPQA Diamond, MMLU-Pro, and coding scores vs competitors]

How to Run Mistral Small 4 Right Now

The fastest way to test it requires zero installation — open Mistral AI Studio in a browser, or go to NVIDIA NIM for a free prototyping tier with no setup at all.

For developers who want to self-host using vLLM (a high-performance framework for serving AI models in production environments):

# Pull the optimized Docker image
docker pull mistralllm/vllm-ms4:latest

# Serve the model (requires 4x H100 80GB GPUs)
vllm serve mistralai/Mistral-Small-4-119B-2603 \
  --max-model-len 262144 \
  --tensor-parallel-size 4 \
  --attention-backend FLASH_ATTN_MLA \
  --reasoning-parser mistral \
  --enable-auto-tool-choice

For Python developers using Hugging Face Transformers (the most popular Python library for running AI models locally):

uv pip install git+https://github.com/huggingface/transformers.git

The model is at huggingface.co/mistralai/Mistral-Small-4-119B-2603. Note: llama.cpp support (for running on consumer hardware like a gaming PC) is still in development as of this writing.

Who Should Use This — and When a Bigger Model Still Wins

Mistral Small 4 is a strong fit if you are:

  • An enterprise team running AI in-house who needs reasoning + vision + coding in one deployment without paying ongoing per-token API costs
  • A developer building a product on top of an open-source model who needs full commercial freedom to ship, modify, and distribute
  • A startup wanting to avoid long-term API dependency by self-hosting from day one
  • A researcher who needs a capable baseline that can be fine-tuned (customized on your own data) without license restrictions

For tasks at the absolute frontier — very long multi-step reasoning chains, complex autonomous agent workflows, nuanced creative writing — larger proprietary models like Claude Opus or GPT-4o still hold an edge. But for the majority of real production use cases, Mistral Small 4's performance-to-cost ratio is now the benchmark to beat in the open-source space. The official announcement is at mistral.ai/news/mistral-small-4.

