2026-03-20MozillaLlamafilelocal AIprivacyAI models

Mozilla just rebuilt Llamafile — now one file runs text, image, and speech AI

Llamafile 0.10 bundles text chat, image generation, and speech-to-text into a single executable. Download one file, double-click, run AI locally.

Mozilla just shipped the biggest update to Llamafile since the project launched — and it turns a single downloaded file into a private AI powerhouse that handles text conversations, image generation, and speech-to-text, all running on your own computer with zero cloud dependency.

Llamafile (23,800 GitHub stars) is Mozilla's answer to a simple question: why is running AI locally still so complicated? The tool packages an entire AI model into a single executable file that works across macOS, Windows, Linux, FreeBSD, and more — no Python, no Docker, no terminal wizardry required.

Three AI Engines in One File

Version 0.10 is a ground-up rebuild with a new build system that now bundles three separate AI capabilities:

Text chat (llama.cpp) — Converse with AI models like Qwen, DeepSeek, Gemma, and others. The updated llama.cpp engine supports the latest models released in 2026.

Speech-to-text (Whisper.cpp) — Transcribe audio and translate between languages, all offline. Now integrated as a built-in module rather than a separate download.

Image generation (Stable Diffusion) — Generate images from text prompts, running entirely on your machine. Also integrated as a built-in module.

Before this update, you needed separate tools for each of these. Now it's one download.

GPU Acceleration That Actually Works

Two major hardware improvements make version 0.10 dramatically more practical:

macOS Metal support now works automatically. If you have a Mac with Apple Silicon (M1, M2, M3, M4), Llamafile will use your GPU without any configuration. Previous versions required manual setup.

NVIDIA CUDA support is back. After being unavailable in recent versions, GPU acceleration for NVIDIA graphics cards has been restored. This means models that took minutes on CPU alone can now run in seconds.

Mozilla.ai loves Llamafile — the project that packages AI models into single files

Three Ways to Talk to AI

Version 0.10 introduces a hybrid chat/server mode — a colorful terminal interface (TUI) that lets you chat with the AI directly in your terminal while simultaneously running a web server for browser access. Plus a new CLI mode for one-shot questions:

# Download a small model (under 1 GB)
curl -LO https://huggingface.co/mozilla-ai/llamafile_0.10.0/resolve/main/Qwen3.5-0.8B-Q8_0.llamafile

# Make it executable
chmod +x Qwen3.5-0.8B-Q8_0.llamafile

# Run it — that's it
./Qwen3.5-0.8B-Q8_0.llamafile

Three commands. No installation. No accounts. No data leaving your machine.

Llamafile terminal chat interface with colorful syntax highlighting

Who Is This For?

Privacy-conscious professionals: Lawyers, doctors, accountants, and anyone handling sensitive data can now use AI without sending a single word to the cloud. Every conversation stays on your hard drive.

People in areas with poor internet: Llamafile works completely offline. Download the file once, and you have AI access forever — no subscription, no internet connection needed.

Curious beginners: If you've wanted to try running AI locally but were intimidated by Python environments and terminal commands, this is the simplest path. Download, double-click (on many systems), and start chatting.

Developers building local-first apps: The built-in server mode means you can point your applications at localhost:8080 and get an API compatible with existing AI tools — no external services required.

The Trade-offs

Mozilla is transparent that version 0.10 is a major architectural shift. The new build system aligns better with the latest llama.cpp releases (meaning faster support for new models), but some features from the 0.9.x series may not be available yet. If you rely on a specific feature, Mozilla recommends checking the release notes or sticking with 0.9.x until your workflow is fully supported.

The project runs under the Apache 2.0 license — free for personal and commercial use. Five contributors shipped this release, including three first-time contributors to the project.

Supported platforms: macOS, Windows, Linux, FreeBSD, OpenBSD, NetBSD

CPU architectures: AMD64 (Intel/AMD) and ARM64 (Apple Silicon, Raspberry Pi)

GPU acceleration: NVIDIA CUDA, Apple Metal (automatic)

License: Apache 2.0 (free)

The full project is on GitHub, and documentation lives at mozilla-ai.github.io/llamafile.

Related Content — Get Started with Easy Claude Code | Free Learning Guides | More AI News

Sources

Stay updated on AI news

Simple explanations of the latest AI developments