2026-05-19ollamallama-cpplocal-llmcodex-appgithub-copilot-alternativelocal-aiai-codingapple-silicon

Ollama v0.30: Native llama.cpp + Free Codex App

Ollama v0.30 replaces GGML with native llama.cpp and launches Codex App — a free, offline GitHub Copilot alternative that runs entirely on your own hardware.

Ollama v0.30 is the most significant architectural update in the project's history — replacing GGML with direct llama.cpp integration and shipping Codex App, a free offline code reviewer that eliminates GitHub Copilot and Cursor subscriptions entirely. Version 0.30 — currently in active pre-release — drops the GGML wrapper (a compatibility bridge that translated AI model files for older systems) in favor of direct llama.cpp integration (the open-source engine that powers nearly every local AI tool in 2026). That is not a minor version bump. It redefines what Ollama actually is: not just a model runner, but the primary distribution layer for llama.cpp on desktop hardware.

Six release candidates shipped in four days — rc15 through rc20 — as the team stress-tests the new architecture. That pace, combined with the 607 Hacker News upvotes the Python/JavaScript client library announcement generated, signals a project under serious community pressure to ship and stabilize.

Why Dropping GGML Is the Biggest Change Ollama Has Ever Made

GGML was the original performance library (a specialized math toolkit for running AI model calculations efficiently on consumer hardware) that llama.cpp was built on top of. Over time, llama.cpp outgrew it — the community standardized on GGUF (the file format used by virtually every modern model download on Hugging Face and other repositories), and GGML-based systems had to translate between formats in real time, adding overhead every time you loaded a model.

Ollama's previous architecture wrapped llama.cpp inside a GGML compatibility layer. Every model you ran went through that translation step. In v0.30, the wrapper is gone. What changes in practice:

GGUF files load natively — no conversion or compatibility workaround required at startup
llama.cpp updates arrive faster — Ollama no longer has to re-wrap every new llama.cpp feature before shipping it
Apple Silicon gets dedicated acceleration — MLX (Apple's open-source machine learning framework for M-series chips) now handles inference directly with a reworked sampler that improves text generation quality
Direct competition with LM Studio — LM Studio (a GUI-based local AI model manager, free download) was the primary alternative for users who needed native GGUF compatibility; Ollama v0.30 closes that gap while staying 100% command-line and open-source

Two models are confirmed unsupported in the current pre-release: laguna-xs.2 and llama3.2-vision. The release notes explicitly ask for reports on "performance improvements or degradation" — the team is still benchmarking before the stable release. If you depend on either model, hold at v0.24 for now.

Codex App: A Free, Offline Alternative to GitHub Copilot

Ollama v0.30 Codex App — free offline GitHub Copilot alternative with inline AI code editor, browser preview, and annotation panel

Shipped in v0.24 and improving with each release, Codex App is the most consumer-facing addition to Ollama in its history. GitHub Copilot costs $10/month. Cursor (an AI-native code editor with deep codebase awareness) costs $20/month. Codex App costs nothing, runs entirely on your own machine, and sends zero code to external servers.

Three features define what it can actually do:

Ollama Codex App: Built-In Browser with Inline Annotations

Codex App loads your local development server — for example, a React or Flask app running at localhost:3000 — directly inside the interface. You can click on any element in the running app and leave an annotation (a written note or instruction) that the AI model reads as context when generating or reviewing code. This eliminates the copy-paste loop between "what I see in the browser" and "what I need the AI to fix."

Ollama Codex App review mode — structured AI code comments and iteration panel for local llama.cpp-powered offline code review

Codex App Review Mode: AI-Powered Code Analysis

Review mode reads your code and generates structured comments — similar to a pull request (a process where developers submit code changes for team feedback before merging) review from a colleague. You can iterate directly inside the interface without switching between your editor, a browser tab, and a separate AI chat window. No code leaves your machine at any point in that workflow.

Codex App Restore Function

If a Codex App update breaks your configuration, one command reverts everything to a working state:

ollama launch codex-app --restore

Five models are officially recommended for use with Codex App: kimi-k2.6, glm-5.1, nemotron-3-super, gemma4:31b, and qwen3.6. On a modern laptop with 16GB of RAM or more, any of these runs a full code-review workflow with no cloud subscription. You can explore local AI automation guides to match the right model to your hardware before installing.

How to Install Ollama v0.30 Pre-Release Right Now

The pre-release installs with a single command that replaces your existing Ollama version. Skip this if you use laguna-xs.2 or llama3.2-vision — both are unsupported in v0.30 pre-release and will not work until the stable release fixes compatibility.

Mac or Linux:

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.30.0-rc20 sh

Windows (PowerShell):

$env:OLLAMA_VERSION="0.30.0-rc20"; irm https://ollama.com/install.ps1 | iex

The Ollama team is actively collecting feedback — specifically on performance regressions and compatibility errors that did not exist in v0.24. If you test it, report issues directly on the Ollama releases page.

What Ollama v0.30 Means for the Local AI Automation Market in 2026

The 607-upvote Hacker News response to Ollama's Python and JavaScript client libraries (packages that let developers control Ollama from within their own applications, not just via the command line) tells you where the project is heading. Teams are building AI automation products on top of Ollama — customer support bots, internal search tools, automated code reviewers — and they need a stable, programmable foundation to build on.

The llama.cpp rewrite delivers exactly that. By eliminating the GGML translation layer, Ollama becomes the kind of low-level infrastructure that other software can depend on reliably. Meanwhile, the Codex App opens a direct attack on the $400M+ AI coding subscription market. Cursor and Copilot charge monthly for features that Ollama now offers for free — running on hardware developers already own.

The stable v0.30 release is the one to actually deploy at scale. Six release candidates in four days is either disciplined iteration or a sign that the architectural rewrite revealed unexpected complexity. The smart move right now: install the pre-release on a non-production machine, try Codex App with one of the five recommended models, and file any issues you hit. By the time stable ships, you will already know whether it fits your workflow — and whether it is worth canceling whatever subscription you are currently running.

Related Content — Get Started with Local AI | Guides | More News

Sources

Stay updated on AI news

Simple explanations of the latest AI developments