2026-04-21 · ollama · hermes-agent · local-ai · local-llm · apple-silicon · gemma-4 · ai-automation · github-copilot

Ollama v0.21.0: Hermes Agent — Free Local AI for Mac

Ollama v0.21.0: Hermes Agent is a free local AI for Mac that learns your workflows. Gemma 4 on Apple Silicon + GitHub Copilot CLI in one command.


Ollama just shipped version 0.21.0, and the headline feature isn't a model update — it's Hermes Agent, a local AI automation tool that watches your work, learns your patterns, and builds its own reusable skills automatically. It runs entirely on your Mac (Apple M-series chips), costs nothing, and requires no subscription. That last part matters: GitHub Copilot starts at $10/month for individuals. Hermes runs locally for $0.

The update also brings native Gemma 4 support through Apple's MLX framework (Apple's hardware acceleration layer for M-chip devices), a one-command GitHub Copilot CLI launcher, and a sweep of macOS compiler bug fixes that had made v0.20.x builds unreliable.

Hermes Agent: Ollama's Self-Learning AI Automation Tool

Standard AI assistants answer one question and forget the conversation the moment you close the window. Hermes Agent works differently. It observes how you use it — what types of requests you make, what output formats you prefer, what tasks you repeat — and creates reusable skills from those patterns.

[Screenshot: Hermes Agent self-learning AI automation interface in Ollama v0.21.0 for Mac]

The official description: "Hermes learns with you, automatically creating skills to better serve your workflows. Great for research and engineering tasks."

A concrete example: if you repeatedly ask Hermes to summarize GitHub pull requests in a specific format — bullet points, grouped by risk level, with a one-line verdict at the top — it learns that template and turns it into a named skill you can trigger by reference, not by re-typing the whole prompt. That's the difference between a search engine and an assistant who knows your preferences.

Hermes is aimed specifically at research and engineering workflows: literature reviews, code summarization, document analysis, automated report generation. The scope of learnable skill categories isn't fully documented yet, but this represents a clear shift in what Ollama is trying to be — moving from "local model runner" toward "AI co-worker that adapts to you."

Starting Hermes takes one command:

ollama launch hermes

Gemma 4 Runs Natively on Apple Silicon — No GPU Rental Required

Google's Gemma 4 models are now fully supported through Ollama's MLX backend (MLX is Apple's open-source machine learning framework, optimized for the unified memory architecture of M1/M2/M3/M4 chips). The practical result: Gemma 4 inference runs on your Mac's Neural Engine and GPU directly — no NVIDIA hardware, no cloud API key, no monthly compute bill.

Three specific additions make this work cleanly:

  • Full Gemma 4 MLX support — Complete model inference through the MLX framework on Apple Silicon
  • Text-only MLX runtime variant — A lighter build that skips vision-processing overhead for text-only tasks, loads faster, and uses less memory
  • Gemma 4 nothink renderer restored — Implements e2b-style (end-to-beginning) prompt templates that improve generation quality for instruction-following tasks

The MLX backend also gains mixed-precision quantization, a technique that stores model weights in compressed number formats, cutting memory use by 50–75% with minimal accuracy loss. It also adds six new operation wrappers: Conv2d, Pad, activations, trigonometric functions, masked SDPA (scaled dot-product attention, the core mathematical operation that lets transformer models weigh which words matter most), and RoPE-with-freqs (rotary position embedding, which encodes word-order information so the model understands sentence structure).
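Masked SDPA is easier to grasp with a toy example. The sketch below is a plain NumPy illustration of the underlying math, not Ollama's MLX wrapper; the function name and array shapes are our own assumptions for the demo:

```python
import numpy as np

def masked_sdpa(q, k, v, mask):
    """Toy scaled dot-product attention with an additive mask.
    q, k, v: (seq, d) arrays; mask: (seq, seq), 0 = attend, -inf = blocked."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)      # how relevant each key is to each query
    scores = scores + mask             # blocked positions drop to -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ v                 # weighted mix of value vectors

# Causal mask: each token attends only to itself and earlier tokens.
rng = np.random.default_rng(0)
seq, d = 4, 8
q, k, v = rng.standard_normal((3, seq, d))
mask = np.triu(np.full((seq, seq), -np.inf), 1)
out = masked_sdpa(q, k, v, mask)
print(out.shape)  # prints (4, 8)
```

The causal mask is what keeps a language model from peeking at future tokens while generating; the first token can only attend to itself, so its output is exactly its own value vector.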

MLX capability detection has also been improved, meaning Ollama now automatically selects the right hardware acceleration path on newer Apple chips without manual configuration.

For Linux users on AMD hardware: ROCm (AMD's GPU computing platform — the AMD equivalent of NVIDIA's CUDA) is updated to version 7.2.1, with better compatibility for recent AMD GPUs.

GitHub Copilot CLI Integration: One Command Away in Ollama

One of the more surprising additions in v0.21.0 is the ability to launch GitHub Copilot CLI directly through Ollama:

# Launch GitHub Copilot CLI via Ollama
ollama launch github-copilot

# Launch Hermes Agent (self-learning local agent)
ollama launch hermes

# Skip interactive prompts in scripts
ollama launch openclaw --yes

This bridges local model inference with a widely used cloud developer tool. Previously, setting up GitHub Copilot CLI required its own authentication, configuration files, and environment setup — entirely outside the Ollama ecosystem. With this integration, developers can switch between fully local models and Copilot from a single command interface, in the same workflow, without context-switching between tools.

GitHub Copilot pricing: $10/month for individual developers, $19/month per user for businesses. Teams using Ollama for most tasks — with Copilot as a fallback — could significantly reduce per-seat costs while maintaining access to cloud AI when needed.
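To put rough numbers on that, here is a back-of-the-envelope sketch. The team size and seat split are made-up assumptions; only the $19/user/month Business price comes from the figures above:

```python
# Hypothetical numbers: a 20-person team where only 5 members keep a Copilot
# Business seat ($19/user/month) and the rest use local Ollama models at $0.
seats = 20
kept_copilot_seats = 5
all_copilot = 19 * 12 * seats            # $/year if every seat gets Copilot Business
hybrid = 19 * 12 * kept_copilot_seats    # $/year in the hybrid setup
print(all_copilot, hybrid, all_copilot - hybrid)  # prints 4560 1140 3420
```

Under those assumptions the hybrid setup cuts annual spend by three quarters; the real saving depends entirely on how many seats genuinely need Copilot.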

Two launch improvements make the command more reliable in production use:

  • The --yes flag in ollama launch openclaw --yes now correctly skips all interactive prompts (previously ignored silently — a bug that broke automation scripts)
  • The launcher skips unnecessary config rewrites when settings haven't changed, reducing startup latency

macOS Stability: The Fixes Behind the Scenes

Version 0.20.x accumulated stability debt, particularly on macOS. The v0.21.0 changelog addresses these systematically:

  • 3+ Metal compiler bugs resolved — Metal is Apple's low-level graphics and compute API; compiler errors here caused incorrect model outputs or outright crashes on some Mac configurations
  • macOS cross-compile builds stabilized — Building Ollama for a different Mac architecture (e.g., building arm64 on an x86 Mac) no longer triggers unnecessary code generation passes that added build time and introduced errors
  • CGO build warnings suppressed — CGO is Go's system for calling C code; deprecated API warnings were cluttering build output and masking real errors in CI pipelines
  • OpenCode configuration moved inline — Previously required separate configuration files; now lives in a single inline config, simplifying project setup
  • Improved MLX hardware detection — Ollama now correctly identifies the available acceleration capabilities on newer Apple chips, picking the fastest available compute path automatically

The volume of macOS-specific fixes signals something important: Ollama's development team is hardening the project for production use, not just hobbyist experimentation. Fixing three Metal compiler bugs in a single point release suggests active testing on real hardware under real workloads.

Should You Upgrade to Ollama v0.21.0 Now?

If you're already running Ollama, updating is straightforward — download from ollama.com or pull via your package manager. New to Ollama? Our local AI automation setup guide gets you running in minutes. Existing models and configurations carry over.

The features most worth testing immediately, ranked by practical impact:

  1. Hermes Agent — Run ollama launch hermes and use it for 2–3 days of your normal research or coding work. The skill-learning behavior becomes visible over time, not in a single session.
  2. Gemma 4 on Apple Silicon — If you're on an M-series Mac and haven't run Gemma 4 before, the MLX backend makes this the fastest path to local multimodal inference without cloud costs.
  3. GitHub Copilot integration — Useful if you're already a Copilot subscriber and want to centralize your AI tooling under one launch command.

Ollama began as a simple wrapper for running open-source models locally. Version 0.21.0 is the clearest indication yet that the project has a more ambitious destination: local AI infrastructure for automation that learns, adapts, and integrates with professional tools — without requiring a cloud account, a credit card, or a GPU rental. If your AI tool costs are rising month over month, this release is worth a few hours of your time before the next billing cycle hits.

Related Content: Ollama Setup Guide | AI Automation Guides | More AI News
