AI for Automation
2026-04-10 · Ollama · local AI · AI automation · WhatsApp AI · Discord AI · Gemma 4 · open source AI · free AI tools

Ollama v0.20.5: Local AI for WhatsApp & Discord — Free

Ollama v0.20.5 adds OpenClaw to run local AI inside WhatsApp, Telegram & Discord. No subscription, no cloud — your messages stay on your device.


On April 9, 2026, Ollama — the free tool that lets you run AI entirely on your own computer — shipped version 0.20.5 with a feature that changes how local AI fits into your day: OpenClaw, a bridge that connects your private AI directly to WhatsApp, Telegram, Discord, and other messaging platforms. One command. No monthly subscription. No data sent to any cloud server.

Until now, running a local AI (a model that processes everything on your machine instead of sending your words to OpenAI or Anthropic) meant opening a separate terminal window or web interface. OpenClaw eliminates that friction entirely — your AI lives inside the chat apps you already have open, all day, every day.

Your Chat Apps Just Got a Private Local AI Brain

OpenClaw is a channel integration layer (software that acts as a translator between your local AI and external messaging platforms). The moment you run the setup command, it builds a bridge so you can message your local AI model through the apps already on your phone or desktop — no extra software to install, no new interface to learn.

# Set up OpenClaw and connect your AI to messaging apps
ollama launch openclaw

# View available integration options
ollama launch openclaw --help

Supported platforms as of v0.20.5:

  • WhatsApp — the world's most-used messaging app, with over 2 billion active users
  • Telegram — popular among privacy-focused users and developer communities
  • Discord — the dominant platform in gaming, developer, and creator communities
  • Additional channels — OpenClaw is designed to expand beyond this initial set of integrations
[Image: OpenClaw local AI channel setup in Ollama v0.20.5 — WhatsApp, Telegram, and Discord integration]

The practical upside: instead of switching between tabs to ask your AI a question, you stay in your existing workflow. Because Ollama runs everything locally — on your own hardware, not a remote server — your conversations never leave your device. No terms-of-service surprises. No subscription creep. No company training on your private messages.

For developers who've been asking how to build a Discord bot, WhatsApp AI assistant, or local AI automation workflow without paying $20/month for cloud API access — including those doing vibe coding with Claude Code via Ollama — this is a direct answer. For everyone else, it's the beginning of AI that fits into your day instead of interrupting it. If you're new to Ollama, the beginner's setup guide walks you through installation in under 10 minutes.
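To make the automation angle concrete, here is a minimal sketch that talks to Ollama's local REST API, which listens on port 11434 by default. It assumes an Ollama server is running on your machine; the model tag "gemma4" is a placeholder, so substitute whatever model you have actually pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_request(model: str, prompt: str) -> bytes:
    """Build the JSON payload Ollama's /api/chat endpoint expects."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete reply instead of a token stream
    }
    return json.dumps(payload).encode("utf-8")


def ask_local_ai(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Example (requires a running server and a pulled model):
#   reply = ask_local_ai("gemma4", "Summarize these notes in two bullets.")
```

Because everything stays on localhost, a Discord or WhatsApp bot built on top of a function like this never sends message content to a third-party API.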

Gemma 4 Finally Stops Breaking Your Automations

Alongside OpenClaw, v0.20.5 includes two significant upgrades for users running Gemma 4 — Google's latest open-source AI model that you can download and run entirely for free on most modern hardware. Gemma 4 has been powerful but frustrating in automated setups. This update directly addresses both problems.

Flash Attention for Gemma 4: Faster on Compatible GPUs

Flash attention (a mathematical shortcut that lets AI models process long conversations much faster while using significantly less GPU memory) is now enabled for Gemma 4 on compatible hardware. The requirement is CUDA compute capability 7.5 or higher, a hardware generation marker that covers Nvidia GPUs from the 2018 Turing architecture onward, including the GTX 16-series, the RTX 20/30/40-series, and data-center cards like the A100 and H100.

Ollama's release notes don't publish specific speed numbers, but flash attention typically cuts GPU memory usage by 2–4x and reduces response latency noticeably on conversations longer than 2,000 words. If you've been using Gemma 4 for long research sessions and found it sluggish or prone to running out of memory, this update targets your setup directly. To check whether your GPU qualifies, run nvidia-smi --query-gpu=compute_cap --format=csv in a terminal; a value of 7.5 or higher means your card meets the cutoff.
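If you want to script that eligibility check, here is a small sketch. It assumes your Nvidia driver is recent enough to support nvidia-smi's compute_cap query field; the 7.5 cutoff comes from the release requirement described above.

```python
import subprocess

MIN_COMPUTE_CAP = 7.5  # Turing generation and newer


def supports_flash_attention(compute_cap: str) -> bool:
    """Return True if a GPU's compute capability string meets the 7.5 cutoff."""
    try:
        return float(compute_cap.strip()) >= MIN_COMPUTE_CAP
    except ValueError:
        return False  # unparseable output -> treat as unsupported


def detect_gpu_compute_cap() -> str:
    """Query the first GPU's compute capability via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()[0]


# Example (requires an Nvidia GPU and driver):
#   print(supports_flash_attention(detect_gpu_compute_cap()))
```

On AMD or CPU-only machines the nvidia-smi call simply fails, which matches the limitation noted later: those setups don't get this speedup.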

Tool Call Repair: Fixing AI Automation Workflow Crashes

Tool calling (the ability for an AI model to trigger external actions — like searching the web, querying a database, or running a calculation — rather than simply replying in text) is where Gemma 4 has been most unreliable. The model frequently produced malformed output that caused agent pipelines (automated workflows where the AI takes a sequence of actions on your behalf) to crash mid-task.

Ollama v0.20.5 adds an automatic repair system that intercepts Gemma 4's four most common tool-call failures before they crash your workflow:

  • Missing string delimiters — the model forgets closing quotation marks
  • Single-quoted values where double-quotes are required by the output format
  • Raw terminal strings that break downstream data parsers
  • Missing object closes — the model forgets to finish a JSON data block

The system attempts a best-effort fix, then falls back to a candidate pipeline (a secondary system that tries alternative interpretations of what the model intended) if the first repair fails. Gemma 4's tool calling is meaningfully improved — but not 100% reliable. Test any production automation before deploying it for critical tasks.
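Ollama doesn't document the repair pipeline's internals, but the best-effort idea can be sketched. The toy function below is not Ollama's actual implementation; it illustrates two of the failure modes listed above (single-quoted values and unclosed JSON objects) by generating candidate fixes and returning the first one that parses.

```python
import json


def repair_tool_call(raw: str):
    """Best-effort repair of a malformed tool-call string.

    A sketch of the idea, not Ollama's actual pipeline: try the raw
    string, then candidate repairs, and return the first parse that
    succeeds (or None if nothing parses).
    """
    candidates = [raw]
    # Repair 1: naive single-to-double quote substitution
    candidates.append(raw.replace("'", '"'))
    # Repair 2: close any braces the model left unbalanced
    for cand in list(candidates):
        missing = cand.count("{") - cand.count("}")
        if missing > 0:
            candidates.append(cand + "}" * missing)
    for cand in candidates:
        try:
            return json.loads(cand)  # first candidate that parses wins
        except json.JSONDecodeError:
            continue
    return None  # every candidate failed; caller falls back or retries
```

A real repair system would need to be far more careful (the quote substitution above would mangle apostrophes inside values, for instance), which is exactly why the feature remains best-effort rather than guaranteed.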

Ollama's Four Updates in Five Days — the Full Sprint

Ollama's team shipped four stable versions between April 4 and April 9, 2026 — an unusual pace that signals active pressure to stabilize Gemma 4 support and expand the platform's reach. Here's what changed across the full sprint:

  • v0.20.2 (April 4) — Changed the app default view from "launch" to "new chat," reducing friction for first-time users opening the Ollama desktop app
  • v0.20.3 (April 7) — Improved Gemma 4 tool calling, added the latest models to the Ollama App, fixed a bug in the OpenClaw TUI (terminal user interface — the text-based control panel for managing platform integrations)
  • v0.20.4 (April 8) — Improved MLX M5 performance using NAX optimization. MLX is Apple's machine learning framework for Mac; NAX is a low-level memory optimization that speeds up model inference on Apple Silicon chips like the M3 and M4
  • v0.20.5 (April 9) — OpenClaw channel setup, flash attention for Gemma 4, tool call repair system, and a fix for the /save command for safetensors (a standardized, more secure file format for storing AI model weights) architectures
[Image: Ollama's GitHub open-source repository — free local AI runner for macOS, Linux, and Windows]

The /save fix matters for users who maintain custom model configurations. If you've saved a modified version of a model and had the save silently fail, v0.20.5 resolves this. Your configurations will now persist correctly between sessions.

What to Know Before Switching to Ollama Local AI

A few honest limitations worth reading before you upgrade or try OpenClaw for the first time:

  • Flash attention requires compute capability 7.5+. Older Nvidia GPUs (anything before the Turing generation — roughly, cards older than the GTX 16-series and RTX 20-series), AMD graphics cards, and CPU-only setups are excluded from this speed improvement. Run nvidia-smi --query-gpu=compute_cap --format=csv in your terminal to confirm your GPU's compute capability.
  • OpenClaw detects curl-based OpenCode installs at ~/.opencode/bin. If you installed OpenCode (a local AI coding assistant) via a package manager or a different directory, the automatic detection may miss it and require manual setup.
  • Tool call repair is best-effort, not guaranteed. Gemma 4 agent reliability is significantly better now, but test any automated workflow before relying on it for important tasks.
  • No published performance benchmarks. Ollama's release notes describe the changes but don't quantify speed gains. Real-world results will vary by GPU model, VRAM amount, and conversation length.

You can update Ollama free at ollama.com or by re-running the original installer — it replaces the existing version in place. The broader pattern here is worth watching: with each update, Ollama is becoming less of a developer-only tool and more of a general platform for private, subscription-free AI automation. The gap between paying $200/month for a cloud AI service and running your own model entirely on your hardware is getting harder to justify — especially now that your AI can meet you inside WhatsApp.

