2026-03-20 · AI voice · text-to-speech · KittenTTS · edge AI · AI privacy

This 25 MB AI gives any device a human voice

KittenTTS fits a realistic text-to-speech AI into just 25 megabytes. It runs on any CPU — no GPU, no cloud, no internet needed.


KittenTTS just hit 11,400 GitHub stars and the front page of Hacker News with 361 comments — and for good reason. It's a text-to-speech AI that fits into 25 megabytes and runs entirely on your computer's regular processor. No graphics card. No cloud connection. No internet required.

[Image: KittenTTS Server Web UI showing the voice synthesis interface]

Smaller than a photo, smarter than you'd expect

Most AI voice models need gigabytes of storage and expensive graphics cards to run. KittenTTS flips that equation. The team at KittenML built three model sizes:

🔹 Mini — 80 million parameters (a measure of the AI's complexity), 80 MB on disk. Best quality.

🔹 Micro — 40 million parameters, 41 MB. The balanced option.

🔹 Nano — 15 million parameters, just 25 MB. Runs on a Raspberry Pi.

For comparison, popular voice AI models like Bark or XTTS-v2 are typically 1–4 GB and need a dedicated GPU. KittenTTS Nano is smaller than most smartphone photos.
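The size numbers above line up neatly with the parameter counts: the "int8" in the Nano model's name suggests 8-bit quantized weights, i.e. roughly one byte per parameter (that inference is ours, not stated in the article). A quick back-of-envelope check:

```python
def approx_model_size_mb(num_params: int, bytes_per_param: int = 1) -> float:
    """Estimated on-disk size in MB for quantized weights,
    assuming roughly one byte per parameter for int8 models."""
    return num_params * bytes_per_param / 1_000_000

print(approx_model_size_mb(80_000_000))  # Mini:  80.0 MB of weights
print(approx_model_size_mb(40_000_000))  # Micro: 40.0 MB of weights
print(approx_model_size_mb(15_000_000))  # Nano:  15.0 MB of weights;
                                         # the 25 MB download includes extras
```

A full-precision (float32) model would be about four times larger, which is part of how these sizes stay so far below the multi-gigabyte norm.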

Eight voices, instant results

The model ships with 8 built-in voices — Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo. On a modern laptop processor, it generates speech at 5x real-time speed (a 10-second clip takes about 2 seconds to create). Even on a budget dual-core processor, it stays under 2 seconds for standard sentences.
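The speed claim is easy to sanity-check: a real-time factor (RTF) of 5x means five seconds of audio are synthesized per second of compute, so generation time is clip length divided by the RTF.

```python
def generation_time_s(clip_seconds: float, real_time_factor: float) -> float:
    """Seconds of compute needed to synthesize a clip at a given RTF."""
    return clip_seconds / real_time_factor

# A 10-second clip at the article's quoted 5x real-time speed:
print(generation_time_s(10, 5))  # -> 2.0 seconds
```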

The audio comes out at a 24 kHz sample rate (below CD audio's 44.1 kHz, but plenty for clear speech) with adjustable speed controls. You can go from normal pace to fast narration with a single parameter change.

[Image: KittenTTS Server light mode interface with waveform visualization]

Who this is for

App developers: Add voice to any app without paying per-word API fees. The Apache 2.0 license means commercial use is free.

Hardware tinkerers: Run voice synthesis on a Raspberry Pi, IoT device, or any embedded system. No GPU, no cloud bills.

Privacy-conscious users: Your text never leaves your device. No data sent to any server, ever.

Content creators: Generate voiceovers for videos, podcasts, or presentations without subscription fees.

The honest trade-offs

Hacker News commenters noted some real limitations. The voices can sound "over-acted" or artificial compared to larger models — one commenter compared them to "anime dubs." This is an early preview trained on less than 10% of the final dataset, and the team promises quality improvements in coming releases.

There's also a licensing wrinkle: the code depends on a component (espeak-ng) that uses a stricter license (GPL-3.0), which may limit some commercial uses until the team releases a standalone version they've promised.

Try it in 3 lines of Python

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl

from kittentts import KittenTTS
import soundfile as sf

# Load the int8-quantized Nano model (weights download on first use)
model = KittenTTS("KittenML/kitten-tts-nano-0.8-int8")

# Synthesize speech with one of the eight built-in voices
audio = model.generate("Hello world!", voice="Luna")

# Save as a WAV file at the model's 24 kHz sample rate
sf.write("output.wav", audio, 24000)

Or try it directly in your browser with the community web demo that runs entirely client-side using WebAssembly — no installation needed.

For a full server with a visual interface, the community-built Kitten-TTS-Server adds a Web UI with waveform visualization, GPU acceleration support, and an OpenAI-compatible API.
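If the server's API really mirrors OpenAI's, a client call might look like the sketch below. The endpoint path (`/v1/audio/speech`), port, and JSON field names are assumptions borrowed from OpenAI's audio API, not verified against Kitten-TTS-Server.

```python
import json
from urllib import request

def build_speech_request(text: str, voice: str = "Luna",
                         base_url: str = "http://localhost:8000"):
    """Build (but don't send) an OpenAI-style text-to-speech request."""
    body = json.dumps({"model": "kitten-tts", "input": text, "voice": voice})
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_speech_request("Hello from the local server")
print(req.full_url)  # http://localhost:8000/v1/audio/speech
# urllib.request.urlopen(req) would then return the synthesized audio bytes
```

The upside of an OpenAI-compatible shape is that existing client libraries and tools can be pointed at the local server just by swapping the base URL.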

