2026-03-19 · AI voice · text-to-speech · open source · Fish Speech · voice cloning · GitHub

Fish Speech gives AI voices real emotion — 28K GitHub stars

Fish Speech is a free, open-source text-to-speech tool with 28K GitHub stars that lets you control tone, emotion, and style in 80+ languages — rivaling paid tools like ElevenLabs.

What if your AI-generated voiceover could whisper, laugh, or sound genuinely excited — not just read words in a flat robotic tone? Fish Speech, an open-source text-to-speech tool with 28,000+ GitHub stars, just got a major new web interface update — and it's trending as a serious free alternative to paid services like ElevenLabs.

Fish Speech S2 Pro benchmark results showing performance across languages

AI voices that actually sound human

Most text-to-speech tools give you a voice and maybe a speed slider. Fish Speech goes much further — it lets you insert emotional control tags directly into your text. Want the AI to whisper one sentence and shout the next? You just type [whisper] or [excited] before the words.

The system supports over 15,000 emotional tags — from [professional broadcast tone] to [sad, trembling voice]. This kind of fine-grained control is what separates a robotic reading from a natural performance.
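Because the tags are just text inside your script, you can assemble tagged input with everyday tools before pasting it into the web UI or sending it to the API. Here is a minimal shell sketch. The names tag_line and script.txt are hypothetical, and the tag wording comes from the examples above, so check the project's docs for which tags a given model actually honors.

```bash
#!/usr/bin/env bash
# Minimal sketch: prefix a sentence with an inline emotion tag, as the article describes.
# tag_line is a hypothetical helper; the tags themselves are ordinary text in the input.
tag_line() {
  local tag=$1 text=$2
  printf '[%s] %s\n' "$tag" "$text"
}

# Build a short script that whispers one line, gets excited on the next,
# and uses a free-form descriptive tag on the last.
{
  tag_line whisper "Can you keep a secret?"
  tag_line excited "We just passed 28,000 stars!"
  tag_line "sad, trembling voice" "But the demo crashed."
} > script.txt
```

The resulting script.txt holds one tagged sentence per line, ready to paste into the text box.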

Key numbers at a glance:

28,000+ GitHub stars, trending with +2,600 this week
80+ languages supported — English, Chinese, Japanese, Korean, Arabic, German, French, and many more
2 million+ voices available in the community library
0.99% word error rate on English benchmarks (lower is better)
~100ms time-to-first-audio — near-instant playback
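For context on the 0.99% figure: TTS benchmarks typically transcribe the generated audio with a speech recognizer and compare that transcript to the input text. Word error rate counts substitutions S, deletions D, and insertions I against the N words of the reference text:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

At 0.99%, that works out to roughly one wrong, missing, or extra word per 100 words of input.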

Clone any voice from a 10-second sample

Fish Speech can clone a voice from just 10–30 seconds of audio. Upload a short clip of someone speaking, and the AI learns their tone, rhythm, and vocal style. Combined with the emotional tags, this means you can generate speech that sounds like a specific person — whispering, laughing, or reading with professional broadcaster polish.
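If you want to sanity-check a reference clip before uploading it, a small shell sketch can measure its length against the 10 to 30 second window mentioned above. It assumes FFmpeg's ffprobe is installed; clip_seconds and in_clone_range are hypothetical helper names, not part of Fish Speech.

```bash
#!/usr/bin/env bash
# Check whether a reference clip falls in the 10 to 30 second range the article gives for cloning.
# clip_seconds needs ffprobe (part of FFmpeg); in_clone_range only needs awk.

# Print a media file's duration as a decimal number of seconds.
clip_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Succeed (exit status 0) if the given duration is between 10 and 30 seconds inclusive.
in_clone_range() {
  awk -v d="$1" 'BEGIN { exit !(d >= 10 && d <= 30) }'
}
```

For example, in_clone_range "$(clip_seconds sample.wav)" && echo "length looks fine". If a recording runs long, ffmpeg -i long.wav -t 30 sample.wav keeps just the first 30 seconds.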

The system also handles multi-speaker conversations. You can tag different speakers in a single text block, and Fish Speech generates a natural-sounding dialogue between them — useful for podcast drafts, audiobook narration, or game dialogue.
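The article doesn't show the exact speaker syntax, so treat this as a hypothetical sketch: it turns a plain "Name: line" script into one tagged line per turn, giving each new speaker the next number. The <|speaker:N|> prefix is an assumption for illustration only; check the Fish Speech documentation for the real multi-speaker format before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert "Name: line" dialogue on stdin into speaker-tagged text.
# ASSUMPTION: the <|speaker:N|> prefix is illustrative; verify Fish Speech's actual syntax.
to_speaker_tags() {
  awk '
    NF == 0 { next }                         # skip blank lines
    {
      name = $0
      sub(/: .*/, "", name)                  # speaker name is the text before the first ": "
      if (!(name in id)) id[name] = n++      # first appearance gets the next index: 0, 1, ...
      line = $0
      sub(/^[^:]*: /, "", line)              # drop the "Name: " prefix
      printf "<|speaker:%d|>%s\n", id[name], line
    }
  '
}
```

Usage: printf 'Host: Welcome back\nGuest: Thanks for having me\n' | to_speaker_tags. Numbering speakers by first appearance keeps each voice stable across the whole dialogue.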

Fish Speech chat template showing multi-speaker dialogue generation

How it stacks up against ElevenLabs

ElevenLabs is the most popular paid TTS service, and Fish Speech is positioning itself as a free, open-source competitor. Here's how they compare:

Fish Speech — Free & open source, 80+ languages, 15,000+ emotion tags, self-hostable, voice cloning from 10s audio
ElevenLabs — Paid (starts $5/mo), 32 languages, simpler emotion controls, cloud-only, polished UI
Chatterbox — Free & open source, English-focused, strong voice cloning, less emotion control

Fish Speech's biggest advantage: you can run it entirely on your own machine. Your audio data never leaves your computer. For businesses handling sensitive content — legal recordings, medical notes, internal communications — this is a major plus.

Who should try this

Content creators: Generate voiceovers for YouTube videos, podcasts, or TikTok in minutes instead of hours. The emotional control means your voiceover actually matches the mood of your content.

Marketers and advertisers: Produce multilingual ad voiceovers without hiring voice actors for each language. Test 10 different reads of a tagline before committing.

Game developers and writers: Prototype character dialogue with distinct voices and emotions. Hear how your script sounds before recording with real actors.

Anyone building AI apps: The REST API lets you add natural-sounding speech to any application — chatbots, accessibility tools, language learning apps.
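For app builders, a call to a self-hosted server might look like the hedged sketch below. The address 127.0.0.1:8080, the /v1/tts path, and the JSON fields text and format are assumptions rather than details from this article, so confirm them against the repository's API docs. FISH_TTS_URL, tts_payload, and tts_to_file are hypothetical names.

```bash
#!/usr/bin/env bash
# Hedged sketch of calling a self-hosted Fish Speech HTTP server.
# ASSUMPTIONS: the server listens on 127.0.0.1:8080, exposes POST /v1/tts,
# and accepts a JSON body with "text" and "format" fields. Verify against the repo docs.
FISH_TTS_URL=${FISH_TTS_URL:-http://127.0.0.1:8080/v1/tts}

# Build the JSON body; escapes backslashes and double quotes (single-line text only).
tts_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"%s","format":"wav"}' "$escaped"
}

# POST the text and save the returned audio to a file; -f makes curl fail on HTTP errors.
tts_to_file() {
  local text=$1 out=$2
  curl -fsS -X POST "$FISH_TTS_URL" \
    -H 'Content-Type: application/json' \
    --data "$(tts_payload "$text")" \
    -o "$out"
}
```

Usage, once a server is running: tts_to_file "[excited] Your order has shipped!" reply.wav. The payload builder only escapes backslashes and double quotes, so keep the text on one line or switch to a proper JSON tool such as jq.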

Try it yourself

The fastest way to get started is with Docker:

git clone https://github.com/fishaudio/fish-speech.git
cd fish-speech
docker compose --profile webui up

This launches a local web interface at http://localhost:7860 where you can type text, add emotion tags, and generate speech — no coding required.
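The first start can take a while, so in scripts it helps to wait until the page actually responds before opening it. A minimal sketch, assuming curl is installed; wait_for_url is a hypothetical helper, and 7860 is the port given above.

```bash
#!/usr/bin/env bash
# Poll a URL once per second until it answers with a non-error HTTP status,
# or give up after a timeout (in seconds, default 120).
# wait_for_url is a hypothetical helper, not part of Fish Speech.
wait_for_url() {
  local url=$1 timeout=${2:-120} waited=0
  until curl -fs --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "ready: $url"
}
```

For example, start the stack in the background with docker compose --profile webui up -d, then run wait_for_url http://localhost:7860 300 before opening the browser.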

Or if you just want to try it without installing anything, fish.audio offers a free web playground with a library of 2 million+ community voices — including celebrity and character voices.

Fish Audio voice card showing narrator voice style

The bigger picture

Text-to-speech is rapidly becoming a commodity — but emotional control is the next frontier. Flat AI voices are everywhere; the tools that win will be the ones that make AI sound genuinely human. Fish Speech's approach of embedding emotion at the word level, combined with being completely free and open source, makes it one of the most capable TTS tools available to anyone right now.

The project just added a brand-new web UI today (March 19), making it significantly easier for non-developers to use. With the GitHub repo trending at +2,600 stars this week, expect the community and voice library to grow fast.
