AI for Automation
2026-03-17 | Tags: ElevenLabs, AI Speech Synthesis, TTS, Eleven v3, Multilingual AI, Content Creation

AI Voices Can Now Laugh, Whisper, and Express Emotions

ElevenLabs' latest speech synthesis model, Eleven v3, has graduated from alpha and officially launched. While the previous v2 supported 29 languages, v3 now supports over 70 languages, featuring natural multi-speaker conversations and audio tags that control whispers, laughter, and emotional expression...


TL;DR: AI-generated voices can now laugh, whisper, and express anger just like real people. ElevenLabs' latest voice model has completed 9 months of testing and is now officially available to everyone.

ElevenLabs' AI speech synthesis model Eleven v3 has reached General Availability (GA). First unveiled as an alpha version in June 2025, it underwent approximately 9 months of testing and is now available to all users, including those on free accounts.

From 29 to 70 — More Than Double the Language Support

The most notable change is language support. The previous model (Multilingual v2) supported 29 languages, but v3 supports over 70 languages. Coverage extends broadly from Korean to Southeast Asian, African, and Middle Eastern languages.

In practical terms, this means significantly more options when adding AI dubbing to YouTube videos or creating narration for globally targeted advertisements.

Audio Tags — Giving 'Acting Directions' to AI Voices

The most innovative feature of v3 is Audio Tags. By inserting special tags into text, you can precisely control the emotion, tone, and speaking pace of the AI voice.

Audio Tag Examples

• Whispering tone → The AI voice actually speaks in a whisper

• Speaking while laughing → Natural laughter is woven into the dialogue

• Emotional control for angry, sad, excited tones, and more

• Dramatic delivery — Effective for audiobooks, game characters, and ad narration

Previously, even when you requested "say this sadly," the AI would often just read it in a flat, neutral tone. With v3, it's like giving "pause here, then speak in a low, whispering voice" acting directions to a voice actor — enabling fine-grained control.
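As a sketch of what those acting directions look like in practice, the snippet below assembles a script using the bracket-style audio tags shown in ElevenLabs' examples (such as [whispers] and [laughs]). The helper function and the exact set of tag names here are illustrative; consult the v3 documentation for the full supported list.

```python
# Sketch: building a v3 script with inline audio tags (bracket syntax).
# The tag names below ([whispers], [laughs], [sad]) follow ElevenLabs'
# published examples; treat the exact list as illustrative.

def tagged(tag: str, text: str) -> str:
    """Prefix a line with an audio tag like [whispers] or [laughs]."""
    return f"[{tag}] {text}"

script = "\n".join([
    tagged("whispers", "Don't tell anyone, but the launch is tomorrow."),
    tagged("laughs", "I can't believe we actually pulled it off!"),
    tagged("sad", "I'm going to miss working on this project."),
])

print(script)
```

The tagged text is passed to the model as ordinary input; v3 interprets the bracketed directions instead of reading them aloud.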

Multi-Speaker Conversations in a Single Take

v3 introduces a multi-speaker conversation feature. For example, when creating audio of two people talking like a podcast, you previously had to generate each speaker separately and then stitch them together in an editing program.

Now, with the Text-to-Dialogue feature, you simply input a single script and the AI distinguishes each speaker's voice to produce a natural conversation. There's no need to individually clone each voice either.
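A dialogue script for this feature boils down to a list of turns, each pairing a voice with a line of text. The sketch below shows one plausible shape for that input; the voice IDs are placeholders, and the exact SDK call for the Text-to-Dialogue endpoint should be checked against the current API reference.

```python
# Sketch: preparing a two-speaker script for Text-to-Dialogue.
# Voice IDs are placeholders; audio tags can be mixed into any turn.

HOST_VOICE = "host-voice-id"    # placeholder, not a real voice ID
GUEST_VOICE = "guest-voice-id"  # placeholder, not a real voice ID

dialogue = [
    {"voice_id": HOST_VOICE, "text": "Welcome back to the show!"},
    {"voice_id": GUEST_VOICE, "text": "[laughs] Thanks, great to be here."},
    {"voice_id": HOST_VOICE, "text": "So, tell us about Eleven v3."},
]

# With the SDK, a list like this is handed to the Text-to-Dialogue
# endpoint in a single request; no per-speaker generation or stitching.
for turn in dialogue:
    print(f'{turn["voice_id"]}: {turn["text"]}')
```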

[Image: Content creation screen using Eleven v3 voices in ElevenLabs Flows, showing narration generated with the Mark v3 voice]

v2 vs. v3 — Which Should You Use?

v3 isn't superior in every way. The best choice depends on your use case.

When v3 is the better choice

• Content where emotional expression matters, such as audiobooks, games, and animation

• Formats with multiple speakers in conversation, like podcasts

• When you need languages not supported in v2, such as Southeast Asian or African languages

When sticking with v2 makes more sense

• When you need a consistent, stable tone for corporate training or presentations

• When generating long audio over 10 minutes in one go (v3 maxes out at ~5 min, v2 at ~10 min)

• When cost matters — v2 is cheaper per character
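The rules of thumb above can be condensed into a small, hypothetical helper. The function and its parameters are illustrative; "eleven_multilingual_v2" and "eleven_v3" are the model identifiers used in ElevenLabs' API, but verify them against the current model list before relying on them.

```python
# Hypothetical helper condensing the v2-vs-v3 rules of thumb above.

def pick_model(needs_emotion: bool, speakers: int, minutes: float) -> str:
    """Suggest a model_id based on the trade-offs discussed above."""
    if minutes > 5:
        # v3 maxes out around 5 minutes per generation; v2 around 10.
        return "eleven_multilingual_v2"
    if needs_emotion or speakers > 1:
        # Audio tags and multi-speaker dialogue are v3 features.
        return "eleven_v3"
    # Otherwise v2: cheaper per character, more consistent tone.
    return "eleven_multilingual_v2"
```

For example, a 3-minute emotional audiobook passage maps to v3, while a 20-minute corporate training narration maps to v2.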

Who Can Use It and Where

YouTubers & Podcasters: Simply write a script and instantly create audio content featuring multiple speakers. You can produce interview-style content without needing to book guests.

Game & App Developers: Add emotion to NPC (non-player character) dialogue for greater immersion. Control whispers, shouts, laughter, and more with simple text tags.

Global Businesses: Create product video narrations in over 70 languages. Dramatically reduce the cost and time of hiring separate voice actors for each language.

Educational Content Creators: Use natural AI voices for online courses while shifting tone at key moments to capture learners' attention.

How to Get Started

Create a free account on the ElevenLabs website, select Eleven v3 as your model in Text to Speech, and you're ready to go. Free accounts receive a monthly credit allowance.

Developers can also access it via API. Python and JavaScript SDKs are officially supported.

# Generate Eleven v3 speech with Python
# Install the SDK first (shell): pip install elevenlabs

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="your-api-key")
audio = client.text_to_speech.convert(
    text="Hello, this is Eleven v3.",
    voice_id="your-voice-id",
    model_id="eleven_v3",
)

# convert() streams the audio as chunks of bytes; write them to a file
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

The Voice AI Market Landscape

In February 2026, ElevenLabs raised $500 million from Sequoia Capital and others, achieving a valuation of $11 billion. While competing with OpenAI, Google, and Amazon in the speech synthesis market, ElevenLabs is widely regarded as the current leader in emotional expression and multilingual support.

The audio tag feature in v3, in particular, is a differentiator that's hard to find in competing services. It's an update that takes another step toward breaking the stereotype that "AI-generated voices sound robotic."

If you'd like to learn more about AI and vibe coding, check out our Free Learning Guide.
