AI for Automation
2026-03-20 · AI video · LTX-2.3 · Lightricks · video generation · AI audio · creators

The first AI that generates 4K video with sound — free

LTX-2.3 by Lightricks generates synchronized 4K video and audio in a single pass. Open source, runs locally, and supports vertical video for social media.


Every other AI video generator leaves you with a two-step process: generate a silent clip, then hunt for audio separately. LTX-2.3 by Lightricks eliminates that step entirely — it generates 4K video at 50 frames per second with synchronized sound, all from a single text prompt.

The model is completely open source under the Apache 2.0 license, has 5,000 GitHub stars, and can run on your own computer — no cloud subscription, no usage limits, no waiting in queues.

[Image: LTX 2.3 AI video generation with audio - overview of capabilities]

Video and audio from one model — why that matters

Until now, creating AI-generated video with matching sound required at least two separate tools: one to generate the visuals, another to add audio. The results rarely matched. Footsteps wouldn't sync with walking, dialogue wouldn't match lip movements, ambient sounds would feel disconnected.

LTX-2.3 uses a 22-billion parameter Diffusion Transformer (DiT — a type of AI architecture that generates images and video by gradually refining noise into clear output) that produces both video and audio in a single forward pass. Sound effects, ambient noise, and dialogue are generated alongside the visuals, not bolted on afterward.
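The single-pass idea can be illustrated with a toy sketch: keep video and audio in one shared latent so every denoising step updates both modalities together. Everything below (shapes, step count, the stand-in "model") is illustrative, not the actual LTX-2.3 architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared latent holds both modalities, so each denoising step
# updates video and audio jointly -- this is what keeps them in sync.
video_latent = rng.standard_normal((8, 16))   # 8 frames x 16 channels (toy)
audio_latent = rng.standard_normal((8, 4))    # 8 frames x 4 channels (toy)
latent = np.concatenate([video_latent, audio_latent], axis=1)

def denoise_step(x, t):
    """Stand-in for the transformer: shrink the noise a little.
    A real DiT predicts the noise and subtracts it instead."""
    return x * 0.9

for t in range(20, 0, -1):
    latent = denoise_step(latent, t)

# Split the refined latent back into the two modalities.
video_out, audio_out = latent[:, :16], latent[:, 16:]
print(video_out.shape, audio_out.shape)  # (8, 16) (8, 4)
```

Because both modalities share every refinement step, there is no separate audio model whose output could drift out of sync with the frames.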

What you get with LTX-2.3

4K resolution at up to 50 fps — broadcast-quality output
Native vertical video (9:16) — trained on real portrait footage, not cropped landscapes
Synchronized audio — sound effects, ambient noise, dialogue
Up to 20-second clips — long enough for social media posts and ads
Seven generation modes — including text-to-video, image-to-video, audio-to-video, extend, and retake
Local inference — runs on consumer hardware, no cloud needed
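To put the spec sheet above in perspective, here is the raw arithmetic for a maximum-length clip: 20 seconds at 50 fps in 4K UHD (3840×2160). These are simple pixel counts, not measured file sizes.

```python
# Back-of-the-envelope throughput for a maximum-length LTX-2.3 clip.
fps = 50
seconds = 20
width, height = 3840, 2160  # 4K UHD

frames = fps * seconds
pixels_per_frame = width * height
total_pixels = frames * pixels_per_frame

print(frames)             # 1000 frames
print(pixels_per_frame)   # 8294400 pixels per frame
print(total_pixels)       # 8294400000 -- roughly 8.3 billion pixels per clip
```

Generating and keeping roughly 8.3 billion pixels coherent — with matching audio — is why the performance variants described below matter.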

[Image: LTX 2.3 demo video output showing AI-generated video with synchronized audio]

Runs on your laptop — not just data centers

Most cutting-edge AI video models require cloud GPUs that cost $2–5 per minute. LTX-2.3 offers multiple performance variants:

Full quality: Best results on a modern GPU with 24GB+ VRAM

FP8 quantized: Reduced memory footprint with minimal quality loss

Distilled variant: Faster generation for quick previews and iteration
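A rough way to see why the FP8 variant matters is to estimate the memory the weights alone occupy for a 22-billion-parameter model at different precisions. Real usage adds activations and framework overhead (and full-quality deployments may offload layers), so treat these as lower bounds, not exact requirements.

```python
# Weight-only VRAM estimate for a 22B-parameter model at two precisions.
params = 22e9

def weight_gb(bytes_per_param):
    # Convert total weight bytes to gibibytes.
    return params * bytes_per_param / 1024**3

print(round(weight_gb(2), 1))  # bf16/fp16 (2 bytes/param): ~41.0 GB
print(round(weight_gb(1), 1))  # fp8 (1 byte/param):       ~20.5 GB
```

Halving the bytes per parameter roughly halves the weight footprint, which is what brings a model of this size within reach of high-end consumer GPUs.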

The model also ships with a desktop video editor that lets you run the entire pipeline locally — including a LoRA trainer (a tool that lets you fine-tune the model on your own footage to match a specific style or subject).
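The core idea behind a LoRA trainer can be sketched in a few lines: instead of updating a large frozen weight matrix W, you train a small low-rank correction A @ B alongside it. The shapes and rank below are toy values, not LTX-2.3's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4  # toy dimensions

W = rng.standard_normal((d_in, d_out))        # frozen base weight
A = rng.standard_normal((d_in, rank)) * 0.01  # trainable down-projection
B = np.zeros((rank, d_out))                   # trainable up-projection (starts at zero)

def lora_forward(x):
    # Base path plus the low-rank adapter; only A and B get gradient updates.
    return x @ W + x @ A @ B

x = rng.standard_normal((1, d_in))
y = lora_forward(x)
print(y.shape)  # (1, 64)

# The adapter trains 64*4 + 4*64 = 512 values instead of the full 64*64 = 4096.
print(A.size + B.size, W.size)  # 512 4096
```

Because only the tiny A and B matrices are trained, fine-tuning on your own footage is feasible on the same consumer hardware that runs inference.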

Try it yourself

# Clone the repository
git clone https://github.com/Lightricks/LTX-2.git
cd LTX-2
# Install pinned dependencies with uv, then activate the virtualenv
uv sync --frozen
source .venv/bin/activate

Model weights are available on Hugging Face. If you don't want to install anything, the fal.ai playground lets you try it in your browser.

Who should pay attention

Social media managers and marketers: Native 9:16 vertical video with audio means you can generate Reels, TikToks, and Shorts with sound — from a text description. No stock footage license, no royalty fees.

Indie filmmakers and content creators: The local inference capability means unlimited iterations at zero marginal cost. Users report an 80% reduction in production time compared to traditional workflows.

Anyone using AI video today: If you've been generating silent clips and manually adding audio, this removes that entire step. The synchronized output is noticeably more cohesive than the stitch-it-together approach.

[Image: LTX 2.3 video generation example output]

The bigger picture: AI video just became a local-first medium

LTX-2.3 represents a shift in who can access professional video generation. When the model runs on your machine, your footage never leaves your computer — critical for anyone working with client content or sensitive material. And there's no per-minute billing eating into your budget.

Lightricks — the company behind the popular Facetune app — released LTX-2 in January 2026 and has been iterating rapidly. Version 2.3, released in March, adds sharper visuals, native portrait mode, and significantly improved audio quality with a new vocoder that removes silence gaps and noise artifacts.
