AI for Automation
2026-03-27 · LTX-Video · video generation · open source · 4K AI video · Lightricks

LTX-Video 2.3 drops: free 4K video with synced audio

LTX-Video 2.3 generates 4K footage at 50 FPS with synchronized audio from a text prompt. MIT licensed, runs on a 16GB GPU, no subscription needed.


Lightricks just released LTX-Video 2.3, an open-source AI video generation model that produces 4K resolution video at up to 50 frames per second — with a synchronized audio track included. It's the first open-source model to combine high-resolution video and audio generation in a single pipeline, MIT licensed, and it runs on a consumer GPU.

To understand why this matters: most AI video tools until now were cloud-only, required $20-35/month subscriptions, produced only 1080p or lower without audio, and put watermarks on free-tier outputs. LTX-Video 2.3 changes all of that in one release.

What LTX-Video 2.3 Generates

From a single text description, LTX-Video 2.3 produces video at these specs:

  • Maximum resolution: 3840×2160 (4K UHD — the standard for modern TVs and professional cameras)
  • Frame rate: up to 50 FPS (frames per second — standard cinema is 24fps; 50fps is broadcast TV quality)
  • Audio: synchronized stereo audio track generated alongside the video
  • Clip length: up to 30 seconds per generation
  • License: MIT (completely free for commercial use — you can use it in paid products, ads, YouTube monetized content, anything)
  • VRAM requirement: 16GB for 1080p output, 24GB for full 4K
[Image: LTX-Video 2.3 sample 4K frame showing cinematic quality with audio waveform visualization]
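The two VRAM tiers in the specs above translate directly into a resolution choice. A minimal sketch of that mapping, assuming the article's stated thresholds (16 GB for 1080p, 24 GB for 4K); the function name is mine, not part of the LTX-Video API:

```python
def max_output_resolution(vram_gb: float) -> str:
    """Map available GPU VRAM to the highest supported output resolution,
    per the stated requirements: 16 GB -> 1080p, 24 GB -> full 4K."""
    if vram_gb >= 24:
        return "3840x2160"
    if vram_gb >= 16:
        return "1920x1080"
    return "unsupported"

max_output_resolution(24)  # -> "3840x2160"
max_output_resolution(16)  # -> "1920x1080"
```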

The audio generation is the breakthrough here. Previous open-source video models — Wan 2.1, CogVideoX, Mochi — produced silent video only. Adding audio required a separate tool, manual timing, and imperfect synchronization. LTX-Video 2.3 trains both modalities together, so the sound matches exactly what appears on screen.

The Synchronized Audio Breakthrough

LTX-Video 2.3 uses a joint diffusion architecture (a generation approach where video frames and audio samples are produced simultaneously, each influencing the other during generation rather than one added after). The model was trained on a large dataset of paired video-audio content with temporal alignment labels.
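To make the "each influencing the other" idea concrete, here is a toy coupled-update sketch, not the actual architecture: two stand-in latent vectors (video and audio) are refined together, each step conditioning on the other modality's current state. Every name and number here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
video_latent = rng.normal(size=8)  # stand-in for the video latent
audio_latent = rng.normal(size=8)  # stand-in for the audio latent

def joint_step(v, a, lr=0.1):
    """One toy joint-denoising step: each modality's update is
    conditioned on the other, pulling both toward a mutually
    consistent state (unlike post-hoc audio, which never sees video)."""
    v_new = v - lr * (v - 0.5 * a)
    a_new = a - lr * (a - 0.5 * v)
    return v_new, a_new

for _ in range(200):
    video_latent, audio_latent = joint_step(video_latent, audio_latent)
```

After many steps the two latents converge together, which is the intuition behind joint training producing tight synchronization; the real model of course uses learned denoisers, not this linear coupling.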

In benchmark testing, LTX-Video 2.3 achieves 94.3% audio-visual synchronization accuracy — meaning the audio lines up correctly with on-screen action 94.3% of the time. Separately added audio from tools like AudioLDM or Stable Audio scores around 71% on the same metric. The difference is perceptible on fast-moving content: footsteps, door slams, musical instrument attacks.
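The article doesn't publish the benchmark's exact definition, but a sync-accuracy metric of this kind is typically the fraction of paired events that land within a small time tolerance. A sketch under that assumption, with hypothetical timestamps and an illustrative 40 ms tolerance:

```python
def sync_accuracy(visual_events, audio_events, tolerance=0.04):
    """Fraction of visual event timestamps whose paired audio onset
    falls within `tolerance` seconds. Events are paired by index."""
    hits = sum(
        1 for v, a in zip(visual_events, audio_events)
        if abs(v - a) <= tolerance
    )
    return hits / len(visual_events)

# Hypothetical timestamps (seconds): footsteps seen vs. footsteps heard
visual = [0.50, 1.20, 1.90, 2.60]
audio = [0.51, 1.22, 2.10, 2.61]
sync_accuracy(visual, audio)  # -> 0.75 (one footstep arrives 0.2 s late)
```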

[Image: LTX-Video 2.3 architecture diagram showing simultaneous video and audio generation pipeline]

How to Run It Locally

LTX-Video 2.3 requires an NVIDIA GPU with 16GB+ VRAM (video memory — the RTX 3090, 4090, or A100 all qualify). Setup takes under 10 minutes:

# Install dependencies
pip install torch diffusers transformers accelerate

# Download and run via Python
from diffusers import LTXVideoPipeline
from diffusers.utils import export_to_video
import torch

pipe = LTXVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video-2.3",
    torch_dtype=torch.bfloat16
).to("cuda")

result = pipe(
    prompt="Sunset over the ocean, waves crashing on a rocky shore, cinematic 4K",
    num_frames=150,   # 3 seconds at 50fps
    height=2160,
    width=3840,
    generate_audio=True
)

# result.frames[0] is the list of frames for this clip; export_to_video
# writes them to disk at the requested frame rate
export_to_video(result.frames[0], "output.mp4", fps=50)

For users without a powerful local machine, LTX-Video 2.3 runs on Google Colab (a free cloud computing environment offering temporary GPU access) with the free T4 GPU — though at 1080p maximum on T4's 16GB of VRAM. Full 4K requires a 24GB card.
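A rough sense of why 4K needs the bigger card: the decoded output frames alone are substantial before counting model weights and activations. A back-of-envelope sketch (my arithmetic, not a published memory profile), assuming bf16 values at 2 bytes per channel:

```python
def raw_frames_gb(width, height, num_frames, channels=3, bytes_per_value=2):
    """Approximate size of the decoded output frames alone, in GB,
    assuming bf16 (2 bytes per value). Model weights, activations,
    and the audio track all come on top of this."""
    return width * height * num_frames * channels * bytes_per_value / 1e9

raw_frames_gb(3840, 2160, 150)  # -> ~7.5 GB for a 3-second 4K clip
raw_frames_gb(1920, 1080, 150)  # -> ~1.9 GB for the same clip at 1080p
```

The 4x gap between the two numbers tracks the 4x pixel count of 4K over 1080p, which is why the 16 GB tier tops out at 1080p.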

Comparison with Paid Alternatives

How does LTX-Video 2.3 stack up against commercial products?

  • Sora (OpenAI): max 1080p, watermarked on free tier, $20+/month — shut down March 2026
  • Runway Gen-4: max 1080p, no built-in audio, commercial license, $35/month
  • Kling 2.0: max 1080p, separate audio step, ~$30/month
  • Pika 2.0: max 1080p, limited audio, $8-28/month
  • LTX-Video 2.3: 4K, synchronized audio, MIT license, $0/month if you have the GPU
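The "$0/month if you have the GPU" line invites a break-even calculation. A quick sketch: the $35/month figure is Runway's price from the list above, while the GPU price is hypothetical and not from the article:

```python
import math

def breakeven_months(gpu_price_usd, subscription_usd_per_month):
    """Months of subscription fees that add up to a one-time GPU
    purchase. Ignores electricity and resale value."""
    return math.ceil(gpu_price_usd / subscription_usd_per_month)

breakeven_months(1600, 35)  # -> 46 months vs. a $35/month plan
```

In other words, the GPU only pays for itself over years — the stronger argument for local generation is unlimited volume, no watermarks, and the MIT license, not raw cost.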

The quality gap between open-source and commercial AI video has effectively closed with this release. The remaining advantage commercial tools offer is convenience — a web interface, no GPU required, managed cloud infrastructure. For technical users with a capable GPU, LTX-Video 2.3 is the obvious first choice.

Use Cases Now Affordable

What's newly practical with LTX-Video 2.3:

  • YouTube and social content: B-roll footage with matching ambient audio, monetizable under MIT license
  • Game development: prototype cutscenes and UI animations at 50fps
  • Commercial advertising: product demo videos — MIT license explicitly permits commercial use
  • Film pre-visualization: storyboard sequences with realistic motion and sound
  • Training data: generate labeled video-audio pairs for downstream model training
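For the training-data use case above, the useful artifact is a manifest pairing each prompt with its generated clip and labels. A minimal sketch of such a record builder — the field names and filename scheme are illustrative, not a standard format — using the article's 50 fps, 3-second example settings:

```python
def build_manifest(prompts, fps=50, seconds=3):
    """Build label records for generated video-audio pairs, one per
    prompt. Filenames and field names are illustrative."""
    return [
        {
            "prompt": p,
            "file": f"clip_{i:03d}.mp4",
            "fps": fps,
            "num_frames": fps * seconds,
            "has_audio": True,
        }
        for i, p in enumerate(prompts)
    ]

manifest = build_manifest(["rain on a tin roof", "waves on a rocky shore"])
# manifest[0]["file"] -> "clip_000.mp4", manifest[0]["num_frames"] -> 150
```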

Model weights, code, and documentation are available at Hugging Face — Lightricks/LTX-Video-2.3 and GitHub — Lightricks/LTX-Video.

Source: Digital Applied — LTX-2.3 Open-Source AI Video with Synchronized Audio
