Google just cut Gemini video costs — plus music generation and multimodal data search
Google launched Veo 3.1 Lite (its cheapest AI video model), Lyria 3 music generation, Gemma 4, and an embedding model that handles five data types — all in one week.
Between March 25 and April 2, 2026, Google shipped five distinct AI tools through its Gemini API — and announced most of them quietly in a changelog. No splashy keynote. No product event. Just a rapid succession of launches targeting one shared objective: make AI cheaper to build with at scale.
Cheaper video generation. Cheaper inference. Cheaper data search. And a music generator that most developers haven't noticed yet.
Five tools, eight days
The release timeline is striking for its compression:
- March 25: Lyria 3 — AI music generation from text prompts and images
- March 26: Gemini Flash-Live Preview — real-time audio conversation (voice dialogue with the model, streaming back and forth live)
- March 31: Veo 3.1 Lite Preview — Google's cheapest AI video generation model
- April 1: Flex and Priority inference tiers (inference means running an AI model to get a response) — choose cost or speed
- April 2: Gemma 4 — two open-weight language models at 26B and 31B (shorthand for 26 billion and 31 billion parameters, the scale of learned knowledge in the model)
Nearly every launch is framed around cost efficiency. That's not coincidental. Google is responding to mounting pressure from open-source alternatives, cheaper competitors, and developer frustration with enterprise AI pricing. They're cutting the price of building with Gemini across every major media type simultaneously.
Veo 3.1 Lite — built for high-volume video
Google officially describes Veo 3.1 Lite Preview as "the most cost-efficient video generation model, designed for rapid iteration and building high-volume applications." That's a clear positioning statement — this isn't the quality tier, it's the volume tier.
The Veo family now spans three variants: Veo 3.1, Veo 3.1 Fast, and the new Veo 3.1 Lite. Think of it like a printing service menu: premium output for the important deliverables, draft mode for the rest. Lite is the draft mode — and draft mode is often what ships at scale.
The clearest use cases:
- Product demo pipelines — generating hundreds of product walkthrough clips with slight variations for A/B testing across markets
- Social content automation — brands producing 50+ videos per week for different regional audiences
- Training data generation — AI companies building video datasets that need volume, not cinema quality
- Rapid prototyping — designers testing video concepts before committing to expensive final renders
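The use cases above map naturally onto a tier-routing decision. A minimal sketch — note that the model ID strings here are assumptions based on Google's naming pattern, not confirmed identifiers; check AI Studio for the exact values:

```javascript
// Route a render job to a Veo variant by workload type.
// Model IDs are ASSUMED from Google's naming conventions — verify before use.
function pickVeoModel(job) {
  if (job.finalRender) return "veo-3.1";        // quality tier: final deliverables
  if (job.userFacing) return "veo-3.1-fast";    // balanced tier: interactive previews
  return "veo-3.1-lite-preview";                // volume tier: batch and drafts
}

// A/B-test clips and training-data batches fall through to the Lite tier.
console.log(pickVeoModel({ finalRender: false, userFacing: false }));
```

The point of the sketch: Lite isn't a replacement for the standard model, it's the default for everything that doesn't need to look final.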
One caveat worth flagging: Google has not published a side-by-side pricing comparison between Veo 3.1 Lite and the standard Veo 3.1. "Most cost-efficient" is a marketing claim until the actual rate differential is confirmed. Expect the specifics to appear in the AI Studio billing dashboard as the preview matures.
Lyria 3 — AI music from a sentence
This is the launch getting the least attention relative to its actual impact. Lyria 3 generates music — real, structured, full audio tracks — in 48kHz stereo quality (kHz means thousands of audio samples per second; 48kHz captures richer sound detail than the 44.1kHz standard used by CDs and most streaming services) from two input types: plain text descriptions, or images.
Two models for two different needs
Google released Lyria 3 in two distinct forms:
- Lyria 3 (clips): 30-second music clips — useful for app sound design, podcast intros, social media reels, and notification sounds
- Lyria 3 (songs): Full-length tracks — closer to a production tool than a demo generator, relevant for content creators and app developers who previously needed licensed music libraries
The image-to-music capability is particularly notable. Give it a landscape photograph and Lyria 3 generates an ambient score matching the visual mood. Give it a product image and it produces brand-appropriate background music. This replaces a workflow that previously required either a human composer, a licensing deal with a stock music library, or a separate subscription to Suno or Udio.
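Because Lyria 3 accepts either input type, a request builder only needs one branch. The field names below (`text`, `inlineData`, `mimeType`) mirror the Gemini API's content-part conventions shown elsewhere in this piece; the exact Lyria request shape is an assumption — consult the API reference before shipping:

```javascript
// Build a Lyria 3 input part from either a text prompt or an image.
// Part shape is ASSUMED to follow Gemini content-part conventions.
function buildLyriaInput(input) {
  if (input.imageBase64) {
    // Image-to-music: the model scores the visual mood of the picture.
    return {
      inlineData: {
        mimeType: input.mimeType || "image/jpeg",
        data: input.imageBase64,
      },
    };
  }
  // Text-to-music: a plain description of the track you want.
  return { text: input.prompt };
}
```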
For developers: Lyria 3 lives inside the same Gemini API ecosystem as video, text, and image tools. One authentication setup, one billing account, one code library. That integration advantage is real — managing separate APIs for music, video, and text generation adds engineering overhead that Google is eliminating by consolidating everything under Gemini.
Flex and Priority tiers — the pricing dial developers wanted
Launched April 1, the new inference tiers give developers an explicit lever to control the cost-versus-speed tradeoff:
- Flex tier: Lower cost, accepts queued or slower delivery — optimized for batch workloads where you submit work and retrieve results later, not while a user waits on screen
- Priority tier: Fastest available processing, higher cost — for real-time, user-facing applications where response latency is visible
Before this change, every API call was priced and processed the same way. That's fine for a chatbot. It's wasteful for a company running 10,000 nightly document summaries — tasks with no human waiting on the result. Flex tier lets those batch workloads run at off-peak efficiency, similar to how cloud providers like AWS and Google Cloud already price compute: pay for urgency, not just capacity.
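The savings math for a workload like that is simple arithmetic. The rates below are hypothetical placeholders purely for illustration — Google has not published the Flex/Priority price differential:

```javascript
// Estimate monthly spend per tier for a batch workload.
// Rates are HYPOTHETICAL, for illustration only — real pricing is unpublished.
const HYPOTHETICAL_RATES = { flex: 0.05, priority: 0.1 }; // $ per 1M tokens

function monthlyCost(tier, tokensPerJob, jobsPerNight) {
  const tokensPerMonth = tokensPerJob * jobsPerNight * 30;
  return (tokensPerMonth / 1_000_000) * HYPOTHETICAL_RATES[tier];
}

// 10,000 nightly summaries at ~2,000 tokens each, on each tier:
console.log(monthlyCost("flex", 2000, 10000));
console.log(monthlyCost("priority", 2000, 10000));
```

Whatever the real rates turn out to be, the structural point holds: the same work at a lower per-token price, traded against delivery speed nobody was using anyway.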
Combined with the March 12 introduction of project-level spend caps and the March 23 rollout of both Prepay and Postpay billing options, Google has added more financial control infrastructure in 30 days than in the prior year. Teams with unpredictable AI usage patterns now have meaningful guardrails.
Five data types in one search model
Launched March 10, gemini-embedding-2-preview deserves attention alongside the newer releases. It's the first embedding model (a tool that converts raw content into numbers AI can search and compare — like a universal translator for media types) from Google that accepts five distinct input types in a single call:
- Text documents
- Images
- Video files
- Audio recordings
- PDF files
The practical impact: one search tool can query documents, photos, audio clips, and videos with the same query — no separate processing pipeline per media type. What used to be a three-pipeline engineering problem (text search, image search, and audio/video transcription followed by search) compresses into a single call.
File size limits also expanded: inputs can now reach 100MB, up from the previous 20MB cap — a 5× increase that accommodates longer video files and larger PDF documents without requiring pre-processing to trim file size.
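A pre-flight size check saves a failed upload. One assumption worth flagging: the changelog states 100MB but doesn't say whether the API measures in decimal megabytes or binary mebibytes — the sketch uses the binary interpretation:

```javascript
// Guard uploads against the embedding input size cap.
// Whether the limit is decimal MB or binary MiB is ASSUMED here (binary).
const MAX_EMBED_BYTES = 100 * 1024 * 1024;

function fitsEmbeddingLimit(fileSizeBytes) {
  return fileSizeBytes <= MAX_EMBED_BYTES;
}
```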
```javascript
// Multimodal embedding — search across 5 data types in one call
const response = await geminiClient.embedContent({
  model: "gemini-embedding-2-preview",
  content: {
    parts: [
      { text: "Q4 earnings call transcript" },
      { inlineData: { mimeType: "application/pdf", data: pdfBase64 } }
    ]
  }
});
// Returns: numerical vectors for semantic search
// (semantic search means finding content by meaning, not just keyword match)
```
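Once the vectors come back, ranking results is plain arithmetic — no Google-specific machinery involved. A minimal cosine-similarity ranker, assuming each stored document keeps its embedding vector alongside an ID:

```javascript
// Cosine similarity: how closely two embedding vectors point in "meaning space".
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored documents against a query vector, most similar first.
function rank(queryVector, docs) {
  return [...docs].sort(
    (x, y) =>
      cosineSimilarity(queryVector, y.vector) -
      cosineSimilarity(queryVector, x.vector)
  );
}
```

This is the whole "search engine" once embedding is solved — which is exactly why collapsing five media types into one embedding model matters.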
Complementing this, Google also added Built-in Tools and Function Calling support in the same single API call on March 18 — previously they had to be used separately. And Grounding with Google Maps is now available across all Gemini 3 models, enabling location-aware responses useful for travel apps, logistics platforms, or any tool where geography improves accuracy.
The migration tax no one advertises
Every launch announcement in the changelog comes paired with a deprecation notice. The model shutdown schedule is aggressive:
- March 31, 2026: First wave of Gemini 1.5 and earlier model variants already offline
- June 1, 2026: Multiple Gemini 2.x models shut down — weeks from now
- Mid-to-late 2026: Further deprecations across the remaining Gemini 2.x family
More than 15 models deprecated or scheduled for shutdown in roughly 6 months. That pace is high by cloud platform standards — AWS typically maintains deprecated services for 12–18 months. Google appears confident enough in the Gemini 3.x generation to sunset predecessors quickly, but teams running stable production applications on older models absorb real engineering cost to comply. Migration means code changes, regression testing, and potential output quality differences that require prompt re-tuning.
If your application calls any Gemini model with a 1.x or 2.x version string, the June 1 deadline is now weeks away. Check the official changelog for your specific model's cutoff date.
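A codebase audit for affected version strings can be a one-line check. The pattern match below assumes Google's model IDs keep their `gemini-<major>.<minor>` shape; it's a triage filter, not a substitute for checking each model's actual cutoff date in the changelog:

```javascript
// Flag model IDs that fall under the announced 1.x/2.x shutdown waves.
// ASSUMES the "gemini-<major>.<minor>" naming pattern holds — verify
// each flagged model's exact cutoff date in the official changelog.
function needsMigration(modelId) {
  return /gemini-(1|2)\./.test(modelId);
}
```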
Where to try everything right now
All tools are accessible through Google AI Studio and the Gemini API. Lyria 3 and Veo 3.1 Lite are currently in preview, with potential rate limits depending on region and account tier. The Computer Use tool — which lets an AI control a browser or desktop interface on your behalf — is live in both gemini-3-pro-preview and gemini-3-flash-preview.
The clearest entry point for non-developers: AI Studio's free sandbox lets you test Lyria 3 music generation and Veo 3.1 Lite video generation through a browser interface — no code required. Type a text prompt and generate a 30-second music clip or a short video clip directly in the browser. For teams evaluating the platform, that's a lower barrier than any competitor offers for the same combination of media types in one place.
For developers already on Gemini 2.x: the migration to Gemini 3.x is not optional much longer. The question is whether to migrate on your own schedule before June — or on Google's schedule after.