Google Gemma 4: Free 26B and 31B Open Models on AI Studio
Google launches Gemma 4 — free 26B and 31B open models on AI Studio. Veo 3.1 Lite cuts video costs. Try both now via the Gemini API.
Google's Gemma 4, a pair of open-weight AI models at 26B and 31B parameters, launched April 2, 2026, and both are free to test today on AI Studio. Developers can access them through the Gemini API with no new account setup, making this one of the most accessible open-weight releases of 2026.
The release landed alongside Veo 3.1 Lite (a budget-friendly video generation model) and Lyria 3 (a music generator that produces CD-quality audio from a text description). Together, they mark Google's clearest shift yet toward tiered AI for every budget — but with an aggressive deprecation (scheduled shutdown) cycle that is quietly creating headaches for developers who build on Google's stack.
Gemma 4 Open Models — Two Sizes for AI Automation Workflows
Gemma 4 ships as two distinct models, each tuned for a different type of workload:
- gemma-4-26b-a4b-it — 26 billion parameters, compressed via 4-bit quantization (a technique that shrinks model size in memory without dramatically hurting quality). Best for speed-sensitive, cost-sensitive applications.
- gemma-4-31b-it — 31 billion parameters, full precision. Better for complex reasoning, nuanced instruction-following (understanding and acting on specific user commands), and detailed writing tasks.
The "it" in both names stands for "instruction-tuned" — meaning these models were trained specifically to follow user commands rather than just predict the next word. An instruction-tuned model is the difference between an AI that rambles and one that actually does what you ask.
For scale: GPT-4-class frontier models (the top tier of commercial AI systems) typically run hundreds of billions of parameters. At 31B, Gemma 4 is powerful enough for most business tasks — document summarization, customer support drafts, code review — at a fraction of the compute cost. Crucially, both are open-weight, meaning Google releases the actual model files for you to run on your own servers, so your data never leaves your infrastructure.
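Before self-hosting either variant, it helps to estimate the weight memory: parameters times bytes per parameter. The sketch below is a back-of-envelope calculation only; it ignores KV-cache and activation overhead, and the 4-bit and 16-bit figures follow the quantization levels described above, not a published hardware spec.

```python
def model_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Back-of-envelope weight memory: params * bits / 8, in gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# gemma-4-26b-a4b-it at 4-bit quantization: ~13 GB of weights
print(round(model_memory_gb(26, 4), 1))
# gemma-4-31b-it at 16-bit full precision: ~62 GB of weights
print(round(model_memory_gb(31, 16), 1))
```

The gap explains the two-tier lineup: the quantized 26B model fits on a single consumer GPU, while the full-precision 31B model needs datacenter-class hardware.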
```python
# Quick start: Gemma 4 via the Gemini API (Python)
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemma-4-31b-it")
response = model.generate_content("Summarize this report in 3 bullet points: ...")
print(response.text)
```
Both models are available now on AI Studio — free to try without a credit card. To get Gemma 4 running in your own environment, see our quick-start setup guide.
Veo 3.1 Lite — AI Video Generation Without the Price Tag
Video generation is notoriously expensive. A single 5-second AI clip can cost more than generating 50 pages of text. Veo 3.1 Lite (released March 31) is Google's answer for teams that need volume over perfection — described officially as the "most cost-efficient video generation model, designed for rapid iteration."
The intended workflow: use Veo 3.1 Lite to cheaply draft 10–20 variations of a concept, then run only the winning version through full Veo 3 for production quality. This draft-then-polish approach can dramatically cut video AI costs for marketing teams, content agencies, and app developers who generate clips at scale.
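The savings from that workflow are easy to model. Per-second prices for the two tiers aren't quoted here, so the rates below are hypothetical placeholders; the shape of the comparison is what matters.

```python
def campaign_cost(n_drafts: int, clip_seconds: int,
                  lite_rate: float, full_rate: float) -> dict:
    """Compare drafting every variation on the cheap tier and rendering
    only the winner at full quality, vs. rendering everything at full
    quality. Rates are dollars per second of generated video."""
    draft_then_polish = n_drafts * clip_seconds * lite_rate + clip_seconds * full_rate
    all_full_quality = n_drafts * clip_seconds * full_rate
    return {"draft_then_polish": draft_then_polish,
            "all_full_quality": all_full_quality}

# Hypothetical rates: $0.05/s for Veo 3.1 Lite, $0.40/s for full Veo 3
costs = campaign_cost(n_drafts=15, clip_seconds=8, lite_rate=0.05, full_rate=0.40)
print(costs)  # 15 cheap drafts plus 1 final render vs. 15 full renders
```

Under these placeholder rates the draft-then-polish path costs roughly a fifth of rendering every variation at full quality; plug in real prices from your billing page before relying on the ratio.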
Veo 3 (the full-quality version, released October 2025) already supports a strong feature set:
- Video lengths of 4, 6, or 8 seconds per clip
- Up to 3 reference images per generation — anchor the visual style to specific photos
- First and last frame control — specify exactly where the clip starts and ends for precise storytelling
- Image-to-video — turn a still photo or illustration into a short animated clip
Veo 3.1 Lite inherits these controls at a lower cost tier, making it the practical choice for iteration-heavy workflows where you need to experiment before committing budget to final renders.
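A request exercising these controls might be assembled like this. The field names and the "veo-3.1-lite" model string are illustrative assumptions, not the official Gemini API schema; only the 4/6/8-second durations and the 3-reference-image cap come from the feature list above.

```python
def build_video_request(prompt, reference_images=(), first_frame=None,
                        last_frame=None, duration_seconds=8):
    """Assemble a hypothetical video-generation request enforcing the
    documented limits: clip lengths of 4/6/8s and at most 3 reference
    images. Field names are illustrative, not the real API schema."""
    if duration_seconds not in (4, 6, 8):
        raise ValueError("clip length must be 4, 6, or 8 seconds")
    if len(reference_images) > 3:
        raise ValueError("at most 3 reference images per generation")
    request = {"model": "veo-3.1-lite", "prompt": prompt,
               "duration_seconds": duration_seconds}
    if reference_images:
        request["reference_images"] = list(reference_images)
    if first_frame:
        request["first_frame"] = first_frame
    if last_frame:
        request["last_frame"] = last_frame
    return request
```

Validating the limits client-side keeps a batch of 20 draft generations from failing halfway through on a malformed request.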
Lyria 3 — Full Songs from One Text Prompt
Buried in the same changelog: on March 25, Google launched Lyria 3, its most capable music generation model yet. Two variants are available:
- lyria-3-clip-preview — generates 30-second clips for intro music, scene transitions, or social media content
- lyria-3-pro-preview — generates full-length songs for podcasts, video productions, or standalone music projects
Lyria 3 accepts both text prompts ("upbeat jazz with a walking bass line, 120 BPM, no vocals") and image inputs — it reads visual mood from a photo or graphic and translates it into matching audio. Output quality is 48kHz stereo, which is professional broadcast-grade sound (standard audio CDs use 44.1kHz; film and video production uses 48kHz as the industry norm).
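For storage planning, 48kHz stereo is easy to size: sample rate times channels times sample width times duration. The sketch assumes uncompressed 16-bit PCM; the release notes don't state Lyria 3's actual delivery format.

```python
def pcm_size_mb(seconds: float, sample_rate: int = 48_000,
                channels: int = 2, bytes_per_sample: int = 2) -> float:
    """Uncompressed PCM size: rate * channels * sample width * duration,
    returned in megabytes (decimal MB)."""
    return seconds * sample_rate * channels * bytes_per_sample / 1e6

# A 30-second lyria-3-clip-preview clip at 48 kHz, 16-bit stereo:
print(round(pcm_size_mb(30), 2))  # ~5.76 MB uncompressed
```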
The practical value: royalty-free, instantly customizable background music for YouTube videos, apps, and presentations — without licensing fees or hiring composers. Content creators who currently pay $20–$80/month for stock music libraries have a new alternative to evaluate.
Gemini API Pricing — What's Getting Cheaper
Google quietly cut image input costs by 5x in this update cycle. Sending an image to the Gemini API (the connection point between your app and Google's AI models) previously consumed 1,290 tokens per image. That cost is now just 258 tokens. Tokens are the "units" AI systems use to measure input and output — approximately ¾ of an English word per token. If your application analyzes images regularly, this single change could reduce your monthly bill by up to 80%.
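The arithmetic behind that claim, for an app whose bill is dominated by image inputs (the 10,000 images/month volume is an arbitrary example):

```python
OLD_TOKENS_PER_IMAGE = 1290  # pre-update cost per image
NEW_TOKENS_PER_IMAGE = 258   # post-update cost per image

def monthly_image_tokens(images_per_month: int, tokens_per_image: int) -> int:
    return images_per_month * tokens_per_image

before = monthly_image_tokens(10_000, OLD_TOKENS_PER_IMAGE)  # 12,900,000 tokens
after = monthly_image_tokens(10_000, NEW_TOKENS_PER_IMAGE)   #  2,580,000 tokens
savings = 1 - after / before
print(f"{savings:.0%}")  # 80% fewer image tokens
```

Text tokens are unaffected, so the real-world saving depends on how much of your traffic is images; 80% is the ceiling, not the typical case.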
Other infrastructure and pricing changes from the last 90 days:
- File uploads now support up to 100MB (raised from 20MB as of January 8) — large PDFs, audio files, and datasets no longer need splitting
- Flex and Priority inference tiers (launched April 1) — Flex is slower and cheaper for background processing; Priority is faster at a premium rate for user-facing apps
- Cloud Storage buckets and pre-signed URLs now accepted as input sources — no need to re-upload the same files on each request
- Google Search grounding (using live web search results to improve answers) now carries extra charges for Gemini 3 models — verify your billing settings if you use this feature
The new billing structure rewards developers who plan ahead: Flex pricing suits batch jobs and overnight processing, while Priority suits anything a real user is waiting on. Building both into your app from the start gives you a cost lever you can pull without rewriting logic.
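In code, that lever can be as small as one routing function. The 10-second threshold below is an illustrative cutoff, not a Google recommendation, and how the chosen tier is actually passed to the API is left out.

```python
def choose_tier(latency_budget_s: float) -> str:
    """Pick an inference tier from a latency budget: anything a user is
    actively waiting on goes to Priority; batch jobs with loose
    deadlines go to Flex. The 10s cutoff is an illustrative assumption."""
    return "priority" if latency_budget_s <= 10 else "flex"

# A chat reply the user is watching stream in:
print(choose_tier(latency_budget_s=3))     # priority
# An overnight batch summarization job:
print(choose_tier(latency_budget_s=3600))  # flex
```

Centralizing the choice in one function means a future pricing change is a one-line edit, not a hunt through every call site.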
Google AI Deprecation Cycle — Why You Need a Migration Plan
Here is the part of Google's AI roadmap that release notes don't emphasize: models are being retired (permanently shut down) faster than most teams track. Each deprecation forces code updates, re-testing, and often prompt re-engineering (rewriting the specific instructions you use to guide a model's behavior).
Confirmed shutdowns as of April 4, 2026:
- February 17, 2026 — Earlier Gemini 2.0 versions already gone
- March 31, 2026 — Gemini 2.5 Flash Lite already offline
- June 1, 2026 — Gemini 2.0 Flash models permanently shut down (notice issued February 18)
- Multiple embedding models (tools that convert text into numbers for search and similarity matching) retiring on staggered dates throughout 2026
Google does give advance notice for major shutdowns (the June 1 deadline was announced on February 18), but managing several overlapping deprecations at once requires active tracking. For solo developers or small teams, each migration easily costs 1–2 days of engineering time.
The practical fix: build an abstraction layer (a thin piece of code that sits between your app and whichever AI model you're calling, letting you swap models by changing one config value instead of rewriting your whole codebase). Teams that do this upfront turn every model deprecation into a 15-minute config change rather than a week of emergency refactoring.
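A minimal sketch of that layer, assuming a provider client that exposes a generate_content(model, prompt) method (a stand-in for whichever SDK wrapper you use, not the real google.generativeai signature):

```python
# Model-abstraction layer: the rest of the app imports generate() and
# never names a model directly. When a model is deprecated, the swap
# is a one-line change to MODEL_CONFIG.
MODEL_CONFIG = {
    "summarize": "gemma-4-26b-a4b-it",  # fast, cheap, quantized
    "reasoning": "gemma-4-31b-it",      # full precision
}

def generate(task: str, prompt: str, client=None) -> str:
    """Resolve a task name to whatever model the config currently
    points at, then delegate to the provider client. `client` is any
    object with a generate_content(model, prompt) method, so tests can
    pass a stub and production can pass the real SDK wrapper."""
    model = MODEL_CONFIG[task]
    return client.generate_content(model, prompt)
```

Callers say what they want done ("summarize"), not which model does it; that indirection is what turns a deprecation notice into a config edit.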
Learn how to structure AI automation workflows that survive model changes in our automation guides, or get Gemma 4 running in minutes with our quick-start setup guide.