2026-03-27OpenAIbug bountyAI securityAI agentsChatGPTcybersecurity

OpenAI will pay $100K if you break its AI agents

OpenAI's new Safety Bug Bounty pays up to $100K for finding ways AI agents can be abused. First major lab to bounty AI behavior, not just code bugs.

On March 26, 2026, OpenAI launched its Safety Bug Bounty — a public program that pays security researchers up to $100,000 to find ways its AI agents can be misused, manipulated, or weaponized. Hosted on Bugcrowd (the platform managing vulnerability programs for major tech companies), it runs alongside OpenAI's existing Security Bug Bounty, which has rewarded 409 confirmed vulnerabilities since April 2023.

This is a different category of program. Traditional bug bounties pay for finding coding errors in software — a buffer overflow, an authentication bypass, an exposed API key. The Safety Bug Bounty pays for finding ways the AI itself can be turned against users — through clever prompting, plugin abuse, or chaining the model's own capabilities into harmful outcomes.

It is the first time a major AI lab has publicly put a cash bounty on AI behavioral safety rather than just code security — acknowledging that as AI agents gain real-world capabilities, the attack surface has fundamentally expanded.

OpenAI Safety Bug Bounty program launches with $100K payout

Exactly What OpenAI Is Paying For

The program covers three scope categories:

1. Agentic Risks

AI agents (automated systems that can browse the web, execute code, send emails, and use external tools on your behalf) can be hijacked or misdirected. Qualifying findings include: third-party prompt injection (hiding secret instructions inside a webpage or document that an agent reads, causing it to take unintended actions), data exfiltration via agents (tricking an agent into emailing your private files to an attacker), and unauthorized actions at scale through ChatGPT or the API. MCP (Model Context Protocol — the system letting agents connect to external tools like calendars, databases, and browsers) abuse is explicitly in scope.

2. Proprietary Information Exposure

Finding ways to extract confidential system prompts (the private instructions OpenAI gives its models behind the scenes), training data, or internal model configurations that the AI should not reveal to users.

3. Account and Platform Integrity

Abusing OpenAI's platform at a systemic level — bypassing rate limits to access services at no cost, accessing other users' conversation histories, or manipulating billing systems.

From $20K to $100K — What Changed

The previous Security Bug Bounty capped critical findings at $20,000. The new Safety Bug Bounty raises that ceiling to $100,000 — a 5× increase. This isn't just a number change; it signals that OpenAI now treats AI-specific abuse risks on par with the most severe traditional security exploits.

To qualify for the highest payouts, agentic risk findings must be reproducible at least 50% of the time. This threshold prevents the program from being flooded with one-off flukes. Jailbreaks that merely produce rude language or return "easily searchable information" are explicitly out of scope — OpenAI wants documented proof of real harm potential, with reproducible steps.

Submissions are triaged jointly by OpenAI's Safety and Security teams and may be rerouted between programs depending on the type and ownership of the finding. The company also runs private red-team campaigns for harm types not suitable for public disclosure.

Why This Is a New Category of Security Research

A traditional cybersecurity exploit abuses a mistake in code that a programmer wrote. An AI safety exploit abuses the model's reasoning — its trained tendencies and emergent behaviors. These are fundamentally harder to find, harder to verify, and harder to patch. You can't just push a code fix; you may need to retrain or fine-tune the model.

The explicit inclusion of MCP abuse is particularly notable. As AI agents increasingly connect to real-world services — your Google Calendar, your company Slack, your code repositories — an attacker who can craft a malicious document that an agent reads and acts upon gains a new kind of leverage. The agent becomes the attack vector.

For developers building on the OpenAI API: the program's public scope documentation is essentially a free security audit of the same attack surfaces your applications are exposed to. Reading through what OpenAI considers highest-priority risks will help you harden your own agent implementations.

How to Submit a Finding

The program is live at bugcrowd.com/engagements/openai-safety. You need a free Bugcrowd account and must follow responsible disclosure guidelines — no public posting until the finding is reviewed and addressed. Testing MCP-related risks must comply with the terms of service of the third-party services involved, limiting scope to OpenAI's own infrastructure.

If you're a security researcher, AI red-teamer, or developer who has noticed unusual model behavior that could be exploited — this is now the official, financially incentivized channel to report it.

Related Content — Get Started with Easy Claude Code | Free Learning Guides | More AI News

Sources

Stay updated on AI news

Simple explanations of the latest AI developments