Google on Monday rolled out a new AI Vulnerability Reward Program to encourage researchers to find and report flaws in its AI systems, with rewards of up to $30,000 for a single qualifying report.
In addition to a base reward of up to $20,000 for the highest-tier AI product flaw, Google adopted the same report multipliers, influenced by vulnerability reporting quality, as it uses for its traditional security Vulnerability Reward Program (VRP). This could raise the reward for an individual report by an extra $10,000.
The standalone AI bug bounty program comes two years after the Chocolate Factory expanded its VRP to include Google AI products.
In launching the new AI-specific VRP, Google also updated its rules and clarified what types of attacks are considered "in-scope" for the contest, as well as those that aren't. Specifically: direct prompt injection, jailbreaks, and alignment issues don't count.
Yes, tricking models into doing something not normally permitted by its guardrails is an important issue, and Google encourages researchers "to report these content-related issues in-product."
But it's not going to pay a bug bounty for them.
"Simply put, we don't believe a Vulnerability Reward Program is the right format for addressing content-related issues," Google security engineering managers Jason Parsons and Zak Bennett said in a blog post.
Solving these types of issues requires long-term efforts, and analyzing trends across large volumes of reports, and this isn't conducive to Google's "goal of providing timely rewards to individual researchers," the duo wrote.
Additionally, as their fellow Googlers opined in December: "There may, in fact, be an infinite number of possible jailbreaks for any particular model and fully mitigating them may be completely infeasible."
Here are the security flaws considered in-scope for the AI bug bounty program, listed from the most serious (thus garnering the biggest reward for reporting) to the least:
- Rogue actions, which Google describes as "attacks that modify the state of the victim's account or data with a clear security impact." An example of this would be an indirect prompt injection attack - this occurs when a user embeds malicious instructions into a prompt that the model can act upon - causing Google Home to do something - say, unlocking a smart lock.
- Sensitive data exfiltration that leaks victims' PII or other sensitive details without user approval. This could also involve an indirect prompt injection attack in which an AI system summarizes someone's email contents, and then sends that to an attacker-controlled account.
- Phishing enablement: "Persistent, cross-user HTML injection on a Google-branded site which: (a) does not include a 'user-generated content' warning, and (b) at the panel's discretion, presents a convincing phishing attack vector," according to Google. In other words: using a product to share an attacker-generated website that spoofs a legitimate Google site without a user-generated content warning, then distributing that page for phishing attacks.
- Model theft, which allows attackers to exfiltrate complete – and confidential – model parameters.
- Context manipulation (cross-account) attacks that allow for repeatable, persistent, and hidden manipulation of the context of a victim's AI environment, and that don't require much, if any, victim interaction. One example attack scenario, according to Google, would be: "An attacker is able to send a calendar invite to a victim, causing a memory to be stored in an AI product; the product takes unconfirmed, but non-security-sensitive, future actions based on that stored memory."
- Access control bypasses (limited security impact), which allow an attacker to bypass access controls and steal data that is otherwise inaccessible but not security-sensitive, such as Google's campus lunch menus.
- Unauthorized product usage, or enabling Google server-side features on the user's account without the user paying for them, or otherwise being authorized to use them.
- Cross-user denial of service (with caveats), which involves causing a persistent denial of service for an AI product or specific feature in a victim account. The caveats: Volumetric DoS attacks are prohibited, and researchers can't cause a DoS in their current account.
In addition to detailing what types of attacks are in-scope, Google also defined product tiers for the AI VRP scope. These fall into three categories: flagship, standard, and other.
Flagship products include Google Search, Gemini Apps (Web, Android, and iOS), and Google Workspace core applications (Gmail, Drive, Meet, Calendar, Docs, Sheets, Slides, and Forms).
Standard covers AI features in "high-sensitivity" products such as AI Studio, Jules, and Google Workspace non-core applications (NotebookLM, Appsheet, etc).
Other is all other AI integrations in Google products with some exceptions, so be sure to read the complete program rules.
- Bug bounty hunters load up to stalk AI and fancy bagging big bucks
- Poking holes in Google tech bagged bug hunters $10M
- Bug bounties: The good, the bad, and the frankly ridiculous ways to do it
- Google DeepMind minds the patch with AI flaw-fixing scheme
Researchers can earn the highest reward amounts for finding flaws in flagship products. So, for example, a rogue action in one of these can net a bug hunter $20,000. They can nab $15,000 for a standard product, and $10,000 for something in the "other" category.
For comparison: the lowest-category (cross-user denial of service) can earn a researcher $500 (flagship), $100 (standard), or Google credit (other).
Google paid out nearly $12 million to more than 600 researchers last year through its VRP, compared to $10 million in 2023.
And considering the fun researchers and red teamers have been poking holes in AI systems, we'd expect this one to be a bountiful year for them – especially with the new AI-specific focus. Happy hunting out there. ®
.png)


