The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent

The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent

Date: April 15, 2026

Read Time: 8 minutes

--

How Traditional Prompt Injection Works

Most people familiar with AI security know about indirect prompt injection. The attack works like this: an attacker plants malicious instructions in content that an AI will later process—a webpage, a document, an email. When a victim asks the AI to summarize or analyze that content, the hidden instructions execute, potentially causing the AI to take unwanted actions or reveal sensitive information.

It's a serious vulnerability, but it has a limitation: it's reactive. The attacker plants the payload and waits. The victim has to actively ask the AI to process the poisoned content.

The Comment-and-Control Evolution

Guan's innovation was recognizing that AI agents in automated workflows don't wait for user requests. They process data automatically. And that creates a fundamentally different attack surface.

Here's how it works in practice:

Guan calls this "comment and control" as a play on "command and control." The entire attack runs inside GitHub's infrastructure. No external C2 server needed. No complex infrastructure. Just a malicious pull request and some patience.

--

Attack 1: Anthropic's Claude Code Security Review

The Target: Claude Code Security Review is Anthropic's GitHub Action that uses Claude to analyze code changes and pull requests for security vulnerabilities.

The Vulnerability: The agent reads GitHub data (PR titles, issue bodies, comments) as part of its task context, then takes actions based on that data. Guan realized he could inject malicious instructions into this data stream.

The Exploit: Guan submitted a pull request with a malicious title instructing Claude to execute the whoami command using the Bash tool and return the results as a "security finding." Claude executed the command and embedded the output in its JSON response, which was then posted as a pull request comment.

Credential Theft: When Anthropic asked if he could steal more sensitive data, Guan demonstrated that the same technique could leak credentials—including Anthropic's API key and GitHub access tokens.

The Timeline:

What's Missing: No CVE. No public security advisory. No notification to users that the tool they've integrated into their CI/CD pipelines can be hijacked to steal secrets.

Attack 2: Google's Gemini CLI Action

The Target: Google's Gemini CLI Action integrates Gemini into GitHub issue workflows for automated code analysis and task execution.

The Vulnerability: Same fundamental issue—the agent processes user-controlled input from GitHub without adequate prompt injection defenses.

The Exploit: The researchers injected malicious instructions in a PR title, then added comments with escalating injections. A key technique was injecting a fake "trusted content section" after the real content, which allowed them to override Gemini's safety instructions.

The Result: The researchers successfully exfiltrated Gemini's API key, posting it as an issue comment.

The Timeline:

What's Missing: Again, no CVE. No public advisory. No user notification.

Attack 3: Microsoft's GitHub Copilot Agent

The Target: GitHub Copilot Agent is Microsoft's autonomous software engineering agent that works in the background on GitHub's infrastructure, creating PRs autonomously when assigned issues.

The Defenses: This attack was more complex because GitHub had implemented multiple security layers:

The Exploit: "I bypassed all of them," Guan stated.

The Copilot attack required a variation: malicious instructions hidden in an HTML comment that GitHub's rendered Markdown makes invisible to humans. The victim, not seeing the hidden payload, assigns the issue to Copilot to fix. The agent processes the hidden instructions and executes the attack.

The Timeline:

What's Missing: No CVE. No advisory. Users continue to use the agent without awareness of the vulnerability.

--

The CVE Gap

CVEs (Common Vulnerabilities and Exposures) aren't just bureaucratic checkboxes. They're the backbone of modern vulnerability management. When a CVE is assigned:

By not assigning CVEs for these vulnerabilities, the vendors have made them effectively invisible to standard security tooling. Organizations using these AI agents have no automated way to know they're running vulnerable code.

The Silent Pinned Versions Problem

Guan highlighted a critical issue: "I know for sure that some of the users are pinned to a vulnerable version. If they don't publish an advisory, those users may never know they are vulnerable—or under attack."

Many organizations pin specific versions of GitHub Actions for stability and reproducibility. Without advisories, these organizations have no trigger to review and update their dependencies. They're sitting ducks, and they don't know it.

The Broader Attack Surface

The vulnerabilities aren't limited to these three specific agents. Guan noted that the attack "probably works on other agents that integrate with GitHub, and GitHub Actions that allow access to tools and secrets, such as Slack bots, Jira agents, email agents, and deployment automation agents."

The attack pattern—automatic processing of user-controlled input by AI agents—is widespread. The silence from the major vendors means other vulnerable tools may never get the scrutiny they need.

--

At the heart of this issue is a fundamental design mistake that's being repeated across the AI agent ecosystem: treating user-controlled input as trusted content.

When an AI agent automatically processes pull request titles, issue bodies, and comments, it's treating data that anyone can modify as trusted instructions. This violates basic security principles:

The vendors' response—"only use this with trusted PRs"—is inadequate. In open-source projects and many enterprise environments, "trusted PRs" isn't a practical security boundary. External contributors, contractors, and even compromised legitimate accounts can submit PRs.

--

The bug bounty payments—$100 from Anthropic, $1,337 from Google, $500 from Microsoft—have drawn criticism as inadequate for vulnerabilities of this severity. But the real issue isn't the payment amounts. It's what happened after the payments.

Responsible disclosure doesn't end when the bounty is paid. It ends when users are protected. That requires:

By stopping at step 1 (partial fixes at best), the vendors left their users exposed and unaware.

--

Immediate Actions

If you use AI agents in GitHub Actions:

For security teams:

Longer-Term Strategy

Treat AI agents as privileged employees:

Guan's recommendation is spot-on: "Treat agents as a super-powerful employee. Only give them the tools that they need to complete their task."

This means:

Demand transparency:

When evaluating AI agent tools, ask vendors:

Vendors who can't answer these questions adequately should be treated as higher-risk.

--

The Security-Feature Tradeoff

AI agents are being deployed faster than security practices can evolve. The competitive pressure to ship capabilities is outpacing the work needed to secure them properly. This isn't unique to AI—it's a pattern we've seen with every major technology shift—but the stakes are particularly high with agents that have direct access to code, credentials, and infrastructure.

The Need for AI Agent Security Standards

Current security frameworks weren't designed for AI agents. We need:

The Research Community's Role

Guan and the Johns Hopkins team's work highlights the critical role of independent security research. But it also shows the limitations: researchers can find and disclose vulnerabilities, but they can't force vendors to properly address them or notify users.

The research community may need to evolve its practices—perhaps maintaining public registries of disclosed AI agent vulnerabilities even when vendors don't assign CVEs, or developing industry pressure campaigns for responsible disclosure.

--

--