The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent
Date: April 15, 2026
Read Time: 8 minutes
--
Executive Summary
The Attack That Should Have Been Impossible
Understanding "Comment and Control" Prompt Injection
Security researcher Aonan Guan just pulled off something remarkable: he successfully hijacked AI agents from three of the world's largest tech companies—Anthropic, Google, and Microsoft—using a novel attack technique that allowed him to steal API keys, GitHub tokens, and other sensitive credentials. The companies paid bug bounties. Then they stayed silent.
No CVEs. No public advisories. No warnings to users that the AI agents they rely on for code review and automation are vulnerable to prompt injection attacks that can exfiltrate secrets.
This is the "comment and control" prompt injection attack, and it's exposing a fundamental blind spot in how we're deploying AI agents at scale. Here's what happened, why it matters, and what you need to do about it.
--
By April 2026, we've been talking about AI security for years. The major model providers have built guardrails, safety systems, and prompt injection defenses. Anthropic's Claude, Google's Gemini, and Microsoft's GitHub Copilot all have teams dedicated to security. And yet...
Security researcher Aonan Guan, working with researchers from Johns Hopkins University, found that he could hijack all three companies' GitHub-integrated AI agents using variations of the same attack: inject malicious instructions into the data these agents process, then watch as they execute commands and leak credentials.
This wasn't supposed to be possible anymore. But it was.
--
How Traditional Prompt Injection Works
Most people familiar with AI security know about indirect prompt injection. The attack works like this: an attacker plants malicious instructions in content that an AI will later process—a webpage, a document, an email. When a victim asks the AI to summarize or analyze that content, the hidden instructions execute, potentially causing the AI to take unwanted actions or reveal sensitive information.
It's a serious vulnerability, but it has a limitation: it's reactive. The attacker plants the payload and waits. The victim has to actively ask the AI to process the poisoned content.
The Comment-and-Control Evolution
Guan's innovation was recognizing that AI agents in automated workflows don't wait for user requests. They process data automatically. And that creates a fundamentally different attack surface.
Here's how it works in practice:
- The Cleanup: The attacker can edit the PR title back to something innocuous like "fix typo," close the PR, and delete the bot's message—leaving minimal evidence.
Guan calls this "comment and control" as a play on "command and control." The entire attack runs inside GitHub's infrastructure. No external C2 server needed. No complex infrastructure. Just a malicious pull request and some patience.
--
The Three Attacks: A Technical Breakdown
Attack 1: Anthropic's Claude Code Security Review
The Target: Claude Code Security Review is Anthropic's GitHub Action that uses Claude to analyze code changes and pull requests for security vulnerabilities.
The Vulnerability: The agent reads GitHub data (PR titles, issue bodies, comments) as part of its task context, then takes actions based on that data. Guan realized he could inject malicious instructions into this data stream.
The Exploit: Guan submitted a pull request with a malicious title instructing Claude to execute the whoami command using the Bash tool and return the results as a "security finding." Claude executed the command and embedded the output in its JSON response, which was then posted as a pull request comment.
Credential Theft: When Anthropic asked if he could steal more sensitive data, Guan demonstrated that the same technique could leak credentials—including Anthropic's API key and GitHub access tokens.
The Timeline:
- Response: Documentation update warning that "This action is not hardened against prompt injection attacks and should only be used to review trusted PRs"
What's Missing: No CVE. No public security advisory. No notification to users that the tool they've integrated into their CI/CD pipelines can be hijacked to steal secrets.
Attack 2: Google's Gemini CLI Action
The Target: Google's Gemini CLI Action integrates Gemini into GitHub issue workflows for automated code analysis and task execution.
The Vulnerability: Same fundamental issue—the agent processes user-controlled input from GitHub without adequate prompt injection defenses.
The Exploit: The researchers injected malicious instructions in a PR title, then added comments with escalating injections. A key technique was injecting a fake "trusted content section" after the real content, which allowed them to override Gemini's safety instructions.
The Result: The researchers successfully exfiltrated Gemini's API key, posting it as an issue comment.
The Timeline:
- Credited: Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao
What's Missing: Again, no CVE. No public advisory. No user notification.
Attack 3: Microsoft's GitHub Copilot Agent
The Target: GitHub Copilot Agent is Microsoft's autonomous software engineering agent that works in the background on GitHub's infrastructure, creating PRs autonomously when assigned issues.
The Defenses: This attack was more complex because GitHub had implemented multiple security layers:
- Runtime-level security: environment filtering, secret scanning, and a network firewall
The Exploit: "I bypassed all of them," Guan stated.
The Copilot attack required a variation: malicious instructions hidden in an HTML comment that GitHub's rendered Markdown makes invisible to humans. The victim, not seeing the hidden payload, assigns the issue to Copilot to fix. The agent processes the hidden instructions and executes the attack.
The Timeline:
- March 2026: Paid $500 bounty
What's Missing: No CVE. No advisory. Users continue to use the agent without awareness of the vulnerability.
--
Why This Disclosure Pattern Is Dangerous
The CVE Gap
CVEs (Common Vulnerabilities and Exposures) aren't just bureaucratic checkboxes. They're the backbone of modern vulnerability management. When a CVE is assigned:
- Researchers can track vulnerability trends
By not assigning CVEs for these vulnerabilities, the vendors have made them effectively invisible to standard security tooling. Organizations using these AI agents have no automated way to know they're running vulnerable code.
The Silent Pinned Versions Problem
Guan highlighted a critical issue: "I know for sure that some of the users are pinned to a vulnerable version. If they don't publish an advisory, those users may never know they are vulnerable—or under attack."
Many organizations pin specific versions of GitHub Actions for stability and reproducibility. Without advisories, these organizations have no trigger to review and update their dependencies. They're sitting ducks, and they don't know it.
The Broader Attack Surface
The vulnerabilities aren't limited to these three specific agents. Guan noted that the attack "probably works on other agents that integrate with GitHub, and GitHub Actions that allow access to tools and secrets, such as Slack bots, Jira agents, email agents, and deployment automation agents."
The attack pattern—automatic processing of user-controlled input by AI agents—is widespread. The silence from the major vendors means other vulnerable tools may never get the scrutiny they need.
--
The Fundamental Problem: Trusted Input vs. User-Controlled Input
At the heart of this issue is a fundamental design mistake that's being repeated across the AI agent ecosystem: treating user-controlled input as trusted content.
When an AI agent automatically processes pull request titles, issue bodies, and comments, it's treating data that anyone can modify as trusted instructions. This violates basic security principles:
- Defense in depth: Single layers of protection that can be bypassed with clever prompt engineering
The vendors' response—"only use this with trusted PRs"—is inadequate. In open-source projects and many enterprise environments, "trusted PRs" isn't a practical security boundary. External contributors, contractors, and even compromised legitimate accounts can submit PRs.
--
The Economic Reality: Bug Bounties vs. Responsible Disclosure
The bug bounty payments—$100 from Anthropic, $1,337 from Google, $500 from Microsoft—have drawn criticism as inadequate for vulnerabilities of this severity. But the real issue isn't the payment amounts. It's what happened after the payments.
Responsible disclosure doesn't end when the bounty is paid. It ends when users are protected. That requires:
- Assigning CVEs for tracking
By stopping at step 1 (partial fixes at best), the vendors left their users exposed and unaware.
--
What This Means for Your Organization
Immediate Actions
If you use AI agents in GitHub Actions:
- Monitor for suspicious activity: Watch for PRs that trigger AI agent actions followed by title changes or deletions
For security teams:
- Establish policies: Define which AI agents can be used, under what conditions, with what access
Longer-Term Strategy
Treat AI agents as privileged employees:
Guan's recommendation is spot-on: "Treat agents as a super-powerful employee. Only give them the tools that they need to complete their task."
This means:
- Logging and monitoring of all agent actions
Demand transparency:
When evaluating AI agent tools, ask vendors:
- How do you isolate AI agents from user-controlled input?
Vendors who can't answer these questions adequately should be treated as higher-risk.
--
The Industry-Wide Implications
The Security-Feature Tradeoff
AI agents are being deployed faster than security practices can evolve. The competitive pressure to ship capabilities is outpacing the work needed to secure them properly. This isn't unique to AI—it's a pattern we've seen with every major technology shift—but the stakes are particularly high with agents that have direct access to code, credentials, and infrastructure.
The Need for AI Agent Security Standards
Current security frameworks weren't designed for AI agents. We need:
- Security audit requirements for AI agents with privileged access
The Research Community's Role
Guan and the Johns Hopkins team's work highlights the critical role of independent security research. But it also shows the limitations: researchers can find and disclose vulnerabilities, but they can't force vendors to properly address them or notify users.
The research community may need to evolve its practices—perhaps maintaining public registries of disclosed AI agent vulnerabilities even when vendors don't assign CVEs, or developing industry pressure campaigns for responsible disclosure.
--
Conclusion: The AI Agent Security Gap
Key Takeaways
The "comment and control" prompt injection attacks against Anthropic, Google, and Microsoft's AI agents reveal a troubling pattern: we're deploying powerful autonomous systems with access to sensitive credentials and infrastructure, but we're not treating their security with the seriousness it demands.
The vulnerabilities themselves are serious but addressable. More concerning is the response pattern: bug bounties paid, but no CVEs, no advisories, no user notifications. This leaves organizations unknowingly vulnerable and undermines the entire security ecosystem.
As AI agents become more capable and more deeply embedded in our development workflows and infrastructure, this security gap will only become more dangerous. The vendors who build these systems have a responsibility to secure them properly and disclose vulnerabilities transparently. The users who deploy these systems have a responsibility to understand the risks and implement appropriate controls.
Neither side has been living up to that responsibility. It's time for that to change.
--
- Industry action needed: Standardized AI agent security requirements, CVE-equivalent tracking, and vendor accountability for responsible disclosure
--
- For the full technical details, see Aonan Guan's research publication at oddguan.com. If you use AI agents in your workflows, audit your exposure immediately—don't wait for vendor advisories that may never come.