What is this article about?

Security researcher Aonan Guan hijacked AI agents from three tech giants using novel 'comment and control' prompt injection attacks. The companies paid bug bounties—but stayed silent, leaving users unknowingly vulnerable.

Why does this matter?

This development is significant for the AI industry and could impact how businesses and developers interact with artificial intelligence.

The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent

Date: April 15, 2026

Read Time: 8 minutes

Executive Summary

Security researcher Aonan Guan just pulled off something remarkable: he successfully hijacked AI agents from three of the world's largest tech companies—Anthropic, Google, and Microsoft—using a novel attack technique that allowed him to steal API keys, GitHub tokens, and other sensitive credentials. The companies paid bug bounties. Then they stayed silent.

No CVEs. No public advisories. No warnings to users that the AI agents they rely on for code review and automation are vulnerable to prompt injection attacks that can exfiltrate secrets.

This is the "comment and control" prompt injection attack, and it's exposing a fundamental blind spot in how we're deploying AI agents at scale. Here's what happened, why it matters, and what you need to do about it.

The Attack That Should Have Been Impossible

By April 2026, we've been talking about AI security for years. The major model providers have built guardrails, safety systems, and prompt injection defenses. Anthropic's Claude, Google's Gemini, and Microsoft's GitHub Copilot all have teams dedicated to security. And yet...

Security researcher Aonan Guan, working with researchers from Johns Hopkins University, found that he could hijack all three companies' GitHub-integrated AI agents using variations of the same attack: inject malicious instructions into the data these agents process, then watch as they execute commands and leak credentials.

This wasn't supposed to be possible anymore. But it was.

Understanding "Comment and Control" Prompt Injection

How Traditional Prompt Injection Works

Most people familiar with AI security know about indirect prompt injection. The attack works like this: an attacker plants malicious instructions in content that an AI will later process—a webpage, a document, an email. When a victim asks the AI to summarize or analyze that content, the hidden instructions execute, potentially causing the AI to take unwanted actions or reveal sensitive information.

It's a serious vulnerability, but it has a limitation: it's reactive. The attacker plants the payload and waits. The victim has to actively ask the AI to process the poisoned content.

The Comment-and-Control Evolution

Guan's innovation was recognizing that AI agents in automated workflows don't wait for user requests. They process data automatically. And that creates a different attack surface.

Here's how it works in practice:

The Cleanup: The attacker can edit the PR title back to something innocuous like "fix typo," close the PR, and delete the bot's message—leaving minimal evidence.

Guan calls this "comment and control" as a play on "command and control." The entire attack runs inside GitHub's infrastructure. No external C2 server needed. No complex infrastructure. Just a malicious pull request and some patience.

The Three Attacks: A Technical Breakdown

Attack 1: Anthropic's Claude Code Security Review

The Target: Claude Code Security Review is Anthropic's GitHub Action that uses Claude to analyze code changes and pull requests for security vulnerabilities.

The Vulnerability: The agent reads GitHub data (PR titles, issue bodies, comments) as part of its task context, then takes actions based on that data. Guan realized he could inject malicious instructions into this data stream.

The Exploit: Guan submitted a pull request with a malicious title instructing Claude to execute the whoami command using the Bash tool and return the results as a "security finding." Claude executed the command and embedded the output in its JSON response, which was then posted as a pull request comment.

Credential Theft: When Anthropic asked if he could steal more sensitive data, Guan demonstrated that the same technique could leak credentials—including Anthropic's API key and GitHub access tokens.

The Timeline:

Response: Documentation update warning that "This action is not hardened against prompt injection attacks and should only be used to review trusted PRs"

What's Missing: No CVE. No public security advisory. No notification to users that the tool they've integrated into their CI/CD pipelines can be hijacked to steal secrets.

Attack 2: Google's Gemini CLI Action

The Target: Google's Gemini CLI Action integrates Gemini into GitHub issue workflows for automated code analysis and task execution.

The Vulnerability: Same fundamental issue—the agent processes user-controlled input from GitHub without adequate prompt injection defenses.

The Exploit: The researchers injected malicious instructions in a PR title, then added comments with escalating injections. A key technique was injecting a fake "trusted content section" after the real content, which allowed them to override Gemini's safety instructions.

The Result: The researchers successfully exfiltrated Gemini's API key, posting it as an issue comment.

The Timeline:

Credited: Guan, Neil Fendley, Zhengyu Liu, Senapati Diwangkara, and Yinzhi Cao

What's Missing: Again, no CVE. No public advisory. No user notification.

Attack 3: Microsoft's GitHub Copilot Agent

The Target: GitHub Copilot Agent is Microsoft's autonomous software engineering agent that works in the background on GitHub's infrastructure, creating PRs autonomously when assigned issues.

The Defenses: This attack was more complex because GitHub had implemented multiple security layers:

Runtime-level security: environment filtering, secret scanning, and a network firewall

The Exploit: "I bypassed all of them," Guan stated.

The Copilot attack required a variation: malicious instructions hidden in an HTML comment that GitHub's rendered Markdown makes invisible to humans. The victim, not seeing the hidden payload, assigns the issue to Copilot to fix. The agent processes the hidden instructions and executes the attack.

The Timeline:

March 2026: Paid $500 bounty

What's Missing: No CVE. No advisory. Users continue to use the agent without awareness of the vulnerability.

Why This Disclosure Pattern Is Dangerous

The CVE Gap

CVEs (Common Vulnerabilities and Exposures) aren't just bureaucratic checkboxes. They're the backbone of modern vulnerability management. When a CVE is assigned:

Researchers can track vulnerability trends

By not assigning CVEs for these vulnerabilities, the vendors have made them effectively invisible to standard security tooling. Organizations using these AI agents have no automated way to know they're running vulnerable code.

The Silent Pinned Versions Problem

Guan highlighted a critical issue: "I know for sure that some of the users are pinned to a vulnerable version. If they don't publish an advisory, those users may never know they are vulnerable—or under attack."

Many organizations pin specific versions of GitHub Actions for stability and reproducibility. Without advisories, these organizations have no trigger to review and update their dependencies. They're sitting ducks, and they don't know it.

The Broader Attack Surface

The vulnerabilities aren't limited to these three specific agents. Guan noted that the attack "probably works on other agents that integrate with GitHub, and GitHub Actions that allow access to tools and secrets, such as Slack bots, Jira agents, email agents, and deployment automation agents."

The attack pattern—automatic processing of user-controlled input by AI agents—is widespread. The silence from the major vendors means other vulnerable tools may never get the scrutiny they need.

The Fundamental Problem: Trusted Input vs. User-Controlled Input

At the heart of this issue is a fundamental design mistake that's being repeated across the AI agent ecosystem: treating user-controlled input as trusted content.

When an AI agent automatically processes pull request titles, issue bodies, and comments, it's treating data that anyone can modify as trusted instructions. This violates basic security principles:

Defense in depth: Single layers of protection that can be bypassed with clever prompt engineering

The vendors' response—"only use this with trusted PRs"—is inadequate. In open-source projects and many enterprise environments, "trusted PRs" isn't a practical security boundary. External contributors, contractors, and even compromised legitimate accounts can submit PRs.

The Economic Reality: Bug Bounties vs. Responsible Disclosure

The bug bounty payments—$100 from Anthropic, $1,337 from Google, $500 from Microsoft—have drawn criticism as inadequate for vulnerabilities of this severity. But the real issue isn't the payment amounts. It's what happened after the payments.

Responsible disclosure doesn't end when the bounty is paid. It ends when users are protected. That requires:

Assigning CVEs for tracking

By stopping at step 1 (partial fixes at best), the vendors left their users exposed and unaware.

What This Means for Your Organization

Immediate Actions

If you use AI agents in GitHub Actions:

Monitor for suspicious activity: Watch for PRs that trigger AI agent actions followed by title changes or deletions

For security teams:

Establish policies: Define which AI agents can be used, under what conditions, with what access

Longer-Term Strategy

Treat AI agents as privileged employees:

Guan's recommendation is spot-on: "Treat agents as a super-powerful employee. Only give them the tools that they need to complete their task."

This means:

Logging and monitoring of all agent actions

Demand transparency:

When evaluating AI agent tools, ask vendors:

How do you isolate AI agents from user-controlled input?

Vendors who can't answer these questions adequately should be treated as higher-risk.

The Industry-Wide Implications

The Security-Feature Tradeoff

AI agents are deploying faster than security practices can evolve. The competitive pressure to ship capabilities is outpacing the work needed to secure them properly. This isn't unique to AI—it's a pattern we've seen with every major technology shift—but the stakes are particularly high with agents that have direct access to code, credentials, and infrastructure.

The Need for AI Agent Security Standards

Current security frameworks weren't designed for AI agents. We need:

Security audit requirements for AI agents with privileged access

The Research Community's Role

Guan and the Johns Hopkins team's work highlights the critical role of independent security research. But it also shows the limitations: researchers can find and disclose vulnerabilities, but they can't force vendors to properly address them or notify users.

The research community may need to evolve its practices—perhaps maintaining public registries of disclosed AI agent vulnerabilities even when vendors don't assign CVEs, or developing industry pressure campaigns for responsible disclosure.

Conclusion: The AI Agent Security Gap

The "comment and control" prompt injection attacks against Anthropic, Google, and Microsoft's AI agents reveal a troubling pattern: we're deploying powerful autonomous systems with access to sensitive credentials and infrastructure, but we're not treating their security with the seriousness it demands.

The vulnerabilities themselves are serious but addressable. More concerning is the response pattern: bug bounties paid, but no CVEs, no advisories, no user notifications. This leaves organizations unknowingly vulnerable and undermines the entire security ecosystem.

As AI agents become more capable and more deeply embedded in our development workflows and infrastructure, this security gap will only become more dangerous. The vendors who build these systems have a responsibility to secure them properly and disclose vulnerabilities transparently. The users who deploy these systems have a responsibility to understand the risks and implement appropriate controls.

Neither side has been living up to that responsibility. It's time for that to change.

Key Takeaways

Industry action needed: Standardized AI agent security requirements, CVE-equivalent tracking, and vendor accountability for responsible disclosure

For the full technical details, see Aonan Guan's research publication at oddguan.com. If you use AI agents in your workflows, audit your exposure immediately—don't wait for vendor advisories that may never come.

What's Still Hard

Trust gaps. Organizations worry about AI making decisions with financial or legal consequences. Most deployments include human checkpoints for high-stakes actions.

Integration complexity. Legacy systems don't always play nice with new tools. Many enterprises need middleware that adds cost and fragility.

The learning curve. Teams need time to understand what the system can and can't do. Early missteps create resistance.

The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent

The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent

Executive Summary

The Attack That Should Have Been Impossible

Understanding "Comment and Control" Prompt Injection

How Traditional Prompt Injection Works

The Comment-and-Control Evolution

The Three Attacks: A Technical Breakdown

Attack 1: Anthropic's Claude Code Security Review

Attack 2: Google's Gemini CLI Action

Attack 3: Microsoft's GitHub Copilot Agent

Why This Disclosure Pattern Is Dangerous

The CVE Gap

The Silent Pinned Versions Problem

The Broader Attack Surface

The Fundamental Problem: Trusted Input vs. User-Controlled Input

The Economic Reality: Bug Bounties vs. Responsible Disclosure

What This Means for Your Organization

Immediate Actions

Longer-Term Strategy

The Industry-Wide Implications

The Security-Feature Tradeoff

The Need for AI Agent Security Standards

The Research Community's Role

Conclusion: The AI Agent Security Gap

Key Takeaways

What's Still Hard

Daily AI Intelligence, Free

Frequently Asked Questions

What is "The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent" about?

When was this reported?

Why does this matter?

The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent

Executive Summary

The Attack That Should Have Been Impossible

Understanding "Comment and Control" Prompt Injection

How Traditional Prompt Injection Works

The Comment-and-Control Evolution

The Three Attacks: A Technical Breakdown

Attack 1: Anthropic's Claude Code Security Review

Attack 2: Google's Gemini CLI Action

Attack 3: Microsoft's GitHub Copilot Agent

Why This Disclosure Pattern Is Dangerous

The CVE Gap

The Silent Pinned Versions Problem

The Broader Attack Surface

The Fundamental Problem: Trusted Input vs. User-Controlled Input

The Economic Reality: Bug Bounties vs. Responsible Disclosure

What This Means for Your Organization

Immediate Actions

Longer-Term Strategy

The Industry-Wide Implications

The Security-Feature Tradeoff

The Need for AI Agent Security Standards

The Research Community's Role

Conclusion: The AI Agent Security Gap

Key Takeaways

What's Still Hard

Daily AI Intelligence, Free

Frequently Asked Questions

What is "The AI Agent Security Crisis Nobody's Talking About: How Anthropic, Google, and Microsoft Paid Bug Bounties Then Stayed Silent" about?

When was this reported?

Why does this matter?

Get AI NewsThat Matters

Related Articles

CRITICAL: The AI Framework CVE Cascade Proves No System Is Safe — Here's Why

RED ALERT: One Keypress DESTROYS Your Code — Critical RCE Flaw Found in Claude Code, Gemini CLI, Cursor & Copilot Fuels Next Global Supply Chain Catastrophe

THE INVISIBLE WAR: Claude AI Just Autonomously Hacked a Water Utility's SCADA System — And Your Government Can't Stop It

Get AI News
That Matters