Recently, many of our favorite AI chatbots have gotten autonomous research capabilities. This allows the AI to go off for an extended period of time, while having access to tools, such as web search, integrations, connectors and also custom-built MCP servers.
This post will explore and explain in detail how there can be data spill between connected tools during Deep Research. The research is focused on ChatGPT but applies to other Deep Research agents as well.
With ChatGPT it’s now possible to implement custom ChatGPT Connectors, but they have to follow very specific schema definitions, namely it requires two MCP tools created, search
and fetch
.
When I implemented my first custom Connector, which is technically an MCP server, I got this error when trying to add it to ChatGPT:
The server has to implement exactly search
and fetch
. Anything else, and the Connector will be rejected for violating guidelines.
With terms like search
and fetch
it appears that an agent will only retrieve data, but that doesn’t mean that no data-leakage will occur as we will demonstrate in this post.
This research was conducted about two months ago, so a few things might have changed, but the core principles remain true.
One thing that is probably obvious to some, but maybe less obvious to others, is that all these integration and tools that technically in the same trust boundary.
What does that mean?
It means that data accessible to the agent from one source, can be leaked to other sources when it performs deep research queries.
And if there is an attacker in the loop such leaks can be forced via prompt injection.
The primary goal of this research post is to highlight that there is no trust boundary, and once you add multiple research tools to the research process, the agent can freely invoke tools and may use data from one to query another.
The ChatGPT documentation states:
When you enable a connector, ChatGPT can send and retrieve information from the connected app in order to find information relevant to your prompts and use them in its responses.
This basically hints to this behavior.
To test this out I created a simple remote MCP server named Remote Matrix
that correctly implements search
and fetch
tools to adhere to OpenAI’s specification.
I also published the source code on GitHub. It includes extensive logging, because that was my primary goal to better understand what data is sent by the AI to the MCP server.
The code might also be helpful because I noticed that OpenAI’s demo Connector does not correctly implement the correct schema that OpenAI Deep Research highlights in the docs.
The easiest way to understand this is probably just by looking at this screenshot:
Here you can see we have Outlook Email
, the Remote Matrix
and a few other tools selected. This means that data from Outlook can be used in a query to the Remote Matrix
tool. Remember, a research process often runs for 10+ minutes, so humans are not in the loop during the process.
The documentation highlights this with Web Search:
ChatGPT can also use relevant information accessed from Connectors to inform search queries when ChatGPT searches the web to provide a response.
And turns out that web search uses both Bing and Shopify.
And as you can see above, your stored memories might also be sent to tools.
For the demo I used this research prompt, and connected the Linear and Remote Matrix tools as sources:
Compile a list of all currently open issues and give and
update in a table form with high level details and status.
The research agent first proposes a plan and asks a few follow-up questions to clarify the task. In this case it had asked me if it should search both of the connected sources.
To increase the likelihood of data leakage happening I created the following scenarios to hijack the AI Agent and create a confused deputy with an indirect prompt injection. There are a couple of places I tried:
2.
Remote Matrix
tool with malicious tool description
In the following demo screenshot you can see that ChatGPT started sending queries to the Remote Matrix
MCP server in the voice of a caveman, which was part of a prompt injection in the tool description. 😊
However, more importantly, ChatGPT sent sensitive information from the Linear tickets to the Remote Matrix
MCP server as well:
As a user you can see the individual steps the agent takes in the activity pane:
What was interesting to observe was that the agent actually combined the information of the prod service and credentials from multiple tickets into a single query.
Furthermore, there was an older prompt injection test I had in a Linear ticket that I had used when testing Devin, and that ticket also influenced ChatGPT’s research process and it seemed to start looking for a Slack tool to post a message!
This is how accidents happen.
The same concepts apply to other research agents with the same design patterns, and connection to tools.
OpenAI actually seems to have considered threats around custom Connectors and requires a specific implementation pattern (e.g. with search
and fetch
tools implemented). That means the agent does not get exposed to tools that directly manipulate or delete data.
search
and fetch
tools implemented.This post demonstrates that Deep Research integrations, including ChatGPT Connectors, operate within a shared trust boundary of the agent. This means that data from one source can be sent to other sources by the agent. An attacker can use prompt injection to manipulate the AI during the research process to exfiltrate data across tools.
This highlights the risks with taking the human out of the loop, which appears to be increasingly more common.
As user carefully connect only trusted data sources and be aware that data might leak from one to the other, and that an adversary could influence this with an indirect prompt injection attack.
The primary purpose of this post was just to demonstrate to myself that leakage between data sources can occur. I think there is a lot more research needed in this space to figure out how an attacker can influence autonomous research agents.
One of the Linear tickets with the information that was leaked: