Recent claims by threat actors that they obtained an OmniGPT backend database highlight the risks of entering sensitive data into AI chatbot platforms, where any input could potentially be revealed to other users or exposed in a breach.
OmniGPT has yet to respond to the claims, which were made by threat actors on the BreachForums leak site, but Cyble dark web researchers analyzed the exposed data.
Cyble researchers detected potentially sensitive and critical data in the files, ranging from personally identifiable information (PII) to financial information, access credentials, tokens, and API keys. The researchers did not attempt to validate the credentials but based their analysis on the potential severity of the leak if the threat actors’ claims are confirmed to be valid.
OmniGPT integrates several well-known large language models (LLMs) into a single platform, including Google Gemini, ChatGPT, Claude Sonnet, Perplexity, DeepSeek and DALL-E, making it a convenient platform for accessing a range of LLM tools.
The threat actors (TAs), who posted under aliases that included Gloomer and SyntheticEmotions, claimed that the data “contains all messages between the users and the chatbot of this site as well as all links to the files uploaded by users and also 30k user emails. You can find a lot of useful information in the messages such as API keys and credentials and many of the files uploaded to this site are very interesting because sometimes, they contain credentials/billing information.”
The data analyzed by Cyble includes four files: UserID_Phone_Number.txt, User_Email_Only.txt, Messages.txt, and Files.txt.
Cyble found that the email and phone number file (UserID_Phone_Number.txt) contained PII such as email addresses and phone numbers. Exposed email addresses could be used in phishing attacks, spam, and identity theft, while exposed phone numbers could be used for harassment, targeted scams, or social engineering.
Some of the email addresses appear to belong to organizational domains such as educational institutions or corporations, revealing potential associations with businesses or institutions and increasing the risk of spear phishing attacks for those organizations.
The User_Email_Only.txt file contains numerous email addresses, which can be used as personal identifiers. Although there are no associated full names or physical addresses, these emails can still be linked to individuals, and some of the addresses were for organizational domains. The risk of phishing attacks is high, especially if these email addresses are cross-referenced with other leaks, and email hijacking is possible if the leaked addresses are reused across multiple platforms, Cyble said in its analysis.
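For defenders wondering whether their own users appear in such a list, one minimal triage step is to filter the leaked addresses by the organization’s domains so affected accounts can be prioritized for password resets and phishing-awareness follow-up. The sketch below is an illustrative assumption, not a Cyble tool: it expects a plain one-address-per-line file named User_Email_Only.txt (as in the claimed leak) and uses placeholder domains.

```python
# Hypothetical triage step: pull out leaked addresses that belong to an
# organization's own domains so security teams can prioritize follow-up.
def org_addresses(leak_path: str, org_domains: set[str]) -> list[str]:
    matches = []
    with open(leak_path, encoding="utf-8", errors="ignore") as fh:
        for line in fh:
            addr = line.strip().lower()
            # Keep only well-formed addresses whose domain is on the watch list.
            if "@" in addr and addr.rsplit("@", 1)[1] in org_domains:
                matches.append(addr)
    return sorted(set(matches))

if __name__ == "__main__":
    # Domains are placeholders; substitute your own.
    print(org_addresses("User_Email_Only.txt", {"example.edu", "example.com"}))
```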
Messages.txt, which contains user prompt data, raises critical security issues if the claims are valid. Exposed items include access tokens and API keys, database credentials, and payment information.
The exposure of tokens or API keys could provide an easy entry point for attackers to abuse the system and perform unauthorized actions, Cyble said, while exposing database credentials could allow attackers to potentially access and modify sensitive business data.
Cyble’s analysis said the file “involves a range of sensitive information, including credentials, tokens, and financial details. The severity of the leak is high, as it includes exploitable access points like database credentials, API keys, and payment information.”
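To illustrate how a dump like Messages.txt might be triaged for that kind of exposure, the sketch below scans a text file for strings shaped like email addresses, phone numbers, and credential-like tokens. The regex patterns and the filename are assumptions for the example only; this is not Cyble’s tooling, and production secret scanners use far more extensive, vendor-specific rules.

```python
# Illustrative sketch: count pattern hits for common PII and credential shapes
# in a leaked text file. Patterns are simplified examples, not exhaustive rules.
import re
from collections import Counter

PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "bearer_token": re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]{20,}"),
}

def scan_dump(path: str) -> Counter:
    """Count matches per category across the file, line by line."""
    hits = Counter()
    with open(path, encoding="utf-8", errors="ignore") as fh:
        for line in fh:
            for label, pattern in PATTERNS.items():
                hits[label] += len(pattern.findall(line))
    return hits

if __name__ == "__main__":
    print(scan_dump("Messages.txt"))  # filename from the claimed leak, as an example
```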
Files.txt, which contains links to files uploaded by users, points to a range of sensitive file types, including documents that appear to contain credentials, billing information, and production-level details.
The exposure of these files could potentially lead to data breaches, unauthorized access to business systems, and exploitation of API vulnerabilities, while exposed production-level details increase the risk of system manipulation, unauthorized access, and exploitation of business-critical operations. If valid, these leaks pose a potentially serious risk to both user privacy and organizational security, Cyble said.
Adding information to an LLM makes it part of a larger data lake, and the right prompt may be all it takes to reveal the data to other parties in unintended ways. For that reason alone, sensitive data should not be entered into LLMs.
If an organization decides that there is value in extracting LLM insights from sensitive data, then the data should be masked, minimized, or anonymized, and other sensitive data handling controls should be applied before anything is submitted.
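A minimal sketch of that kind of masking is shown below: common PII and credential patterns are replaced before a prompt leaves the organization. The regexes and example strings are illustrative assumptions, not a complete control, and they are not tied to OmniGPT’s API; production deployments typically rely on dedicated DLP or anonymization tooling.

```python
# A minimal input-masking sketch: redact common PII and credential patterns
# from a prompt before it is sent to any external chatbot or LLM API.
import re

REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
    (re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"), r"\1=[REDACTED]"),
]

def mask(prompt: str) -> str:
    """Replace matching PII and secret-like substrings with placeholder tags."""
    for pattern, replacement in REDACTIONS:
        prompt = pattern.sub(replacement, prompt)
    return prompt

# Example: the masked prompt keeps the question but drops the secrets.
print(mask("Our api_key=sk_live_abc123 fails for jane@corp.com, call +1 555 010 0000"))
```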
Organizations should address the use of LLMs and chatbots at the policy level, monitor for compliance, and limit inputs as much as possible while extracting needed insights.
Disclaimer: This blog is based on our research and the information available at the time of writing. It is for informational purposes only and does not constitute legal, financial, or professional advice. While we strive for accuracy, we do not guarantee the completeness or reliability of the content. Cyble is not responsible for any errors, omissions, or decisions made based on this content. Readers should verify findings and seek expert advice where necessary. All trademarks, logos, and third-party content belong to their respective owners and do not imply endorsement or affiliation.

All content is presented “as is” without any guarantee that it is free of confidential, proprietary, or otherwise sensitive information. If you believe any portion of this content contains inadvertently shared or sensitive data, please contact us immediately so that we may address and rectify the issue.

No Liability for Errors or Omissions: Due to the dynamic nature of cyber threat activity, this blog may include partial, outdated, or otherwise incorrect information due to unverified sources, evolving security threats, or human error. We expressly disclaim any liability for errors or omissions or any potential consequences arising from the use, misuse, or reliance on this information.