OmniGPT Leak Claims Show Risk of Using Sensitive Data on AI Chatbots
Threat actors claim to have obtained OmniGPT's backend database, including user messages, links to uploaded files, and some 30,000 user email addresses. Cyble's analysis found high-risk content in the data, including PII, API keys, and financial details, which could enable phishing attacks or exploitation of system vulnerabilities. The article advises users to avoid entering sensitive data into AI platforms and recommends that organizations adopt anonymization and other security controls to reduce risk.

February 21, 2025 | cyble.com

Recent claims by threat actors that they obtained an OmniGPT backend database show the risks of using sensitive data on AI chatbot platforms, where any data inputs could potentially be revealed to other users or exposed in a breach. 

OmniGPT has yet to respond to the claims, which were made by threat actors on the BreachForums leak site, but Cyble dark web researchers analyzed the exposed data. 

Cyble researchers detected potentially sensitive and critical data in the files, ranging from personally identifiable information (PII) to financial information, access credentials, tokens, and API keys. The researchers did not attempt to validate the credentials but based their analysis on the potential severity of the leak if the threat actors' claims are confirmed to be valid. 

OmniGPT Hacker Claims 

OmniGPT integrates several well-known large language models (LLMs) into a single platform, including Google Gemini, ChatGPT, Claude Sonnet, Perplexity, DeepSeek, and DALL-E, making it a convenient way to access a range of LLM tools. 


The threat actors (TAs), who posted under aliases that included Gloomer and SyntheticEmotions, claimed that the data “contains all messages between the users and the chatbot of this site as well as all links to the files uploaded by users and also 30k user emails. You can find a lot of useful information in the messages such as API keys and credentials and many of the files uploaded to this site are very interesting because sometimes, they contain credentials/billing information.” 


The data analyzed by Cyble includes four files: 

  • Files.txt, which contains links to files uploaded by users, some of which contain highly sensitive data 
  • Messages.txt, which contains user prompt data 
  • User_Email_Only.txt, a list of user email addresses found within the data 
  • UserID_Phone_Number.txt, which contains a combination of user email IDs and phone numbers 

Analysis of Claimed OmniGPT Data Leak 

Cyble found that the email and phone number file (UserID_Phone_Number.txt) contained PII such as email addresses and phone numbers that could be used in phishing attacks, spam, and identity theft, while the exposed phone numbers could also enable harassment, targeted scams, or social engineering. 

Some of the email addresses appear to belong to organizational domains such as educational institutions or corporations, revealing potential associations with businesses or institutions and increasing the risk of spear phishing attacks for those organizations. 

The User_Email_Only.txt file contains numerous email addresses, which can be used as personal identifiers. Although there are no associated full names or physical addresses, these emails can still be linked to individuals, and some of the addresses belonged to organizational domains. The risk of phishing attacks is high, especially if these email addresses are cross-referenced with other leaks, and email hijacking is a possibility if the leaked addresses are used across multiple platforms, Cyble said in its analysis. 

Messages.txt, the file of user prompt data, reveals critical security issues if the leak is valid. These include: 

  • Bearer tokens and OpenAI API keys exposed in messages, potentially allowing unauthorized access to APIs. 
  • Database credentials, including username, password, and database name, which could provide attackers with full access to internal databases. 
  • Credential leaks, which include email addresses and passwords that could lead to account compromises or unauthorized access. 
  • Table structure, which includes sensitive database schema information, potentially revealing critical business logic or proprietary data. 
  • Code base and endpoint leaks, which could enable attackers to analyze system vulnerabilities. 
  • Payment card details, such as credit card numbers, CVVs, and expiry dates, that could put users at risk for financial theft and fraud. 

The exposure of tokens or API keys could provide an easy entry point for attackers to abuse the system and perform unauthorized actions, Cyble said, while exposing database credentials could allow attackers to potentially access and modify sensitive business data. 
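
To make the exposure concrete: credentials like these are typically found with simple pattern matching over text dumps. The sketch below is a minimal, hypothetical illustration of that approach, not Cyble's methodology; the patterns and the scan_for_secrets helper are assumptions for demonstration only, and dedicated secret scanners such as gitleaks or TruffleHog ship far more comprehensive rule sets.

```python
import re

# Illustrative patterns only; real secret scanners use far broader rules.
SECRET_PATTERNS = {
    "openai_api_key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "bearer_token": re.compile(r"(?i)bearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
    "db_conn_string": re.compile(r"(?i)(?:postgres|mysql|mongodb)://\S+:\S+@\S+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}

def scan_for_secrets(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_snippet) pairs found in a text blob."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits

# Example: scanning a single chat message (fabricated sample data).
sample = "here is my key sk-abc123def456ghi789jkl012 and db postgres://admin:hunter2@10.0.0.5/prod"
for name, snippet in scan_for_secrets(sample):
    print(f"[{name}] {snippet}")
```

Running the same kind of scan over an organization's own chatbot logs is one way to gauge what a comparable leak would expose.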

Cyble’s analysis said the file “involves a range of sensitive information, including credentials, tokens, and financial details. The severity of the leak is high, as it includes exploitable access points like database credentials, API keys, and payment information.” 

Files.txt, which contains links to files uploaded by users, points to a range of sensitive file types, including: 

  • Document files that include sensitive information such as resumes and organizational documents with sensitive data 
  • Access credentials to databases 
  • API documentation detailing production-level API tokens, payload data, and expected outputs 
  • .pdf, .doc, and .csv files containing sensitive business information 
  • Project integration files with production-level data, including API keys, tokens, and user input payloads 

The exposure of these files could potentially lead to data breaches, unauthorized access to business systems, and exploitation of API vulnerabilities, while the exposed production-level details increase the risk of system manipulation, unauthorized access, and exploitation of business-critical operations. If valid, these leaks pose a potentially serious risk to both user privacy and organizational security, Cyble said. 
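
One way analysts triage a dump like Files.txt is to bucket the linked filenames by extension and keyword before any manual review. The sketch below is a hypothetical illustration of that triage step (the rules, the triage_links helper, and the sample URLs are all assumptions, not Cyble's tooling).

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical triage rules for bucketing leaked file links by apparent risk.
SENSITIVE_EXTENSIONS = {".pdf", ".doc", ".docx", ".csv", ".xls", ".xlsx", ".env", ".sql"}
SENSITIVE_KEYWORDS = ("credential", "password", "token", "api", "billing", "invoice", "resume")

def triage_links(links: list[str]) -> Counter:
    """Count leaked links per coarse risk bucket, based on the filename alone."""
    buckets = Counter()
    for link in links:
        name = PurePosixPath(urlparse(link).path).name.lower()
        if any(keyword in name for keyword in SENSITIVE_KEYWORDS):
            buckets["keyword_hit"] += 1
        elif PurePosixPath(name).suffix in SENSITIVE_EXTENSIONS:
            buckets["sensitive_extension"] += 1
        else:
            buckets["other"] += 1
    return buckets

print(triage_links([
    "https://files.example.com/uploads/q3_billing_export.csv",
    "https://files.example.com/uploads/holiday_photo.png",
]))
# -> Counter({'keyword_hit': 1, 'other': 1})
```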

Safe LLM Use 

Adding information to an LLM makes it part of a larger data lake, and the right prompt may be all it takes to reveal the data to other parties in unintended ways. For that reason alone, sensitive data should not be entered into LLMs. 

If an organization decides that there is value in extracting LLM insights from sensitive data, the data should be masked, minimized, or anonymized, and other sensitive-data-handling controls should be applied. 
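
As a rough illustration of that kind of pre-processing, the sketch below redacts a few common identifier patterns from a prompt before it leaves the organization. The patterns and the mask_prompt function are hypothetical examples, not a complete anonymization solution; production deployments typically rely on dedicated tooling (DLP proxies, tokenization services) rather than ad hoc regexes.

```python
import re

# Hypothetical redaction rules; real coverage must be much broader
# (names, addresses, account numbers, free-text identifiers, ...).
REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[API_KEY]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def mask_prompt(prompt: str) -> str:
    """Replace obvious identifiers with placeholders before sending a prompt to an LLM."""
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

print(mask_prompt(
    "Contact jane.doe@example.com or +1 (555) 010-2345 about key sk-live0123456789abcdefghij"
))
# -> Contact [EMAIL] or [PHONE] about key [API_KEY]
```

Note that the API-key rule runs first so that a later, looser rule (such as the phone pattern) cannot partially consume a secret and leave a fragment of it behind.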

Organizations should address the use of LLMs and chatbots at the policy level, monitor for compliance, and limit inputs as much as possible while extracting needed insights. 




Source: https://cyble.com/blog/omnigpt-leak-risk-ai-data/