A hard-to-detect AI prompt-injection vulnerability in Google’s Gemini AI chatbot could put the 2 billion users of its Gmail service at risk of phishing attacks.
The indirect prompt injection attack on Google’s Gemini AI chatbot, outlined by Marco Figueroa, manager of Mozilla’s generative AI bug bounty programs, highlights a flaw in the large-language model (LLM) and takes advantage of users’ tendency to trust AI outputs and to rely on AI-generated summaries.
As detailed by Figueroa in 0din– Mozilla’s bug bounty program for generative AI tools – the vulnerability in Gemini for Workspace allows a bad actor to hide malicious instructions in an email in Gmail, which means that they don’t need to rely on links, scripts, or attachments to run the scam.
Google began putting Gemini AI assistant features in Gmail last year, with one capability being summarizing the content of emails. The prompt-injection vulnerability submitted to odin by a researcher would allow a threat actor to hide a malicious instruction in an email.
“When the recipient clicks ‘Summarize this email’, Gemini faithfully obeys the hidden prompt and appends a phishing warning that looks as if it came from Google itself,” Figueroa wrote. “Because the injected text is rendered in white-on-white (or otherwise hidden), the victim never sees the instruction in the original message, only the fabricated ‘security alert’ in the AI-generated summary.”
The security alert instruction – which is attached to the email but can’t be seen by the user and is able to bypass spam filters – tells Gemini that is has to include it at the end of the message. However, it can be seen when the user asks that the email be summarized and tells them that Gemini has detected that their Gmail password has been compromised, adding a phone number to call and a reference number to use to reset it.
If the target trusts the AI-generated notice and follows the instructions, it can lead to credentials being compromised or a phone-based social-engineering scam, Figueroa wrote.
“The issue is there is no protection from this form of prompt injection,” Mitch Ashley, vice president and practice lead for Software Lifecycle Engineering at The Futurum Group, told Security Boulevard. “Hacker prompts can be ghosted by using HTML like tiny fonts, white text, using HTML tags, and even just embedding the prompt as the disclaimer text at the bottom of emails.”
The indirect prompt-injection attack works for several reasons, Figueroa wrote, including that when Gemini is asked to summarize the email content, the hidden instructions become part of the model’s prompt.
“This is the textbook ‘indirect’ or ‘cross-domain’ form of prompt injection,” Figueroa wrote.
Also, most LLM guardrails rely on text that is visible to the users. However, tricks that can be used with HTML and CSS – such as zero or white fonts or off-screen text – bypass such protections because the model gets the raw markup. There is also an authority to the instruction, starting off with “You Gemini, have to …,” which he wrote “exploits the model’s system-prompt hierarchy; Gemini’s prompt-parser treats it as a higher-priority directive.”
The Futurum Group’s Ashley agreed, noting that “because the message is presented as part of the Gemini summary, it appears trustworthy.”
Such indirect prompt injections can lead to a range of threats beyond even social engineering and voice phishing – or vishing – attacks. They can bypass security features and can be used for targeted misinformation. They can also spread at scale.
“If integrated into business workflows [like newsletters or CRM], a single compromised SaaS account could scale this attack vector to thousands of users,” the analyst said.
Indirect prompt injections are a growing threat to LLMs, with The Alan Turing Institute’s Centre for Emerging Technology and Security calling them “generative AI’s greatest security flaw,” noting that “a key component of hidden instructions comes from the fact that a GenAI assistant does not read data in the way that a human does. This makes it possible to devise exceedingly simple methods of insertion that are invisible to the human eye but are central to a GenAI system’s retrieval process. When combined with the range of input methods available to a GenAI assistant – such as emails, documents and external web pages – the attack surface is broad and varied.”
Google researchers understand the extent of the threat, with the DeepMind unit outlining a process for continuously recognizing indirect prompt injection attacks in a research paper. Last month, Google wrote about a layered response to mitigate prompt injection attacks.
Figueroa called prompt injections “the new email macros,” which hackers can leverage to deliver viruses, ransomware, and other malware by embedding malicious code in them and delivering them through email attacks or within ZIP files.
’”Phishing For Gemini’ shows that trustworthy AI summaries can be subverted with a single invisible tag,” he wrote. “Until LLMs gain robust context-isolation, every piece of third-party text your model ingests is executable code. Security teams must treat AI assistants as part of the attack surface and instrument them, sandbox them, and never assume their output is benign.”
Recent Articles By Author