The UK's National Cyber Security Centre (NCSC) has issued a warning about the risks of integrating large language models (LLMs) like OpenAI’s ChatGPT into other services. One of the major risks is the possibility of prompt injection attacks.
The NCSC points out several dangers associated with integrating a technology that is very much in early stages of development into other services and platforms. Not only could we be investing in a LLM that no longer exists in a few years (anyone remember Betamax?), we could also get more than we bargained for and need to change anyway.
Even if the technology behind LLMs is sound, our understanding of the technology and what it is capable of is still in beta, says the NCSC. We barely have started to understand Machine Learning (ML) and Artificial Intelligence (AI) and we are already working with LLMs. Although fundamentally still ML, LLMs have been trained on increasingly vast amounts of data and are showing signs of more general AI capabilities.
We have already seen that LLMs are susceptible to jailbreaking and can fall for “leading the witness” types of questions. But what if a cybercriminal was able to change the input a user of a LLM based service?
Which brings us to prompt injection attacks. Prompt Injection is a vulnerability that is affecting some AI/ML models and, in particular, certain types of language models using prompt-based learning. The first prompt injection vulnerability was reported to OpenAI by Jon Cefalu on May 3, 2022.
Prompt Injection attacks are a result of prompt-based learning, a language model training method. Prompt-based learning is based on training a model for a task where customization for the specific task is performed via the prompt, by providing the examples of the new task we want to achieve.
Prompt Injection is not very different from other injection attacks we are already familiar with, e.g. SQL attacks. The problem is that an LLM inherently cannot distinguish between an instruction and the data provided to help complete the instruction.
An example provided by the NCSC is:
“Consider a bank that deploys an 'LLM assistant' for account holders to ask questions, or give instructions about their finances. An attacker might be able send you a transaction request, with the transaction reference hiding a prompt injection attack on the LLM. When the LLM analyses transactions, the attack could reprogram it into sending your money to the attacker’s account. Early developers of LLM-integrated products have already observed attempted prompt injection attacks.”
The comparison to SQL injection attacks is enough to make us nervous. The first documented SQL injection exploit was in 1998 by cybersecurity researcher Jeff Forristal and, 25 years later, we still see them today. This does not bode well for the future of keeping prompt injection attacks at bay.
Another potential danger the NCSC warned about is data poisoning. Recent research has shown that even with limited access to the training data, data poisoning attacks are feasible against “extremely large models”. Data poisoning occurs when an attacker manipulates the training data or fine-tuning procedures of an LLM to introduce vulnerabilities, backdoors, or biases that could compromise the model’s security, effectiveness, or ethical behavior.
Prompt injection and data poisoning attacks can be extremely difficult to detect and mitigate, so it’s important to design systems with security in mind. When you’re implementing the use of an LLM in your service, one thing you can do is apply a rules-based system on top of the ML model to prevent it from taking damaging actions, even when prompted to do so.
Equally important advice is to keep up with published vulnerabilities and make sure that you can update or patch the implemented functionality as soon as possible without disrupting your own service.
Malwarebytes EDR and MDR remove all remnants of ransomware and prevents you from getting reinfected. Want to learn more about how we can help protect your business? Get a free trial below.