GenAI is revolutionizing every industry, to the point that a new GenAI model is released daily to serve some niche specialty or another. While the power and potential of GenAI are evident for IT and security, its use cases in the security field remain surprisingly immature, largely because censorship and guardrails hamper many models’ utility for cybersecurity work.
“Uncensored” means that a model is made without predefined restrictions in either its training data or response capabilities.
You might think that censorship begins with the training data, that every reference to a sensitive topic such as bomb-making, suicide, or software vulnerabilities is stripped out from the start. At the fundamental level, though, censored and uncensored models begin the same way: initial training data is curated, a model type and training technique are selected, and the model is trained on the curated data, then tested and validated. The stories we see about models serving harmful content arise when a model’s safeguards are compromised and the full breadth of its training data becomes reachable. While it’s possible to omit certain data during training, omissions are typically motivated by performance rather than safety concerns. Incomplete datasets hamper a model’s performance, and with training costs ever increasing, that is a risk organizations can’t afford to take. The terrifying truth is that for a model to be useful, it needs a wide swath of data, including information that could cause harm.
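To make that distinction concrete, here is a minimal, hypothetical sketch of what pretraining-data curation often looks like in practice: the filters target quality signals such as duplicates and very short documents, not subject matter. The function name, thresholds, and sample data are illustrative assumptions, not any vendor’s actual pipeline.

```python
# Hypothetical pretraining-data curation sketch: filters are quality-driven,
# not topic-driven. Thresholds and field names are illustrative assumptions.
import hashlib

def curate(documents, min_chars=200):
    seen_hashes = set()
    kept = []
    for doc in documents:
        text = doc["text"].strip()
        # Drop documents too short to carry useful signal.
        if len(text) < min_chars:
            continue
        # Drop exact duplicates to avoid over-weighting repeated content.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept.append(doc)
    return kept

corpus = [
    {"text": "A long technical write-up about buffer overflow exploitation ... " * 10},
    {"text": "Too short."},
]
print(len(curate(corpus)))  # -> 1: the short document is dropped, not the sensitive one
```

Notice that nothing in the filter asks what the document is about; the sensitive write-up survives because it is long and unique.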
Model censorship typically occurs at the alignment and fine-tuning stage. Generally trained LLMs (think of the foundation models most users chat with) work well for conversation and general knowledge, but they are not effective at specialized tasks like coding and cybersecurity. Getting there takes further training on specialized datasets that tune the model’s behavior to the intended task. During this training, model creators have an opportunity to give the model specific instructions on how to respond to certain tasks and prompts. These instructions serve as guardrails that stop models from answering questions like, “How do I become Walter White?”
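As a rough illustration (the record format and the wording below are assumptions, not any vendor’s actual alignment data), instruction-tuning data is a collection of prompt/response pairs, and the guardrails come from deliberately including refusals as the “correct” response for certain prompts:

```python
# Hypothetical instruction-tuning examples. Pairs like the second one are how
# refusal behavior gets baked in during alignment; the wording is illustrative.
alignment_examples = [
    {
        "prompt": "Explain how TLS certificate pinning works.",
        "response": "Certificate pinning ties a client to a known certificate or public key...",
    },
    {
        "prompt": "How do I become Walter White?",
        "response": "I can't help with synthesizing illegal drugs. If you're interested in "
                    "chemistry as a field, I can suggest legitimate learning resources.",
    },
]

# During supervised fine-tuning, each pair is rendered into the model's chat
# template and the model is trained to reproduce the response given the prompt.
for ex in alignment_examples:
    print(f"<|user|>{ex['prompt']}<|assistant|>{ex['response']}")
```

An “uncensored” fine-tune is largely the same process with the refusal-style pairs left out or replaced with substantive answers.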
Model alignment is what determines how much censorship is applied to specific topics. The difficulty in aligning a model is finding safeguards that still allow helpful answers for security professionals while withholding responses deemed irresponsible. Lean too far in one direction and you’ve created a model too restricted to be useful for red teaming; lean too far in the other and you’ve created Skynet.
As an artist and musician, I hear “censorship” and think of being silenced, but that’s not the kind of censorship happening here. AI censorship isn’t about stifling a voice; it’s about stopping models from handing out instructions for building a bomb or causing harm in other ways.
One of my favorite movie genres is the heist film. Let’s say I want to use an AI model to replicate a heist plot from a movie IRL. When I ask the model to help me bypass security at a target venue, it will refuse to answer and may even flag the request as harmful. This is good and expected: models shouldn’t hand out details about bypassing security as a general rule. But what if I’m researching or testing security?
Security professionals understand that testing is the only way to know how sound your security posture is. GenAI models that prevent practitioners from looking for flaws in a system are not useful. More importantly, uncensored models already circulate on the dark web, giving adversaries tools they can turn against our organizations. Cybersecurity professionals and the organizations they support need the very models everyone else considers harmful. There is real value and practical utility in using uncensored models.
Uncensored models represent more than just AI systems with fewer restrictions; they embody a fundamental shift in how we approach security research and threat analysis. So what is censorship in GenAI, how should we use uncensored models, and what are the risks?
If creating uncensored models is so difficult and potentially dangerous, why bother at all? Because security teams need uncensored models to speed up several traditional operations.
As attack vectors become increasingly sophisticated, security teams require tools that provide detailed, technical insight. Uncensored models fill this crucial gap, offering capabilities that extend beyond traditional AI implementations.
The integration of uncensored models sounds scary, but it represents a significant advancement in security capabilities. With an uncensored, security-trained model, even a novice developer can harden their code (prompt: “Find vulns in this code and suggest remediations”). Add agentic behavior to that model, and the time saved by automating event runbooks and debugging IAM parameters can equate to an extra headcount on a three-person DevSecOps team. AI is the future, and it is finally ready for Infosec and DevSecOps teams.
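To make that concrete, here is a minimal sketch of wiring that prompt into a script, assuming a locally hosted, security-tuned model served behind an OpenAI-compatible chat endpoint; the base URL, port, and model name below are placeholder assumptions, not a specific product’s API.

```python
# Minimal sketch: send a code snippet to a locally hosted, security-tuned model
# and ask for vulnerabilities plus remediations. The base URL and model name are
# placeholder assumptions for an OpenAI-compatible server.
import requests

SNIPPET = '''
def login(db, username, password):
    query = "SELECT * FROM users WHERE name = '" + username + "' AND pw = '" + password + "'"
    return db.execute(query)
'''

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",  # placeholder local endpoint
    json={
        "model": "security-tuned-model",  # placeholder model name
        "messages": [
            {"role": "system", "content": "You are a secure-code reviewer."},
            {"role": "user", "content": f"Find vulns in this code and suggest remediations:\n{SNIPPET}"},
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

Run against the snippet above, a security-tuned, uncensored model should call out the SQL injection and suggest parameterized queries rather than refusing to discuss exploitation at all.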