CVE-2026-5241: Transformers trust_remote_code Bypass

A CVSS 9.6 flaw in transformers 5.2.0 let an untrusted config.json override the very setting meant to block remote code execution, turning AutoModel.from_pretrained into a code-execution primitive.

The single most important safety switch in the Hugging Face ecosystem is trust_remote_code. When a developer loads a model with trust_remote_code=False, they are making an explicit statement: do not execute any Python this repository ships; just load weights. It is the boundary that lets an organization pull a model from a public hub without granting that model's author code execution on the loading machine. CVE-2026-5241, scored 9.6 by the National Vulnerability Database, is the discovery that in transformers 5.2.0, that switch could be flipped back on by the untrusted model itself.

The vulnerable path runs through LightGlue, a feature-matching model. When a victim calls AutoModel.from_pretrained() on such a model and explicitly passes trust_remote_code=False, the loader nonetheless consults the repository's config.json — an attacker-controlled file — and lets a value inside it propagate into a nested configuration call. The setting designed to be the developer's decision becomes the attacker's.

"The issue arises because the `trust_remote_code` parameter, intended to prevent remote code execution, is overridden by untrusted serialized configuration data in a nested code path. Specifically, when loading a LightGlue model using `AutoModel.from_pretrained()` with `trust_remote_code=False`, the `LightGlueConfig` reads the `trust_remote_code` value from the untrusted `config.json` file and propagates it into nested `AutoConfig.from_pretrained()` calls."— NVD, source

The defect is a textbook trust-boundary inversion, classified as CWE-829: inclusion of functionality from an untrusted control sphere. The caller's explicit, hard-coded False should be authoritative; instead a deserialized value from the model repository — data that should never have been allowed to vote on a security decision — wins in the nested call. The result, in the NVD's words, is "the execution of attacker-provided Python modules, even when the victim explicitly disables remote code execution."

Why this is worse than a normal supply-chain risk

Everyone in machine learning already knows, in the abstract, that loading an arbitrary model is risky. The whole reason trust_remote_code exists is to give practitioners a way to say no to that risk while still using a model's weights. What makes CVE-2026-5241 acute is that it defeats the mitigation people were told to rely on. A developer who did everything right — read the docs, set trust_remote_code=False, treated the model as untrusted — still got popped. The guardrail did not just fail to help; it created a false sense of safety.

The CVSS vector, AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H, captures the shape of the threat. It is network-reachable and low-complexity, requires no privileges, and carries a changed scope with high impact on confidentiality, integrity, and availability. The UI:R term reflects that a victim must be induced to load the malicious model — but in practice that is the normal workflow. People pull models from public hubs constantly, often by name, often on the recommendation of a paper, a tutorial, or a teammate. Typosquatting a popular repository or compromising a legitimate one is a well-trodden path, and the victim's "interaction" is simply doing their job.

Where the blast radius is largest

The advisory is specific about the environments most at risk, and it reads like a tour of the modern ML stack's soft underbelly. The vulnerability "poses a high risk for environments such as API inference servers, research notebooks, CI/CD pipelines, and model evaluation workers, potentially leading to credential theft, lateral movement, or persistence/backdoor deployment."

Each of those settings is dangerous for its own reason. Inference servers are long-lived, network-exposed, and frequently hold credentials to downstream services and data stores. CI/CD pipelines run with privileged tokens and are a classic pivot point toward an organization's source code and deployment infrastructure. Research notebooks tend to run on machines logged into cloud accounts and corporate SSO, with little isolation. Model-evaluation workers pull and execute models from many sources by design, which is the exact behavior this bug weaponizes. In all of them, code execution on load translates quickly into the credential theft, lateral movement, and backdoor deployment the advisory warns about — this is not a contained crash but an initial-access primitive.

Remediation

The fix is to upgrade transformers to the patched release; the NVD entry references the upstream commit that corrects the propagation so the caller's trust_remote_code value is no longer overridable by repository-supplied configuration data. A corresponding huntr report is also linked from the record. Teams should pin to the fixed version across every place transformers runs — not just production inference, but the notebooks, evaluation jobs, and pipeline steps that are easy to forget and were explicitly named as high-risk.

It is also instructive to see this as part of a broader reckoning with how ML frameworks handle untrusted artifacts. The field spent years treating model files as inert data — just weights — when in reality the loading machinery around them is a rich execution surface: custom code in repositories, deserialization of configs and checkpoints, and feature flags like trust_remote_code that try to bolt a security policy onto a system that was not architected with one. Each new CVE in this lineage tightens a specific path, but the structural tension remains. As long as loaders consult attacker-controlled files to make security-relevant decisions, the burden of proof should sit with the framework to demonstrate that caller intent cannot be overridden — and CVE-2026-5241 is evidence that this property has to be tested explicitly, including down every nested configuration path, rather than assumed.

Beyond patching, this CVE is a prompt to reconsider defense-in-depth around model loading generally. The lesson is that a single boolean flag, however well-intentioned, is a brittle place to anchor a security boundary — especially when the code path that honors it threads through deserialization of attacker-controlled data. Running model loads in sandboxed or least-privilege environments, restricting outbound network access from inference workers, and vetting model provenance are the kinds of controls that would have blunted CVE-2026-5241 even with the flag bypassed. The most uncomfortable takeaway is the most important one: in 5.2.0, trust_remote_code=False did not mean what it said, and any architecture that treated it as a hard guarantee inherited that gap.

trust_remote_code=False Wasn't: CVE-2026-5241 Lets a Malicious Model Run Code in Hugging Face Transformers

Why this is worse than a normal supply-chain risk

Where the blast radius is largest

Remediation

Comments