10 March 2026 Safety guardrails silently disabled

Config poisoning

2025–2026. Safety guardrails silently disabled.

A new class of attack is emerging against AI coding agents: prompt injection that targets the agent’s own configuration. Instead of stealing data or running destructive commands, the attacker tricks the agent into rewriting its safety settings. Once the config is modified, every future action runs without guardrails.

What happened

Multiple incidents have demonstrated this pattern:

CurXecute (CVE-2025-54135): A prompt injection in a repository’s README caused Cursor to rewrite its MCP configuration file. The modified config pointed to an attacker-controlled server, enabling remote code execution on every subsequent MCP tool call.
Claude Code settings hijack (2026): A malicious repository included instructions in its code comments that caused an AI agent to modify .claude/settings.json, disabling the safety hooks that would normally block dangerous operations.
Rules File Backdoor (2025): Researchers demonstrated that invisible Unicode characters hidden in .cursorrules and .github/copilot-instructions.md caused agents to silently generate backdoored code. The files looked normal to human reviewers.
IDE launch configuration poisoning (IDEsaster, 2025): Thirty vulnerabilities were found across major AI IDEs. Several involved modifying VS Code launch.json files to execute arbitrary code when debugging was started.

In each case, the agent modified a configuration file that controlled its own behavior or the behavior of surrounding tools. The changes were silent and persistent.

Why it works

AI agents treat configuration files as ordinary files. They have no concept of “this file controls my own safety.” When prompted, whether by malicious repository content, a compromised MCP server or injected instructions, agents will read and write config files just like any source code file.

The attack is especially dangerous because it is self-reinforcing. Once safety settings are disabled, the agent will no longer flag subsequent malicious actions.

Which rules block this

Four Vectimus rules prevent configuration poisoning:

vectimus-fileint-004: Blocks writes to governance config files (.claude/settings.json, hook configurations). Agents cannot modify their own safety settings.
vectimus-fileint-008: Blocks writes to MCP configuration files. Agents cannot redirect tool calls to attacker-controlled servers.
vectimus-fileint-007: Blocks writes to VS Code launch.json and extensions.json. Agents cannot plant execution payloads in IDE configurations.
vectimus-fileint-011: Blocks writes to agent instruction and rules files (.cursorrules, .github/copilot-instructions.md, .claude/instructions.md). Agents cannot modify the files that shape their own behavior.

The deny response tells the agent: “Configuration changes require human review. Suggest the change and let the developer apply it manually.”

What to learn from this

The most dangerous thing an agent can modify is the file that controls what it is allowed to do. Vectimus treats configuration files as a protected category. No agent action, regardless of the prompt, can modify governance settings, MCP configs, IDE launch configurations or agent instruction files. The safety layer protects itself. See the architecture overview for how file integrity policies work and the OWASP agentic mapping for how config poisoning maps to the OWASP top 10 for agentic AI.

Sources

When Public Prompts Turn Into Local Shells: CurXecute, RCE in Cursor via MCP Auto-Start, AIM Security, CVE-2025-54135 disclosure
CVE-2025-54135 and CVE-2025-54136, Vulnerabilities in Cursor, Tenable FAQ
IDEsaster: A Novel Vulnerability Class in AI IDEs, Ari Marzouk, 30+ vulnerabilities across AI IDEs including launch.json poisoning
New Vulnerability in GitHub Copilot and Cursor: How Hackers Can Weaponize Code Agents, Pillar Security, Rules File Backdoor research