Every attack on AI defense code has been demonstrated

Every building block needed to compromise AI-generated code at scale has been independently demonstrated in research labs and real-world incidents, according to a War on the Rocks analysis by Markus Sandelin, AI Lead at the NATO Communications and Information Agency. The concern is no longer reliability but deliberate compromise, "Hvylya" reports.

Researchers at USENIX Security 2024 showed that contaminating just 0.2 percent of a model's training data - 160 files out of 80,000 - embedded backdoors that evaded all standard detection tools. The poisoned output looks identical to clean code because the model learns the backdoor as a secondary pattern triggered by specific context, while the rest of the training data teaches it to write normal code.

Anthropic published research in January 2024 demonstrating that a model could write secure code under normal conditions while injecting exploitable vulnerabilities when triggered by a specific signal - in their experiment, the calendar year changing. "The backdoor survived every standard safety technique, including the reinforcement learning process specifically designed to remove unwanted behaviors," Sandelin wrote. Larger models proved harder to fix because they have more capacity to compartmentalize behaviors.

Trail of Bits demonstrated in August 2025 that an attacker could file a normal-looking bug report on GitHub containing invisible instructions in the page's HTML. When a coding assistant read the page, it followed the hidden instructions and installed a backdoor. The developer saw nothing unusual. In July 2025, an attacker exploited a flaw in the build process for Amazon Q Developer and injected a malicious instruction into the official product distributed through Visual Studio Code's marketplace. The compromised extension had over 964,000 installations and was publicly distributed for two days.

"The only reason it caused no damage was a syntax error in the attacker's payload," Sandelin noted. "A typo is the current margin of safety for AI coding tool supply chains."

Also read: how a four-star general's admission shattered the Pentagon's official line on its secret AI.

Researchers Demonstrate Every Technical Prerequisite for Compromising AI Defense Code

Russia Backs Iran Strikes on U.S. Forces With Secret Intelligence Support, Reports Claim

Trump Sends the Navy to Do What Sanctions No Longer Can

China, Pakistan Unveil 5-Point Peace Plan to End Iran War

Anthropic Sues Trump Administration Over Pentagon AI Deal as OpenAI Steps In

Why "Mowing the Lawn" in Iran Makes America Less Safe

From Farms to AI: Why History Suggests Human Workers Survive Every Revolution

Zelensky Rejects New Russian Ultimatum, Reveals US Talks Details

Foreign Affairs: America's Iran Gamble Mirrors Russia's Ukraine Trap

ATACMS Restrictions on Ukraine Send Wrong Signal as Russia Backs Iran, Analyst Argues

EU Delays €90 Billion Loan to Ukraine and 20th Russia Sanctions Package

Trump Blasts France and UK Over Refusal to Back Iran War

U.S. Navy Minesweeping Crisis Strands Most Gulf Ships in Singapore

Betts and Biddle Explain Why a Weaker Iran Grows More Dangerous for America

Leaked Tapes Reveal Hungary's Szijjarto Colluding With Lavrov to Derail EU Sanctions

Grok Nudges Right, GPT Nudges Left: FT Mapped Each AI Chatbot's Political Fingerprint

Iran Runs a Two-Pronged Attrition Strategy - and Washington Has No Answer

Trump's Tanker Seizures Rest on a Legal Theory That May Not Survive Court

Shouting Matches and Power Plays: WSJ Details the Feuds That Split OpenAI

At CPAC, Young Anti-War Conservatives Rally Behind the Room's Most Hawkish Figure