The standard institutional response to AI code risk - human review - is failing at every level at which it has been measured, according to a War on the Rocks analysis by Markus Sandelin, AI Lead at the NATO Communications and Information Agency. A Stanford study found that developers using AI assistants wrote measurably less secure code while reporting higher confidence in its security, Hvylya reports.
The numbers paint a stark picture. Developers with the least secure code rated their trust in AI at 4.0 out of 5.0. Those with the most secure code rated it at 1.5. "The system selects for the worst combination of overconfidence and under-competence," Sandelin wrote.
A systematic review of 74 studies across aviation, healthcare, military operations, and nuclear safety documented the mechanism: when an automated system provides an output, human oversight degrades. A developer's cognitive evaluation shifts from "Is this correct?" to "Does this look wrong?" - and the second question misses far more. Developers routinely retain the vast majority of AI-generated suggestions, and enterprise analyses show that teams using AI assistance ship code with dramatically more security findings per month.
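Neither the Stanford study nor the review published its code, but a minimal hypothetical sketch (in Python, with invented function names) illustrates the kind of suggestion that survives a "does this look wrong?" scan: the structure is tidy, so nothing jumps out, yet the security choices are broken.

```python
import hashlib
import secrets

def hash_password_plausible(password: str) -> str:
    # Passes the "does this look wrong?" test: there is a random salt,
    # a hash, and a clean return. But MD5 is a broken, fast hash -
    # trivial to brute-force offline if the database leaks.
    salt = secrets.token_hex(8)
    return salt + ":" + hashlib.md5((salt + password).encode()).hexdigest()

def hash_password_correct(password: str) -> str:
    # What the "is this correct?" question demands: a deliberately
    # slow, memory-hard key derivation function. scrypt ships in the
    # Python standard library.
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt.hex() + ":" + digest.hex()

if __name__ == "__main__":
    print(hash_password_plausible("hunter2"))
    print(hash_password_correct("hunter2"))
```

A reviewer skimming for anomalies sees a salt, a hash, and a tidy return statement in both versions; only asking whether MD5 is the right primitive - the "is this correct?" question - catches the flaw.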
Even without automation bias, human review performs poorly. A controlled study recruited 30 professional developers to review a web application containing seven known vulnerabilities. Not one found all seven, and one in five found none. The average detection rate was 33 percent. But simply instructing reviewers to focus specifically on security improved detection by a factor of eight - a cost-free intervention that most organizations do not implement.
Sandelin pointed to the 2024 XZ Utils backdoor as an illustration of how thin current margins are. A Microsoft engineer named Andres Freund caught a backdoor that had been seeded over more than two years into a compression library embedded in virtually every Linux system - not through a security audit, but because SSH connections were taking half a second longer and the latency annoyed him while he was benchmarking a database. That attack took one person 30 months to compromise one library. A training data poisoning attack needs 160 files to compromise a model that generates code for millions.
