OpenAI confronts risks of autonomous AI doing something bad

The same man building OpenAI's most autonomous systems is openly worried about what they might do. Chief scientist Jakub Pachocki has laid out three scenarios that keep him and his colleagues up at night: the system goes rogue, it gets compromised by hackers, or it misinterprets what it was told to do.

"If you believe that AI is about to substantially accelerate research, including AI research, that's a big change in the world," Pachocki told MIT Technology Review in an exclusive interview, as "Hvylya" reports. "And it comes with some serious unanswered questions. If it's so smart and capable, if it can run an entire research program, what if it does something bad?"

OpenAI's primary safeguard is chain-of-thought monitoring - training reasoning models to keep a running log of their decision-making as they work. Researchers then use these "scratch pads" to catch unwanted behavior. The company published new details this week on how it uses the technique to monitor Codex, its agent-based coding tool. The plan is to eventually use separate LLMs to monitor the scratch pads of autonomous systems in real time.

Pachocki was candid about the limits of the approach. "I think it's going to be a long time before we can really be like, okay, this problem is solved," he said. He argued that very powerful models should be deployed in sandboxes, isolated from anything they could break or weaponize. The stakes are not theoretical: AI tools have already been used for novel cyberattacks, and some researchers worry about the risk of AI systems developing beyond their creators' ability to control them.

The safety conversation exists against a turbulent backdrop. The recent confrontation between Anthropic and the Pentagon over autonomous weapons exposed deep disagreements about where red lines should be drawn - and who gets to draw them. OpenAI stepped in to take the Pentagon deal Anthropic refused. Pachocki acknowledged that safety cannot be solved by any one company. "We'll definitely need a lot of involvement from policymakers," he said.

"I definitely think there are worrying scenarios that we can imagine," Pachocki added. Some researchers worry about synthetic pathogens; others about AI-designed cyberweapons. The window between deploying such systems and understanding them fully, Pachocki said, may be uncomfortably narrow.

Also read: Anthropic Offered to Help Improve Killer Drones: Where the Company Drew Its Red Line.

"It Could Do Something Bad": OpenAI Confronts Its Own Fears About Autonomous AI

Ukraine's Invisible Barrier: Why Wounded Veterans Face Isolation in Their Own Cities

A British Official's Stark Warning: Ukraine's War Is Coming for "Sleepy, Barely Defended Britain"

"Hereditary Great Power": Kofman Exposed the Delusion and Strategic Culture Behind Russia's Invasion

The Putin Successor Trap: Why the West Must Resist Its Worst Instinct

Dutch Court Delivers a Blow to Prosecutors in One of the Country's Biggest Security Scandals

Trump Issues Trillion-Dollar Ultimatum to Gulf States Over Iran Conflict

Iran War Risks Draining the Very Power Washington Needs to Counter Beijing

OpenAI Bets Its Future on Turning Coding Agents Into Universal Problem Solvers

Pentagon Pulls Marines and Fighter Jets From Asia to Iran, Tilting Indo-Pacific Balance

US Burns Through Expensive Missile Stocks in Iran as China Watches and Calculates

Poland Spends 4.48% of GDP on Defense - PISM Warns Even 5% Won't Close the Baltic Gap

Strategic Disarmament Without Occupation: The WWII Playbook Behind the Iran Campaign

Baltic Region Broke Free From Russian Energy in Two Years: Why It Remained Vulnerable

"Death Spasm, Not Expansion": What Iran's Proxy Attacks Actually Reveal About Tehran's Control

Harvard's Longest-Serving Professor Revealed Why Trump Is "More Democratic Than the Rest of Us"

Trump's Europe Pullback Meets Russia's 3-5 Year War Timeline: PISM Identifies a Dangerous Gap

From 350 Missiles to 25: The Numbers Behind Iran's Collapsing Strike Power

South Korea Confirms US May Relocate Air Defense Assets From Asia to Middle East

"Campaign Ahead of Diplomacy": What Critics of the Iran War Are Completely Missing