The same man building OpenAI's most autonomous systems is openly worried about what they might do. Chief scientist Jakub Pachocki has laid out three scenarios that keep him and his colleagues up at night: the system goes rogue, it gets compromised by hackers, or it misinterprets what it was told to do.
"If you believe that AI is about to substantially accelerate research, including AI research, that's a big change in the world," Pachocki told MIT Technology Review in an exclusive interview, as "Hvylya" reports. "And it comes with some serious unanswered questions. If it's so smart and capable, if it can run an entire research program, what if it does something bad?"
OpenAI's primary safeguard is chain-of-thought monitoring - training reasoning models to keep a running log of their decision-making as they work. Researchers then use these "scratch pads" to catch unwanted behavior. The company published new details this week on how it uses the technique to monitor Codex, its agent-based coding tool. The plan is to eventually use separate LLMs to monitor the scratch pads of autonomous systems in real time.
Pachocki was candid about the limits of the approach. "I think it's going to be a long time before we can really be like, okay, this problem is solved," he said. He argued that very powerful models should be deployed in sandboxes, isolated from anything they could break or weaponize. The stakes are not theoretical: AI tools have already been used for novel cyberattacks, and some researchers worry about the risk of AI systems developing beyond their creators' ability to control them.
The safety conversation exists against a turbulent backdrop. The recent confrontation between Anthropic and the Pentagon over autonomous weapons exposed deep disagreements about where red lines should be drawn - and who gets to draw them. OpenAI stepped in to take the Pentagon deal Anthropic refused. Pachocki acknowledged that safety cannot be solved by any one company. "We'll definitely need a lot of involvement from policymakers," he said.
"I definitely think there are worrying scenarios that we can imagine," Pachocki added. Some researchers worry about synthetic pathogens; others about AI-designed cyberweapons. The window between deploying such systems and understanding them fully, Pachocki said, may be uncomfortably narrow.
Also read: Anthropic Offered to Help Improve Killer Drones: Where the Company Drew Its Red Line.
