OpenAI aims to have an AI system capable of handling small research problems on its own within six months. The "AI research intern," as the company calls it, would be the precursor to a far more ambitious system: a fully automated multi-agent researcher slated for 2028 that the company says will tackle problems too large or complex for humans alone.
As "Hvylya" reports, citing an exclusive MIT Technology Review interview with chief scientist Jakub Pachocki, the new goal will serve as OpenAI's "North Star" for the coming years, pulling together work on reasoning models, agents, and interpretability.
"What we're really looking at for an automated research intern is a system that you can delegate tasks [to] that would take a person a few days," Pachocki said. The idea is not new - rival Anthropic has disclosed that its own AI writes up to 90% of its training code - but OpenAI's public timeline is among the most specific any major lab has committed to.
Doug Downey, a research scientist at the Allen Institute for AI, said enthusiasm for such systems is growing across the field. "There are a lot of people excited about building systems that can do more long-running scientific research," he said. "The fact that you can delegate quite substantial coding tasks to tools like Codex is incredibly useful and incredibly impressive."
Pachocki argued the capability is a natural extension of existing progress. He pointed to the leap from GPT-3 to GPT-4, noting the newer model could sustain coherent problem-solving far longer even without specialized training. Reasoning models - which train LLMs to backtrack and break problems into subtasks - brought another jump. OpenAI is now feeding its systems complex puzzles from math and coding contests specifically to extend how long they can work autonomously.
The company faces fierce competition from rival labs including Anthropic and Google DeepMind. Downey cautioned that chaining multiple tasks remains error-prone: "If you have to chain tasks together, then the odds that you get several of them right in succession tend to go down." He added that he has not tested the latest versions of GPT-5 and that his earlier results "might already be stale."
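Downey's caution is simple compounding: if each step in a chain succeeds independently with some probability, the chance the whole chain succeeds is that probability raised to the number of steps, which falls off quickly. A minimal sketch of the arithmetic (the per-step success rates here are illustrative assumptions, not figures from the article):

```python
# Illustrative only: per-step success rates are assumed, not reported benchmarks.
def chain_success(per_step_rate: float, num_steps: int) -> float:
    """Probability that every step in a chain of independent tasks succeeds."""
    return per_step_rate ** num_steps

for rate in (0.95, 0.90, 0.80):
    for steps in (1, 5, 10):
        print(f"per-step {rate:.0%}, {steps:2d} steps -> {chain_success(rate, steps):5.1%}")
```

Even at 90% reliability per task, a ten-step chain completes end to end only about 35% of the time, which is why long-running autonomous research remains hard.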
