The headlines say AI is cracking decades-old math problems. The data says it fails on 98 or 99 out of every 100 attempts. Mathematician Terence Tao laid out the gap between perception and reality in a conversation with Dwarkesh Patel, describing a selection bias that inflates the apparent power of AI in mathematical research.
"If you only focus on the success stories, the ones that get broadcast on social media, it looks amazing," Tao said, as reported by "Hvylya", citing the Dwarkesh Podcast. "But whenever we do a systematic study, on any given problem an AI tool has a success rate of maybe 1% or 2%. It's just that they can buy scale, and you just pick the winners."
The mechanism is straightforward. AI companies and individual researchers run frontier models against large collections of open problems simultaneously. The rare wins circulate on social media; the failures go unreported. "People have done large-scale sweeps of these Erdős problems," Tao said. The pattern holds across different problem sets and model generations.
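The arithmetic behind "buy scale and pick the winners" can be sketched in a few lines of Python. This is an illustration only, not a model of any real sweep: it assumes every problem has the same success rate and that attempts are independent, and the sweep sizes and function names are hypothetical. Only the 1% to 2% figure comes from Tao's remarks.

```python
# Illustrative sketch: how a low per-problem success rate still yields
# headline-worthy wins once the sweep is large enough.
# Assumptions (not from the source): identical success rate on every
# problem, independent attempts, hypothetical sweep sizes.


def expected_wins(num_problems: int, success_rate: float) -> float:
    """Expected number of solved problems in a sweep."""
    return num_problems * success_rate


def prob_at_least_one_win(num_problems: int, success_rate: float) -> float:
    """Chance that the sweep produces at least one success to publicize."""
    return 1.0 - (1.0 - success_rate) ** num_problems


if __name__ == "__main__":
    for rate in (0.01, 0.02):          # the 1% to 2% range Tao cites
        for size in (1, 100, 1000):    # hypothetical sweep sizes
            wins = expected_wins(size, rate)
            chance = prob_at_least_one_win(size, rate)
            print(f"rate={rate:.0%} problems={size:>4} "
                  f"expected wins={wins:5.1f} P(at least one)={chance:.1%}")
```

Under these assumptions, a single researcher trying one favorite problem sees a win 1% or 2% of the time, while a sweep of 100 problems at a 2% rate produces at least one success roughly 87% of the time. The per-problem odds never change; only the number of tickets does.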
Tao predicted this dynamic will intensify as models get stronger. "Some AI may get lucky and actually solve them, and there will be some backdoor to solve the problem that everyone else missed. That will get a lot of publicity," he said. "But then people will try these fancy tools on their own favorite problem, and they will again experience the 1% to 2% success rate."
The solution, Tao argued, lies in standardized benchmarks. "There are efforts now to create a standard set of challenge problems for AIs to solve, and not just rely on the AI companies to only publish their wins and not disclose their negative results," he said. Without such datasets, the public picture of AI capability in mathematics will remain distorted by survivorship bias. "The progress is simultaneously amazing and disappointing," Tao said. "It is a very strange feeling to see these tools in action."
