OpenAI Erdős Proof vs DeepMind AlphaProof vs Microsoft Math (May 2026)
On May 21, 2026, OpenAI announced that a general-purpose reasoning model autonomously cracked a famous Erdős discrete-geometry problem that had stumped mathematicians for 80 years. The announcement was treated as a landmark moment — original mathematical research from a general reasoning model. Here’s how it compares to DeepMind’s AlphaProof and Microsoft’s math AI work.
Last verified: May 24, 2026.
TL;DR table
| | OpenAI Erdős Result | DeepMind AlphaProof | Microsoft Math AI |
|---|---|---|---|
| Announced | May 21, 2026 | July 2024, alongside its IMO silver-medal-level result | Ongoing research, no marquee result |
| Approach | General reasoning model | Specialized formal-math + Lean + RL | Lean integration + RL + Copilot |
| Specialized for math? | No — general model | Yes — math-specific | Partial — combines general + specialized |
| Major result | Cracked 80-year Erdős discrete-geometry problem | IMO silver-medal performance 2024 | None marquee as of May 2026 |
| Output format | Natural-language proof (verified separately) | Formal Lean proofs | Mixed |
| Verifiability | Requires human / Lean verification | Machine-checkable (Lean) | Mixed |
| Public availability | Capability demonstrated in research blog | Not directly purchasable | Embedded in Copilot products |
| Significance | General models can do original math | Specialized AI can do hard math | Math AI as platform feature |
What each approach actually is
OpenAI Erdős result — general model, original discovery
The May 21, 2026 announcement was deliberately structured to make a specific point: a general-purpose reasoning model, not a math-specialized system, produced an original mathematical proof of a problem that had been open since the 1940s.
The model was reportedly given the problem statement, allowed to think for an extended period (using OpenAI’s chain-of-thought / “reasoning” framework), and produced a proof that was subsequently verified by professional mathematicians. The proof wasn’t just a restatement of known techniques — it included a novel approach that the human reviewers found genuinely surprising.
The significance for the field: frontier reasoning models have reached the threshold of original mathematical research without specialized math training. That’s a much stronger claim than “we built a specialized system that does math.”
DeepMind AlphaProof — specialized formal-math system
AlphaProof and its sibling AlphaGeometry are DeepMind’s specialized math AI systems. AlphaProof’s architecture and headline result:
- Lean integration: outputs formal proofs in the Lean proof assistant, which can be machine-checked.
- Reinforcement learning trained specifically on mathematical problems.
- Solver loop: generates candidate proofs, checks them in Lean, learns from failures.
- IMO silver-medal performance (July 2024): together with AlphaGeometry 2, solved 4 of 6 problems from the 2024 International Mathematical Olympiad, a score at silver-medal level.
AlphaProof’s strength is rigorous, machine-verifiable proofs. Its weakness is narrowness — it’s a math system, not a general reasoning system.
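The generate, check, and learn-from-failure loop in the list above can be illustrated with a toy sketch. This is not AlphaProof's actual algorithm: the `verifier` function is a hypothetical stand-in for the Lean checker, the "proof" is just an integer satisfying a toy equation, and zeroing out rejected candidates is a crude placeholder for reinforcement learning.

```python
import random


def verifier(candidate: int) -> bool:
    """Hypothetical stand-in for a trusted checker such as Lean.

    Toy goal (an assumption for illustration): find x with x * x == 1369.
    """
    return candidate * candidate == 1369


def solve(max_attempts: int = 10_000, seed: int = 0):
    """Propose candidates, check each one, and update proposals on failure."""
    rng = random.Random(seed)
    # Proposal weights over candidates 1..100 play the role of a learned policy.
    weights = {x: 1.0 for x in range(1, 101)}
    for attempt in range(1, max_attempts + 1):
        candidates = list(weights)
        guess = rng.choices(
            candidates, weights=[weights[c] for c in candidates]
        )[0]
        if verifier(guess):
            # Only checker-approved output is ever returned.
            return guess, attempt
        # "Learn from failure": stop proposing a rejected candidate.
        weights[guess] = 0.0
    return None, max_attempts


if __name__ == "__main__":
    answer, attempts = solve()
    print(f"verified answer {answer} after {attempts} attempts")
```

The design point the sketch preserves is that the generator is never trusted on its own: a result counts only once the independent checker accepts it, which is what makes the output of Lean-based systems machine-verifiable.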
Microsoft math AI — embedded platform capability
Microsoft Research has substantial math-AI work but tends to integrate it into Copilot products rather than announce standalone results. Known initiatives:
- Lean copilot — AI-assisted theorem proving inside Lean (collaboration with the Lean community).
- Math-aware RL training for general models.
- Collaboration with Terence Tao on AI-assisted math research (publicly discussed by Tao on his blog).
- Math reasoning improvements in Phi and other Microsoft models.
As of May 2026, no Microsoft system has produced a marquee result comparable to OpenAI’s Erdős proof or AlphaProof’s IMO performance — but the foundational work is real.
Why the OpenAI result matters more than the headline
Three reasons the May 21 result is bigger than “another AI does math”:
1. General > specialized. AlphaProof showed specialized AI could do hard math. OpenAI’s Erdős result shows general AI can do hard math. The capability lives in the broader model now, not just in a math-specialized system. That has implications for every other formal-reasoning domain.
2. Original discovery, not problem-solving. Solving IMO problems is hard but well-defined: a known answer exists, and the task is to find it. Cracking an open Erdős problem means proving something nobody knew before. That’s a structurally different capability.
3. Reproducibility implications. If the model can do this once, what stops it from doing it 100 times across different open problems? OpenAI hasn’t said yet whether it intends to systematically apply the model to known open problems — but the math community is watching closely.
What each approach is best at
OpenAI’s general reasoning approach wins for:
- Original discovery in formal domains (math, theoretical physics, formal verification).
- Problems where the “right shape” of the proof isn’t known in advance.
- Cross-domain transfer (a general model can reason about physics and chemistry too).
DeepMind AlphaProof wins for:
- Machine-verifiable proofs (Lean output is checkable, not just narrative).
- Olympiad-style problems with clear formal statements.
- Domains where rigor matters more than novelty.
Microsoft’s embedded approach wins for:
- Math assistance in everyday workflows (Excel, Word, Copilot).
- Lean theorem-proving assistance for working mathematicians.
- Research collaboration tooling, following the pattern of the Tao and Microsoft collaboration.
Verifiability and the “is this proof correct?” problem
A persistent question: when a model claims to have proved something, how do you know?
| Approach | Verification method |
|---|---|
| OpenAI Erdős result | Human mathematicians reviewed the proof |
| DeepMind AlphaProof | Lean proof assistant machine-checks |
| Microsoft Lean integration | Lean machine-checks |
OpenAI’s approach is the most fragile here — natural-language proofs require expert human verification, which doesn’t scale. The likely 2026-2027 trajectory: general reasoning models output natural-language proofs that get automatically translated into Lean for machine verification. That hybrid pattern combines the strengths of both approaches.
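To make "machine-checkable" concrete, here is a minimal Lean 4 illustration. It has nothing to do with the Erdős proof itself; the theorem name `add_comm_example` is chosen for illustration, and the statement (addition of natural numbers is commutative) is deliberately trivial. The point is only that Lean accepts the file if the proof term really establishes the stated claim and rejects it otherwise, with no human judgment involved.

```lean
-- A formal statement plus a proof term that Lean's kernel checks.
-- `Nat.add_comm` is a lemma from Lean 4's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

In the hybrid pattern described above, the hard part is the step this sketch skips: turning a long natural-language argument into formal statements and proof terms of this kind.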
What this means for science more broadly
If a general reasoning model can produce novel math, the same capability likely transfers to:
- Theoretical physics — proving new properties of known models.
- Chemistry — predicting properties of novel compounds.
- Theoretical computer science — algorithmic complexity results.
- Formal verification — proving correctness of software/hardware.
- Economics theory — formal mechanism design.
Expect 2026-2027 to bring “AI discovers X” announcements across formal-reasoning sciences. The pattern from the Erdős result is the template: take an open problem with a clear statement, let the model reason for an extended period, verify the result with domain experts.
Skeptic’s checklist
Before getting too excited:
- Single result, not yet replicated. OpenAI announced one breakthrough on one problem. The math community needs to see this repeatedly across multiple problems.
- Compute requirements unstated. If the model required millions of dollars of inference for one proof, that limits practical use.
- Cherry-picking risk. OpenAI selected the problem to publicize. We don’t know how many problems the model tried and failed before succeeding on this one.
- Verification not automated. Until natural-language proofs can be auto-translated to Lean, this approach scales poorly compared to AlphaProof’s machine-checkable output.
These caveats don’t diminish the result, but they shape expectations. The most likely 2026 reality: OpenAI’s reasoning model can occasionally produce novel math when given enough compute and the right problem framing, while AlphaProof and Lean-based systems remain the production tools for math research workflows.
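The cherry-picking caveat can be made quantitative with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the model attempts `n` open problems independently and succeeds on each with the same probability `p`; neither number is known for the OpenAI result, and real problems are neither equally hard nor independent.

```python
def prob_at_least_one_success(p: float, n: int) -> float:
    """Chance of one or more successes in n independent attempts.

    Uses the complement rule: 1 - (1 - p) ** n.
    Both p and n are hypothetical inputs, not reported figures.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical scenario: a 1% per-problem success rate over 100 attempts.
    print(round(prob_at_least_one_success(0.01, 100), 3))
```

Under those assumed numbers, the chance of at least one success is roughly 63%, so a single publicized proof is compatible with a low per-problem success rate. That is why the number of attempted problems matters for interpreting the announcement.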
Verdict
- Best for original mathematical discovery (capability demonstration): OpenAI Erdős result — landmark milestone.
- Best for machine-verifiable, production-quality formal proofs: DeepMind AlphaProof + Lean.
- Best for embedded math help in everyday tools: Microsoft Math AI (Copilot, Lean assistance).
- The bigger picture: general reasoning models are now capable of original research in formal domains. Expect cascading announcements across sciences through 2026-2027.
The market story: the most important AI capability development of 2026 isn’t a new model release; it’s that frontier general models have crossed into original scientific discovery. This is the precursor to AI being treated as a research collaborator rather than just an assistant.