Have you ever wondered whether mathletes can go pro? Since 1959, the answer has been “yes” – with the height of achievement and prestige being the International Mathematical Olympiad (IMO). Every year, more than 100 countries from around the world send their best and brightest pre-university-level mathematicians to battle it out, mente-a-mente, against six of the hardest problems cooked up by other competing bodies.
And then, once all the challenges have been solved and medals awarded, they’re forgotten. “Every country brings a booklet of its most novel and most creative problems,” explained Shaden Alshammari, a PhD student at MIT, in a statement this week. “They share the booklets with each other, but no one had made the effort to collect them, clean them, and upload them online.”
Given how long the IMO has been going, any attempt to do that now would be a mammoth undertaking. So it’s with no small amount of awe that we’re here to tell you: Alshammari has done it.
Compiling MathNet
As much as the billionaires whose fortunes rely on it love to glaze generative AI, it’s still fairly bad for truly worthwhile use cases. Like, sure it can create obviously fake images of the Moon-Earth (Mearth?) and hallucinate a bunch of non-existent case law; sure it can celebrate the Nazis or turn you into a narcissist and would-be regicide, but can it play Atari? Can it tackle math problems it’s never seen before? Absolutely not, at least so far.
But, to be fair, neither can a baby. Any future genius needs practice before they can reach their full potential – and large language and multimodal models are no exception. There’s just one problem: we’re kind of running out of things to test them with. “Existing benchmarks are limited in size, language coverage, and task diversity,” points out the abstract of a new paper from researchers at MIT, Saudi Arabia’s King Abdullah University of Science and Technology (KAUST), and Saudi artificial intelligence company HUMAIN.
The solution? MathNet – “a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems,” lead author Alshammari and her colleagues write, “together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems.”
Essentially, it’s a compilation of every IMO problem and solution the research team could find – some digital documents from the modern age; others scans of physical documents that had somehow survived the decades since they were first handed out. There were thousands in total: 1,595 PDFs comprising more than 25,000 pages, with more than 30,000 expertly authored math problems from 47 countries in 143 competitions.
It is, in total, the biggest dataset of its type around – and it’s not even close: the second-largest compilation of math problems and solutions is around one-fifth the size of MathNet. But size isn’t everything – and with data from so many diverse international teams, there’s something else MathNet has going for it that could make it a game-changer.
Broadening the data
For all that it may be a universal language, math is just as culturally biased as anything else humans do. French kids might be taught from a pure math perspective, with focus on the Big Names from history; China is well-known for its numerical and algorithmic approaches. Russian students are taught rigorously, but with perhaps an extreme leaning towards the abstract; American approaches are kind of the opposite.
That means two things: first, that anybody – human or machine – who is limited to, say, English-only challenges for practice is really missing out, and second, that some problems are probably duplicates.
Let us explain. Here are two questions:
- A man has 100 meters of fencing and wants to create an enclosure in his yard. What’s the biggest area he can enclose?
- A sequence runs: 49, 96, 141, 184, 225, … . What is the sequence’s maximum value?
Now, perhaps these look very different to you – one is very real-world based, and seems to be quite geometrical; the other appears rather abstract and numerical. But in fact, these are the same question, with the same answer (leave it in the comments if you like!).
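If you would rather check the equivalence than take our word for it (spoiler ahead), here is a minimal Python sketch. It rests on two assumptions that the puzzle does not state outright: that the enclosure is a rectangle with whole-meter sides, and that the sequence continues according to the rule a_n = n × (50 − n), which reproduces the five terms given. The function names are ours, purely for illustration.

```python
def rectangle_area(width, perimeter=100):
    """Area of a rectangle of the given width built from a fixed length of fencing."""
    height = perimeter / 2 - width  # the two widths and two heights must sum to the perimeter
    return width * height


def sequence_term(n):
    """Assumed closed form a_n = n * (50 - n); it matches 49, 96, 141, 184, 225."""
    return n * (50 - n)


# Confirm that the assumed rule really does generate the terms quoted in the puzzle.
given_terms = [49, 96, 141, 184, 225]
assert [sequence_term(n) for n in range(1, 6)] == given_terms

# Search whole-number widths (in meters) and sequence positions over the same range.
candidates = range(1, 50)
best_width = max(candidates, key=rectangle_area)
best_n = max(candidates, key=sequence_term)

print("Best width:", best_width, "-> area:", rectangle_area(best_width))
print("Best position:", best_n, "-> term:", sequence_term(best_n))
```

Under those assumptions, both searches maximize the same expression, x × (50 − x), which is why the fence and the sequence share an answer. Drop the rectangle assumption and the fencing problem changes: a circular enclosure would hold more area.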
That’s a fairly simple example, but being able to translate problems from one mathematical framework to another is an invaluable skill for any student – especially if you’re hoping to beat other mathematical competitors from around the world.
“I remember so many students for whom it was an individual effort. No one in their country was training them for this kind of competition,” said Alshammari, who competed in the IMO as a student herself. “We hope this gives them a centralized place with high-quality problems and solutions to learn from.”
“Other archives of Olympiad problems do exist,” added Tanish Patil, deputy leader of Switzerland's IMO team, who did not work on the paper – noting the community forums of Art of Problem Solving as one well-known example – but “these resources lack a standardized formatting system, verified solutions, and important problem metadata that topics and theory require.”
For students and machines
As impressive as AI models can appear today, rigorous benchmarks show that we still have them beat when it comes to math. That’s something MathNet makes clear, in fact: when testing programs against some of the problems in their database, the team found that even the best-performing ones out there failed almost one in three.
And that’s a best-case scenario – if you pit an AI program against a problem that relies on figures, its performance gets much worse. As Stanislas Dehaene, a cognitive neuroscientist at the Collège de France who researches foundational geometric knowledge, told the New York Times two years ago, such systems “[do] not ‘see’ anything about the problems” and have “absolutely no spatial perception of the circles, lines and triangles that [they learn] to manipulate.”
MathNet, then, doesn’t just promise to be the biggest and most comprehensive resource out there for prospective international mathletes – it’s also an invaluable reality check for over-hyped AI models. And, at the same time, it may just be their solution, too. “It will […] be interesting to see how this dataset is used to improve the performance of reasoning models,” mused Patil, “and if we will soon be able to reliably answer an important issue when creating novel Olympiad questions: determining if a problem is truly original.”
Either way, “the MathNet database has the potential to be an excellent resource for both students and leaders seeking new problems to work on or looking for the solution to a difficult question,” he said. So, for now, why not head on over and take a crack?





