Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Abstract
A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through a reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal-level performance on mathematics and physics olympiads.
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematics and physics olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
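The abstract does not spell out how the reverse-perplexity curriculum orders the SFT data. As a minimal sketch only, one plausible reading is that each trajectory is scored by its perplexity under the backbone and the data is then presented from highest to lowest perplexity. The function names, the toy log-probabilities, and the sort direction below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one trajectory: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("trajectory has no tokens")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def reverse_perplexity_order(
    samples: List[Tuple[str, Sequence[float]]]
) -> List[str]:
    """Return sample ids sorted from highest to lowest perplexity.

    ASSUMPTION: "reverse" is read here as descending perplexity; the paper
    may define the ordering differently.
    """
    scored = [(perplexity(logprobs), sample_id) for sample_id, logprobs in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sample_id for _, sample_id in scored]


# Hypothetical per-token log-probabilities, as a backbone model might assign them.
toy_samples = [
    ("a", [-0.1, -0.2]),
    ("b", [-2.0, -1.5]),
    ("c", [-0.7, -0.9]),
]
order = reverse_perplexity_order(toy_samples)
```

In practice the log-probabilities would come from a forward pass of the backbone over each SFT trajectory; the sorted ids would then determine the order in which batches are drawn.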
Community
We have open-sourced our code and model. Please check out our project page and GitHub repository:
https://simplified-reasoning.github.io/SU-01/
https://github.com/Simplified-Reasoning/SU-01
This paper makes me feel fuzzy inside.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- QED-Nano: Teaching a Tiny Model to Prove Hard Theorems (2026)
- Fine-Tuning Small Reasoning Models for Quantum Field Theory (2026)
- Riemann-Bench: A Benchmark for Moonshot Mathematics (2026)
- OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories (2026)
- Hint Tuning: Less Data Makes Better Reasoners (2026)
- Solving Physics Olympiad via Reinforcement Learning on Physics Simulators (2026)
- Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming (2026)