AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents Paper • 2602.06855 • Published 6 days ago • 65
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning Paper • 2601.19280 • Published 16 days ago • 9
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published 15 days ago • 118
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning Paper • 2601.14209 • Published 23 days ago • 6
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 28 days ago • 30
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge Paper • 2601.08808 • Published 30 days ago • 39
Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning Paper • 2601.07641 • Published Jan 12 • 46
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 29 days ago • 89
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published about 1 month ago • 147
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10 • 52
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning Paper • 2601.09088 • Published 30 days ago • 62
OpenTinker: Separating Concerns in Agentic Reinforcement Learning Paper • 2601.07376 • Published Jan 12 • 6
Dr. Zero: Self-Evolving Search Agents without Training Data Paper • 2601.07055 • Published Jan 11 • 20
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts Paper • 2601.05110 • Published Jan 8 • 29
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era Paper • 2601.07526 • Published Jan 12 • 23