Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published 4 days ago • 107 • 5
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning Paper • 2604.05404 • Published 4 days ago • 38 • 4
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 5 days ago • 222 • 8
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision Paper • 2604.04934 • Published 5 days ago • 35 • 5
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published 5 days ago • 196 • 12
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation Paper • 2604.03922 • Published 6 days ago • 49 • 4
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing Paper • 2604.02288 • Published 9 days ago • 27 • 3
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published 9 days ago • 164 • 6
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement Paper • 2604.01591 • Published 9 days ago • 35 • 4
LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models Paper • 2603.28301 • Published 12 days ago • 77 • 5
Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis Paper • 2603.06507 • Published Mar 6 • 1 • 1
Test-Time Scaling Makes Overtraining Compute-Optimal Paper • 2604.01411 • Published 10 days ago • 25 • 4
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents Paper • 2505.22954 • Published May 29, 2025 • 15 • 4
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels Paper • 2603.19312 • Published 28 days ago • 24 • 2
Mamba-3: Improved Sequence Modeling using State Space Principles Paper • 2603.15569 • Published 25 days ago • 6 • 1