PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks Paper • 2602.06663 • Published 19 days ago • 5
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published 26 days ago • 103
AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering Paper • 2601.04620 • Published Jan 8 • 3
Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning Paper • 2601.03872 • Published Jan 7 • 43
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics Paper • 2512.12602 • Published Dec 14, 2025 • 44
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19, 2025 • 77
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 134
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 34
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM Paper • 2408.07246 • Published Aug 14, 2024 • 22 • 5
Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models Paper • 2412.19191 • Published Dec 26, 2024