DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training Paper • 2504.17565 • Published Apr 24, 2025 • 2
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published Jul 9, 2025 • 56
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits Paper • 2509.11362 • Published Sep 14, 2025 • 5
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook Paper • 2509.14142 • Published Sep 17, 2025 • 10
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning Paper • 2507.21924 • Published Jul 29, 2025 • 1
Hierarchical Dataset Selection for High-Quality Data Sharing Paper • 2512.10952 • Published Dec 11, 2025 • 2
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition Paper • 2512.13884 • Published Dec 15, 2025 • 15
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models Paper • 2601.03699 • Published Jan 7, 2026 • 8
Extreme Multi-Label Skill Extraction Training using Large Language Models Paper • 2307.10778 • Published Jul 20, 2023
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning Paper • 2602.16742 • Published 9 days ago • 7