Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos Paper • 2605.18984 • Published May 18 • 22
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV Paper • 2605.26244 • Published May 25 • 38
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence Paper • 2606.20515 • Published 13 days ago • 40
ViQ: Text-Aligned Visual Quantized Representations at Any Resolution Paper • 2606.27313 • Published 6 days ago • 38
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published May 27 • 75
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published Apr 6 • 236
FileGram: Grounding Agent Personalization in File-System Behavioral Traces Paper • 2604.04901 • Published Apr 6 • 40
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning Paper • 2603.26653 • Published Mar 27 • 18
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2603.18118 • Published Mar 18 • 12
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining Paper • 2603.15030 • Published Mar 16 • 21
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition Paper • 2602.08439 • Published Feb 9 • 28
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark Paper • 2510.13759 • Published Oct 15, 2025 • 11
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark Paper • 2509.24897 • Published Sep 29, 2025 • 46
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Paper • 2507.15028 • Published Jul 20, 2025 • 21
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras Paper • 2507.17664 • Published Jul 23, 2025 • 2