Yuhao Dong PRO

THUdyh

33 151 34

AI & ML interests

None yet

Recent Activity

authored a paper 3 days ago

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

authored a paper 3 days ago

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

authored a paper 3 days ago

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

View all activity

Organizations

authored 4 papers 3 days ago

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Paper • 2605.18984 • Published May 18 • 22

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

Paper • 2605.26244 • Published May 25 • 38

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Paper • 2606.20515 • Published 13 days ago • 40

ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

Paper • 2606.27313 • Published 6 days ago • 38

authored a paper about 1 month ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

Paper • 2605.28820 • Published May 27 • 75

authored 3 papers 3 months ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 236

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published Mar 27 • 18

submitted a paper to Daily Papers 3 months ago

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published Mar 27 • 18

authored a paper 3 months ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published Mar 18 • 12

submitted a paper to Daily Papers 3 months ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published Mar 18 • 12

authored a paper 3 months ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

authored a paper 5 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

submitted a paper to Daily Papers 5 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

authored a paper 5 months ago

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 275

authored 2 papers 8 months ago

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

Paper • 2510.13759 • Published Oct 15, 2025 • 11

3EED: Ground Everything Everywhere in 3D

Paper • 2511.01755 • Published Nov 3, 2025 • 11

authored a paper 9 months ago

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Paper • 2509.24897 • Published Sep 29, 2025 • 46

authored 2 papers 11 months ago

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

Paper • 2507.15028 • Published Jul 20, 2025 • 21

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

Paper • 2507.17664 • Published Jul 23, 2025 • 2

Yuhao Dong PRO

AI & ML interests

Recent Activity

Organizations

THUdyh's activity