DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off • Paper • 2604.13902 • Published 18 days ago • 62
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model • Paper • 2604.20796 • Published 11 days ago • 239
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time • Paper • 2604.11626 • Published 20 days ago • 101
WildDet3D: Scaling Promptable 3D Detection in the Wild • Paper • 2604.08626 • Published 24 days ago • 245
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web • Paper • 2604.08516 • Published 24 days ago • 42
Adam's Law: Textual Frequency Law on Large Language Models • Paper • 2604.02176 • Published Apr 2 • 500
DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing • Paper • 2603.28713 • Published Mar 30 • 22
STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding • Paper • 2603.27593 • Published Mar 29 • 12
MMFace-DiT: A Dual-Stream Diffusion Transformer for High-Fidelity Multimodal Face Generation • Paper • 2603.29029 • Published Mar 30 • 13
j05hr3d/Llama-3.2-3B-Instruct-C_M_T_CT_CE_CM-2EP-SEED999 • Text Generation • 3B • Updated Apr 1 • 65 • 1
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models • Paper • 2603.16859 • Published Mar 17 • 248