Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content Paper • 2410.08260 • Published Oct 10, 2024
SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning Paper • 2504.00396 • Published Apr 1, 2025 • 3
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment Paper • 2503.23907 • Published Mar 31, 2025 • 3
Position: Interactive Generative Video as Next-Generation Game Engine Paper • 2503.17359 • Published Mar 21, 2025 • 61
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Paper • 2503.19907 • Published Mar 25, 2025 • 8
SketchVideo: Sketch-based Video Generation and Editing Paper • 2503.23284 • Published Mar 30, 2025 • 23
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31, 2025 • 76
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Paper • 2503.14487 • Published Mar 18, 2025 • 28
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Paper • 2408.11813 • Published Aug 21, 2024 • 12
MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding Paper • 2410.21747 • Published Oct 29, 2024
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published Dec 10, 2024 • 55
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation Paper • 2412.07759 • Published Dec 10, 2024 • 18
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published Dec 10, 2024 • 20
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing Paper • 2411.15260 • Published Nov 22, 2024
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation Paper • 2411.14423 • Published Nov 21, 2024
Towards Precise Scaling Laws for Video Diffusion Transformers Paper • 2411.17470 • Published Nov 25, 2024 • 1
DVIS++: Improved Decoupled Framework for Universal Video Segmentation Paper • 2312.13305 • Published Dec 20, 2023
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8, 2025 • 15
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14, 2025 • 67