- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 78
Collections including paper arxiv:2510.15301
- End-to-End Vision Tokenizer Tuning
  Paper • 2505.10562 • Published • 22
- Global and Local Entailment Learning for Natural World Imagery
  Paper • 2506.21476 • Published • 1
- DINOv3
  Paper • 2508.10104 • Published • 291
- Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
  Paper • 2509.01363 • Published • 58

- Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
  Paper • 2412.10208 • Published • 19
- Normalizing Flows are Capable Generative Models
  Paper • 2412.06329 • Published • 11
- A Noise is Worth Diffusion Guidance
  Paper • 2412.03895 • Published • 29
- Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
  Paper • 2501.01423 • Published • 44

- Depth Anything V2
  Paper • 2406.09414 • Published • 103
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 51
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
  Paper • 2406.04338 • Published • 39
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 120

- Is Noise Conditioning Necessary for Denoising Generative Models?
  Paper • 2502.13129 • Published • 1
- REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
  Paper • 2504.10483 • Published • 21
- Mean Flows for One-step Generative Modeling
  Paper • 2505.13447 • Published • 7
- Latent Diffusion Model without Variational Autoencoder
  Paper • 2510.15301 • Published • 49

- OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
  Paper • 2506.07977 • Published • 41
- Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
  Paper • 2506.07986 • Published • 19
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
  Paper • 2506.06276 • Published • 26
- Aligning Latent Spaces with Flow Priors
  Paper • 2506.05240 • Published • 27

- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
  Paper • 2503.09573 • Published • 74
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
  Paper • 2505.15045 • Published • 54
- Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
  Paper • 2505.16990 • Published • 22
- D-AR: Diffusion via Autoregressive Models
  Paper • 2505.23660 • Published • 34

- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 29
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 24
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 8