papers
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper
• 2311.09257
• Published
• 47
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Paper
• 2310.04378
• Published
• 22
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper
• 2309.14717
• Published
• 46
Exponentially Faster Language Modelling
Paper
• 2311.10770
• Published
• 119
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
• 2312.00752
• Published
• 150
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Paper
• 2402.08609
• Published
• 36
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper
• 2402.10644
• Published
• 81
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
• 2402.14905
• Published
• 134
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
• 2403.03507
• Published
• 189
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper
• 2403.03853
• Published
• 66
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper
• 2403.09394
• Published
• 26
Rho-1: Not All Tokens Are What You Need
Paper
• 2404.07965
• Published
• 94
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Paper
• 2406.12034
• Published
• 16
RegMix: Data Mixture as Regression for Language Model Pre-training
Paper
• 2407.01492
• Published
• 40
Layerwise Recurrent Router for Mixture-of-Experts
Paper
• 2408.06793
• Published
• 32
MaskBit: Embedding-free Image Generation via Bit Tokens
Paper
• 2409.16211
• Published
• 17
Randomized Autoregressive Visual Generation
Paper
• 2411.00776
• Published
• 18
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper
• 2502.08910
• Published
• 148
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
• 2502.15814
• Published
• 69