Finished Reading
• Self-Play Preference Optimization for Language Model Alignment (arXiv:2405.00675)
• FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv:2205.14135)
• Attention Is All You Need (arXiv:1706.03762)
• FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (arXiv:2307.08691)
• FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (arXiv:2407.08608)
• Efficient Transformers: A Survey (arXiv:2009.06732)
• Linformer: Self-Attention with Linear Complexity (arXiv:2006.04768)
• LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
• YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
• RoFormer: Enhanced Transformer with Rotary Position Embedding (arXiv:2104.09864)
• BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
• The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
• LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
• Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
• Training Compute-Optimal Large Language Models (arXiv:2203.15556)
• GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (arXiv:2305.13245)
• Accuracy is Not All You Need (arXiv:2407.09141)