Block Diffusion for Flash Speculative Decoding
Z Lab
university
AI & ML interests
Efficient AI
Papers
DFlash: Block Diffusion for Flash Speculative Decoding
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Models (23)
z-lab/gpt-oss-20b-DFlash • Text Generation • 0.8B • Updated
z-lab/Qwen3-Coder-30B-A3B-DFlash • Text Generation • Updated • 1.09k • 27
z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat • Text Generation • 1B • Updated • 560 • 2
z-lab/Qwen3-4B-DFlash-b16 • Text Generation • 0.5B • Updated • 6.61k • 22
z-lab/Qwen3-8B-DFlash-b16 • Text Generation • Updated • 5.32k • 19
z-lab/Qwen3-4B-Thinking-2507-PARO • 1B • Updated
z-lab/DeepSeek-R1-Distill-Llama-8B-PARO • 1B • Updated • 4
z-lab/Meta-Llama-3-70B-PARO • 20B • Updated • 3
z-lab/Qwen3-14B-PARO • 2B • Updated
z-lab/Qwen3-14B-Base-PARO • 2B • Updated