Thrilled to unveil DS-MoE: a dense-training, sparse-inference scheme for better computational and memory efficiency in your MoE models!
Discover more in our blog: https://huggingface.co/blog/bpan/ds-moe and dive into the details with our paper: Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (2404.05567)
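A minimal sketch of the core idea, dense routing during training and top-k sparse routing at inference, using a toy NumPy MoE layer (a hypothetical illustration, not the paper's implementation; names like `MoELayer` and `top_k` are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    """Toy mixture-of-experts layer where each expert is a linear map."""

    def __init__(self, d_model, n_experts):
        self.router = rng.standard_normal((d_model, n_experts))
        self.experts = rng.standard_normal((n_experts, d_model, d_model))

    def forward(self, x, top_k=None):
        # Router produces a gate weight per expert for each token.
        gates = softmax(x @ self.router)  # (batch, n_experts)
        if top_k is not None:
            # Sparse inference: zero out all but the top-k gates per token,
            # then renormalize, so only k experts run per token.
            drop = np.argsort(gates, axis=-1)[:, :-top_k]
            np.put_along_axis(gates, drop, 0.0, axis=-1)
            gates /= gates.sum(axis=-1, keepdims=True)
        # Weighted sum of expert outputs (dense path uses every expert).
        return np.einsum("be,edf,bd->bf", gates, self.experts, x)

layer = MoELayer(d_model=8, n_experts=4)
x = rng.standard_normal((2, 8))
dense = layer.forward(x)            # training-style: all experts contribute
sparse = layer.forward(x, top_k=1)  # inference-style: one expert per token
```

With `top_k` equal to the number of experts, the sparse path reduces to the dense one, which is a handy sanity check for this kind of gating code.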