- One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings — arXiv:2503.03008, published Mar 4, 2025
- Understanding Self-Distillation in the Presence of Label Noise — arXiv:2301.13304, published Jan 30, 2023
- How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self-Distillation Networks — arXiv:2407.03475, published Jul 3, 2024
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning — arXiv:2305.10005, published May 17, 2023