[1] Chu, Sanghyeok, et al. "Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2026.
[2] Komatsuzaki, Aran, et al. "Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints." The Eleventh International Conference on Learning Representations. 2023.