Continual pre-training of MoEs: how robust is your router?

A systematic study of Mixture of Experts (MoE) continual pre-training.

