DeepSpeed: Advancing MoE inference and training to power next-generation AI scale
In the last three years, the largest trained dense models have grown more than 1,000-fold, from a few hundred million parameters to over 500 billion parameters in