Optimizing heavy models with early exit branches
Every day, models get heavier and heavier (in terms of learnable parameters). For example, LEMON_large has 200M parameters and GPT-3 has 175 billion parameters!
Deep Learning, NLP, NMT, AI, ML