parallelformers
An Efficient Model Parallelization Toolkit for Deployment.
- Parallelformers, which is based on Megatron-LM, is designed to make model parallelization easier.
- You can parallelize various models in HuggingFace Transformers on multiple GPUs with a single line of code (see the sketch after this list).
- Currently, Parallelformers only supports inference. Training features are NOT included.
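The following is a minimal sketch of what that single line looks like in context. It assumes the library exposes a `parallelize(model, num_gpus=..., fp16=...)` entry point as its documented usage, and uses `gpt2` purely as an illustrative checkpoint:

```python
# Minimal sketch: load a Transformers model on CPU, then split it across GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

# Any supported checkpoint works; "gpt2" is chosen only for illustration.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The single line that partitions the model's weights across two GPUs
# (fp16=True halves the memory footprint of the parallelized weights).
parallelize(model, num_gpus=2, fp16=True)

# Inference then goes through the usual Transformers API.
inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=15)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```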
Why Parallelformers?
You can load a model that is too large for a single GPU. For example, using Parallelformers, you can load a 12 GB model onto two 8 GB GPUs. In addition, you can save your precious money, because multiple smaller GPUs are usually less costly than a single larger GPU.
Installation
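A minimal install sketch, assuming the package is published on PyPI under the name `parallelformers`:

```console
pip install parallelformers
```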
Parallelformers can be easily installed using the pip package manager, as shown above. All the dependencies, such as torch, transformers, and dacite, should