parallelformers
An Efficient Model Parallelization Toolkit for Deployment.
- Parallelformers, which is based on Megatron-LM, is designed to make model parallelization easier.
- You can parallelize various models in HuggingFace Transformers on multiple GPUs with a single line of code (see the sketch after this list).
- Currently, Parallelformers only supports inference. Training features are NOT included.
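The following is a minimal sketch of what that single line looks like in context. It assumes the library exposes a `parallelize(model, num_gpus=..., fp16=...)` entry point as its documented usage, and uses `gpt2` purely as an illustrative checkpoint:

```python
# Minimal sketch: load a Transformers model on CPU, then split it across GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

# Any supported checkpoint works; "gpt2" is chosen only for illustration.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The single line that partitions the model's weights across two GPUs
# (fp16=True halves the memory footprint of the parallelized weights).
parallelize(model, num_gpus=2, fp16=True)

# Inference then goes through the usual Transformers API.
inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=15)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```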
Why Parallelformers?
You can load a model that is too large for a single GPU. For example, using Parallelformers, you can load a 12 GB model onto two 8 GB GPUs. In addition, you can save your precious money, because multiple smaller GPUs are usually less costly than a single larger GPU.
Installation
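A minimal install sketch, assuming the package is published on PyPI under the name `parallelformers`:

```console
pip install parallelformers
```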
Parallelformers can be easily installed using the pip package manager, as shown above. All the dependencies, such as torch, transformers, and dacite, should