Hivemind: decentralized deep learning in PyTorch
Hivemind is a PyTorch library for decentralized deep learning across the Internet. Its intended usage is training one large model on hundreds of computers from different universities, companies, and volunteers.
Key Features
- Distributed training without a master node: a Distributed Hash Table allows connecting computers in a decentralized network.
- Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond.
- Decentralized parameter averaging: iteratively aggregate updates from multiple workers without the need to synchronize across the entire network (paper).
- Train neural networks of arbitrary size: parts of their layers are distributed across the participants with the Decentralized Mixture-of-Experts (paper).
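The decentralized averaging idea can be illustrated with a toy, pure-Python sketch. This is not hivemind's API or implementation. It assumes that peers are arranged on a complete grid and uses a hypothetical helper, `grid_average`, to show how repeated averaging within small groups (peers that differ along only one grid axis) reaches the global average without any step that synchronizes the entire network.

```python
from itertools import product
from statistics import fmean


def grid_average(values, shape):
    """Toy iterative group averaging (hypothetical helper, not a hivemind API).

    values: dict mapping a grid coordinate tuple -> that peer's scalar parameter.
    shape:  grid dimensions, e.g. (3, 3) for 9 peers. The grid must be complete.
    Each round averages only within small groups, so no single step
    requires all peers to communicate with each other.
    """
    values = dict(values)
    for axis in range(len(shape)):
        new_values = {}
        for coord in values:
            # This round's group: peers that match `coord` on every axis except `axis`.
            group = [
                values[coord[:axis] + (k,) + coord[axis + 1:]]
                for k in range(shape[axis])
            ]
            new_values[coord] = fmean(group)
        values = new_values
    return values


# Nine peers on a 3x3 grid, each starting from a different local value.
peers = {(i, j): float(3 * i + j) for i, j in product(range(3), range(3))}
averaged = grid_average(peers, (3, 3))
print(averaged[(0, 0)])  # 4.0, the mean of 0..8
```

On a complete grid, one round per axis produces the exact global mean, even though each peer only ever averages with a few others. A real deployment also has to cope with missing peers and changing groups, so this exact result is an idealization of the behavior.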
To learn more about the ideas behind this library, see https://learning-at-home.github.io or read the NeurIPS 2020 paper.
Installation
Before installing, make sure that your environment