A PyTorch library for decentralized deep learning across the Internet

Hivemind: decentralized deep learning in PyTorch

Hivemind is a PyTorch library for decentralized deep learning across the Internet. Its intended usage is training one large model on hundreds of computers from different universities, companies, and volunteers.

Key Features

Distributed training without a master node: a Distributed Hash Table connects computers in a decentralized network.
Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond.
Decentralized parameter averaging: iteratively aggregate updates from multiple workers without the need to synchronize across the entire network (paper).
Train neural networks of arbitrary size: parts of their layers are distributed across the participants with the Decentralized Mixture-of-Experts (paper).
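
To make the idea of decentralized parameter averaging more concrete, here is a toy simulation in plain Python. It is a minimal sketch and not Hivemind's API: the function names (`average_in_groups`, `spread`), the group size, and the use of a single scalar per worker are all assumptions made for illustration. In each round, workers are shuffled into small groups and every group averages only among its own members, so no step involves the whole network.

```python
import random


def average_in_groups(params, group_size, rng):
    """One round: shuffle workers, split them into groups, average within each group.

    `params` holds one scalar "parameter" per worker (hypothetical simplification).
    """
    order = list(range(len(params)))
    rng.shuffle(order)
    updated = list(params)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        group_mean = sum(params[i] for i in group) / len(group)
        for i in group:
            updated[i] = group_mean
    return updated


def spread(params):
    """Gap between the largest and smallest value; 0 means full agreement."""
    return max(params) - min(params)


rng = random.Random(0)
workers = [float(i) for i in range(16)]  # 16 workers with different starting values
global_mean = sum(workers) / len(workers)

history = [spread(workers)]
for _ in range(10):
    workers = average_in_groups(workers, group_size=4, rng=rng)
    history.append(spread(workers))

print("spread per round:", [round(s, 4) for s in history])
print("mean after averaging:", sum(workers) / len(workers), "expected:", global_mean)
```

Each round leaves the network-wide mean unchanged (up to floating-point rounding) while the disagreement between workers shrinks, even though no worker ever communicates with more than three peers at a time. A real deployment additionally has to deal with group formation over the network, peers dropping out mid-round, and tensors rather than scalars, none of which this sketch attempts.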

To learn more about the ideas behind this library, see https://learning-at-home.github.io or read
the NeurIPS 2020 paper.

Installation

Before installing, make sure that your environment
