A tool for monitoring and managing computing resources across multiple hosts
TensorHive
TensorHive is an open source tool for monitoring and managing computing resources across multiple hosts. It solves the most common problems and nightmares of accessing and sharing your AI-oriented infrastructure among multiple, often competing users.
It’s designed with simplicity, flexibility and configuration-friendliness in mind.
Main features:
GPU Reservation calendar
Each column represents all reservation events for a GPU on a given day. To make a new reservation, simply click and drag with your mouse, select the GPU(s), add a meaningful title, and optionally adjust the time range.
If there are many hosts and GPUs in your infrastructure, you can use our simplified, horizontal calendar to quickly identify empty time slots and filter out already reserved GPUs.
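
To make the idea of "filtering out already reserved GPUs" concrete, here is a minimal Python sketch of the underlying interval check. It is only an illustration, not TensorHive's code or API: the `Reservation` record, the `free_gpus` helper, and the `host:gpu` identifiers are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reservation:
    # Hypothetical record for illustration; not TensorHive's actual data model.
    gpu_id: str
    title: str
    start: datetime
    end: datetime


def free_gpus(gpu_ids, reservations, start, end):
    """Return the GPUs that have no reservation overlapping [start, end)."""
    busy = {
        r.gpu_id
        for r in reservations
        # Two half-open intervals overlap when each starts before the other ends.
        if r.start < end and start < r.end
    }
    return [g for g in gpu_ids if g not in busy]


# Made-up sample data: three GPUs on two hosts, two existing reservations.
gpus = ["host1:gpu0", "host1:gpu1", "host2:gpu0"]
reservations = [
    Reservation("host1:gpu0", "Train ResNet",
                datetime(2024, 1, 8, 9), datetime(2024, 1, 8, 12)),
    Reservation("host1:gpu1", "Hyperparameter sweep",
                datetime(2024, 1, 8, 11), datetime(2024, 1, 8, 15)),
]

# Which GPUs are free between 12:00 and 14:00?
print(free_gpus(gpus, reservations,
                datetime(2024, 1, 8, 12), datetime(2024, 1, 8, 14)))
```

Treating reservations as half-open intervals means a slot that starts exactly when another ends does not count as a conflict, which matches how back-to-back calendar entries are usually expected to behave.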
From now on, only your processes are eligible to run on reserved