# CLOOB Conditioned Latent Diffusion training and inference code

## Introduction
This repository contains the training code for CLOOB conditioned latent diffusion.
CCLD is similar in approach to the CLIP conditioned diffusion trained by
Katherine Crowson, with a few key differences:
- The use of latent diffusion cuts training costs by roughly a factor of ten,
  allowing a high quality 1.2 billion parameter model to converge in as few as
  5 days on a single 8x A100 pod.
- CLOOB conditioning can take advantage of CLOOB's unified latent space: CLOOB
  text and image embeds of the same inputs typically have a cosine similarity
  of around 0.9. This makes it possible to train the model without captions by
  using image embeds in the training loop and text embeds during inference.
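The caption-free trick in the second point can be sketched as follows. The "encoders" here are random stand-ins, not CLOOB's actual API: both map the same input near a shared point in a unit-normalized latent space, with a noise scale chosen so the two embeds land close together, mimicking the high similarity of real CLOOB embeds.

```python
import numpy as np

def normalize(v):
    """Project a vector onto the unit sphere (CLOOB embeds are L2-normalized)."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)

# Hypothetical stand-ins for CLOOB's image and text encoders: both outputs
# sit near the same shared point in the unified latent space. The 0.01 noise
# scale is picked so the two embeds end up highly similar.
shared = normalize(rng.standard_normal(512))
image_embed = normalize(shared + 0.01 * rng.standard_normal(512))
text_embed = normalize(shared + 0.01 * rng.standard_normal(512))

# Training: condition the diffusion model on the image embed, so no caption
# is ever needed in the training loop.
train_cond = image_embed

# Inference: swap in the text embed; because the spaces are unified, the model
# treats it much like an image embed of the content the text describes.
sample_cond = text_embed

cos_sim = float(train_cond @ sample_cond)
print(f"image/text embed cosine similarity: {cos_sim:.3f}")
```

The key design point is that nothing about the diffusion model changes between training and inference; only the source of the conditioning vector is swapped.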
This combination of lower training cost and caption-free training makes the
approach practical to apply to large uncaptioned image datasets.