Revitalize Region Feature for Democratizing Video-Language Pre-training


Guanyu Cai, Yixiao Ge, Alex Jinpeng Wang, Rui Yan, Xudong Lin, Ying Shan, Lianghua He, Xiaohu Qie, Jianping Wu, Mike Zheng Shou [Arxiv]

PyTorch implementation of our video-language pre-training method.


Requirements

conda create -n demovlp python=3.8
conda activate demovlp
pip install -r requirements
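
After installation, a quick sanity check can confirm that the active environment is the one created above. The sketch below is illustrative and not part of the repository: the helper name `check_environment` is hypothetical, and it assumes `torch` is among the installed dependencies (this is a PyTorch implementation, but the contents of the requirements file are not shown here).

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("torch",)):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # The conda command above pins python=3.8, so use that as the default floor.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            "Python %d.%d is older than %d.%d"
            % (sys.version_info[0], sys.version_info[1], min_python[0], min_python[1])
        )
    # find_spec returns None when a top-level package cannot be located.
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"package '{name}' is not importable")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```

Pass additional top-level package names through `packages` to check other entries from the requirements file.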

Pre-trained weights

| Model | Dataset | Download |
| --- | --- | --- |
| DemoVLP | WebVid+CC3M | Model |
| DemoVLP | WebVid+CC3M+CC7M | Model |

Data

Download Pre-trained model

mkdir pretrained
cd pretrained
wget -c https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth
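
Because `wget -c` resumes partial downloads, it can be worth confirming that the checkpoint arrived intact. The sketch below assumes the hex suffix in the file name (`80ecf9dd`) is the leading part of the file's SHA-256 digest, as in the `torch.hub` naming convention; that assumption is not stated in this README. The function name `matches_name_hash` is hypothetical.

```python
import hashlib
import re
from pathlib import Path


def matches_name_hash(path):
    """Compare the hex suffix in the file name with the file's SHA-256 prefix."""
    path = Path(path)
    # Expect names like "<anything>-<8 or more hex digits>.<ext>".
    match = re.search(r"-([0-9a-f]{8,})$", path.stem)
    if match is None:
        raise ValueError(f"no hash suffix in file name: {path.name}")
    digest = hashlib.sha256()
    # Read in 1 MiB chunks so large checkpoints are not loaded into memory at once.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))
```

For example, `matches_name_hash("pretrained/jx_vit_base_p16_224-80ecf9dd.pth")` returns `True` only when the computed digest starts with the suffix embedded in the name. A `False` result would suggest re-downloading the file, provided the naming assumption holds.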