Efficient Vision Transformers with Dynamic Token Sparsification

DynamicViT

This repository contains the PyTorch implementation of DynamicViT.

Created by Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh


Model Zoo

We provide our DynamicViT models pretrained on ImageNet:

Usage

Requirements

torch>=1.7.0
torchvision>=0.8.1
timm==0.4.5
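
As a quick sanity check before training, the version constraints listed above can be compared against the current environment. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_requirements` are hypothetical, and the version comparison is deliberately simplified (it looks only at the first three numeric components and ignores local suffixes such as `+cu110`).

```python
import re
from importlib import metadata

# Constraints copied from the requirements list: (package, operator, version).
REQUIREMENTS = [
    ("torch", ">=", "1.7.0"),
    ("torchvision", ">=", "0.8.1"),
    ("timm", "==", "0.4.5"),
]


def parse_version(text):
    """Turn '1.7.0+cu110' into (1, 7, 0); the local suffix after '+' is dropped."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def check_requirements(installed_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all constraints hold.

    `installed_version` is a hypothetical injection point (package name -> version
    string) so the check can be exercised without the packages being installed.
    """
    problems = []
    for name, op, wanted in REQUIREMENTS:
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if op == ">=":
            ok = parse_version(found) >= parse_version(wanted)
        else:  # the only other operator used above is '=='
            ok = parse_version(found) == parse_version(wanted)
        if not ok:
            problems.append(f"{name} {found} does not satisfy {op}{wanted}")
    return problems
```

Calling `check_requirements()` with no arguments inspects the active environment; any returned string names the package that needs to be installed or pinned.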

Data preparation: download and extract the ImageNet images from http://image-net.org/. The directory structure should be:

│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
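
A mismatch in this layout is easy to miss until training starts, so it may be worth verifying it first. The sketch below uses only the standard library and assumes nothing beyond the tree shown above: a `train/` and a `val/` folder, each holding one sub-folder per class with `.JPEG` files inside. The function name `check_imagenet_layout` is hypothetical and not part of this repository.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of problems found under `root`; an empty list means the layout matches."""
    root = Path(root)
    problems = []
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Each class lives in its own sub-directory, e.g. n01440764.
        class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
        if not class_dirs:
            problems.append(f"no class folders in: {split_dir}")
        for class_dir in class_dirs:
            # Only the .JPEG extension shown in the tree is checked.
            if not any(class_dir.glob("*.JPEG")):
                problems.append(f"no .JPEG files in: {class_dir}")
    return problems
```

For example, `check_imagenet_layout("ILSVRC2012")` returns an empty list when both splits are in place, and otherwise one message per missing or empty folder.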

Model preparation: download pre-trained DeiT and LV-ViT models for training DynamicViT:

sh download_pretrain.sh

Demo

We provide a Jupyter notebook for running the visualization.
