Efficient Vision Transformers with Dynamic Token Sparsification
DynamicViT
This repository contains the PyTorch implementation of DynamicViT.
Created by Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh
Model Zoo
We provide our DynamicViT models pretrained on ImageNet:
Usage
Requirements
- torch>=1.7.0
- torchvision>=0.8.1
- timm==0.4.5
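To confirm that an environment matches the pins above before training, a small check along the following lines may help. This is a minimal sketch, not part of this repository: the names `REQUIREMENTS`, `parse_version`, `satisfies`, and `check_requirements` are hypothetical, and the version comparison is deliberately simplified (it only looks at the leading numeric part, so a build suffix such as `+cu110` or a pre-release tag is ignored).

```python
import re
from importlib import metadata

# Pins copied from the Requirements list; the dict layout itself is an assumption.
REQUIREMENTS = {
    "torch": (">=", "1.7.0"),
    "torchvision": (">=", "0.8.1"),
    "timm": ("==", "0.4.5"),
}


def parse_version(text):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0), ignoring any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies(installed, op, required):
    """Compare two version strings with '>=' or '=='."""
    a, b = parse_version(installed), parse_version(required)
    # Pad with zeros so that '1.7' and '1.7.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == ">=":
        return a >= b
    if op == "==":
        return a == b
    raise ValueError(f"unsupported operator: {op!r}")


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of human-readable problems; an empty list means all pins hold."""
    problems = []
    for name, (op, required) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name} {installed} does not satisfy {op}{required}")
    return problems
```

Calling `check_requirements()` in the target environment returns an empty list when every pin is met. Note that timm is pinned exactly, so a newer timm release is reported as a mismatch.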
Data preparation: download and extract the ImageNet images from http://image-net.org/. The directory structure should be:
│ILSVRC2012/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
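The layout above (one sub-directory per class under both `train/` and `val/`) can be sanity-checked before launching a long job. The following is a minimal sketch using only the standard library; the function name `check_imagenet_layout` and the choice to count `.JPEG`/`.jpg` files are assumptions made for illustration, not something this repository provides.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg"}  # assumed extensions, matched case-insensitively


def check_imagenet_layout(root):
    """Return {split: (num_class_dirs, num_images)} for the train and val splits.

    Raises FileNotFoundError if a split directory is missing and ValueError
    if a split contains no class sub-directories.
    """
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        if not class_dirs:
            raise ValueError(f"no class directories found in {split_dir}")
        num_images = sum(
            1
            for class_dir in class_dirs
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        summary[split] = (len(class_dirs), num_images)
    return summary
```

For the full ILSVRC2012 classification set, both splits should report 1000 class directories. This class-per-folder layout is also the one that `torchvision.datasets.ImageFolder` reads, so a directory that passes this check should be loadable that way.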
Model preparation: download the pre-trained DeiT and LV-ViT models used to train DynamicViT:
sh download_pretrain.sh
Demo
We provide a Jupyter notebook where you can run the visualization.