A Vision Transformer in ConvNet’s Clothing for Faster Inference

LeViT

This repository contains PyTorch evaluation code, training code and pretrained models for LeViT.

They obtain competitive tradeoffs in terms of speed / precision:
LeViT-1

For details see LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference by Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou and Matthijs Douze.

If you use this code for a paper please cite:

@article{graham2021levit,
  title={LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference},
  author={Benjamin Graham and Alaaeldin El-Nouby and Hugo Touvron and Pierre Stock and Armand Joulin and Herv'e J'egou and Matthijs Douze},
  journal={arXiv preprint arXiv:22104.01136},
  year={2021}
}

We provide baseline LeViT models trained with distllation on ImageNet 2012.

First, clone the repository locally:

git clone https://github.com/facebookresearch/levit.git

Then, install PyTorch 1.7.0+ and torchvision

To finish reading, please visit source site