A Vision Transformer in ConvNet’s Clothing for Faster Inference
LeViT
This repository contains PyTorch evaluation code, training code and pretrained models for LeViT.
LeViT models obtain competitive trade-offs between speed and precision.
For details, see "LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference" by Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou and Matthijs Douze.
If you use this code for a paper, please cite:
@article{graham2021levit,
title={LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference},
author={Benjamin Graham and Alaaeldin El-Nouby and Hugo Touvron and Pierre Stock and Armand Joulin and Herv\'e J\'egou and Matthijs Douze},
journal={arXiv preprint arXiv:2104.01136},
year={2021}
}
We provide baseline LeViT models trained with distillation on ImageNet 2012.
First, clone the repository locally:
git clone https://github.com/facebookresearch/levit.git
Then, install PyTorch 1.7.0+ and torchvision