Searching for Efficient Multi-Stage Vision Transformers in PyTorch
This repository contains the official PyTorch implementation of “Searching for Efficient Multi-Stage Vision Transformers” and is based on DeiT and timm.
Figure: Illustration of the proposed multi-stage ViT-Res network.
Figure: Illustration of weight-sharing neural architecture search with multi-architectural sampling.
Figure: Accuracy-MACs trade-offs of the proposed ViT-ResNAS. Our networks achieve results comparable to previous work.
Requirements
The codebase is tested with 8 V100 (16GB) GPUs.
To install requirements:
pip install -r requirements.txt
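After installing the requirements, an optional sanity check (not part of the official setup) can confirm that a CUDA-enabled PyTorch build is present and that the GPUs are visible, for example:
# Optional check: prints the installed PyTorch version and the number of visible CUDA devices.
python -c "import torch; print(torch.__version__, torch.cuda.device_count())"
For the tested setup described above, the device count should be 8.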
Docker files are provided to set up the environment. Please run:
cd docker
sh 1_env_setup.sh
sh 2_build_docker_image.sh
sh 3_run_docker_image.sh
Make sure that the configuration specified in 3_run_docker_image.sh is correct before running the command; a rough sketch of such a configuration is shown below.
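The contents of 3_run_docker_image.sh are not reproduced here. As a rough sketch only (the image name, container paths, and dataset location below are placeholders rather than values taken from the script), a GPU-enabled run typically mounts the repository and the ImageNet directory into the container:
# Hypothetical sketch; adjust the image name and mount paths to match 3_run_docker_image.sh.
docker run --gpus all -it --ipc=host \
    -v $(pwd):/workspace \
    -v /path/to/imagenet:/data \
    vit-res-nas:latest
In particular, check that the mounted dataset path matches the data directory expected by the training scripts.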
Data Preparation
Download and extract ImageNet train and