Searching for Efficient Multi-Stage Vision Transformers in PyTorch

This repository contains the official PyTorch implementation of “Searching for Efficient Multi-Stage Vision Transformers” and is based on DeiT and timm.


Illustration of the proposed multi-stage ViT-Res network.


Illustration of weight-sharing neural architecture search with multi-architectural sampling.
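The idea behind weight-sharing NAS with multi-architectural sampling can be illustrated with a minimal sketch (this is a conceptual toy, not the repository's actual implementation): a single super-network holds the shared weights, and each training step samples several sub-network configurations and accumulates their gradients before updating.

```python
# Conceptual sketch of weight-sharing NAS with multi-architectural
# sampling (toy example; names and network are illustrative only).
import random
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    """Toy super-network: a stack of linear blocks with searchable depth."""
    def __init__(self, dim=16, max_depth=4, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(max_depth)
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, depth):
        # Running only the first `depth` blocks selects one sub-architecture.
        for block in self.blocks[:depth]:
            x = block(x)
        return self.head(x)

def train_step(net, opt, x, y, num_arch_samples=3):
    """One step of multi-architectural sampling: average the loss over
    several randomly sampled sub-networks, then update the shared weights."""
    opt.zero_grad()
    total = 0.0
    for _ in range(num_arch_samples):
        depth = random.randint(1, len(net.blocks))  # sample an architecture
        loss = nn.functional.cross_entropy(net(x, depth), y)
        (loss / num_arch_samples).backward()  # accumulate shared gradients
        total += loss.item()
    opt.step()
    return total / num_arch_samples

net = SuperNet()
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
loss = train_step(net, opt, x, y)
```

In this scheme the sampled sub-networks share all super-network parameters, so training one step improves many candidate architectures at once.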


Accuracy-MACs trade-offs of the proposed ViT-ResNAS. Our networks achieve results comparable to those of previous work.

Requirements

The codebase is tested with 8 V100 (16GB) GPUs.

To install requirements:

    pip install -r requirements.txt

Docker files are provided to set up the environment. Please run:

    cd docker
    sh 1_env_setup.sh
    sh 2_build_docker_image.sh
    sh 3_run_docker_image.sh

Make sure that the configuration specified in 3_run_docker_image.sh is correct before running the command.

Data Preparation

Download and extract the ImageNet train and val images from http://image-net.org/.
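Since the codebase is based on DeiT, the data loader presumably expects the standard torchvision `ImageFolder` layout, with one subdirectory per class (the exact path names below are illustrative):

    /path/to/imagenet/
      train/
        class1/
          img1.jpeg
        class2/
          img2.jpeg
      val/
        class1/
          img3.jpeg
        class2/
          img4.jpeg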
