MMNas: Deep Multimodal Neural Architecture Search
This repository contains the PyTorch implementation of MMNas for visual question answering (VQA), visual grounding (VGD), and image-text matching (ITM) tasks.
Prerequisites
Software and Hardware Requirements
You need a machine with at least 4 GPUs (>= 8GB each), 50GB of memory for VQA and VGD or 150GB for ITM, and 50GB of free disk space. We strongly recommend using an SSD to guarantee high-speed I/O.
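Before downloading any data, it can help to confirm that a machine meets these numbers. The script below is a minimal sketch and not part of this repository. `check_requirements` and `probe_machine` are hypothetical helper names. It reads "memory" as system RAM, applies the 8GB figure to each GPU, allows a 0.5GB tolerance because nominal 8GB cards usually report slightly less, and assumes a Linux host for the RAM probe. All of these are assumptions.

```python
import os
import shutil

# Thresholds copied from the requirements paragraph above (values in GB).
MIN_GPUS = 4
MIN_GPU_MEM_GB = 8
MIN_RAM_GB = {"vqa": 50, "vgd": 50, "itm": 150}
MIN_DISK_GB = 50
# Assumption: nominal 8GB cards report a little under 8 GiB, so allow some slack.
GPU_TOLERANCE_GB = 0.5


def check_requirements(task, gpu_mem_gb, ram_gb, disk_gb):
    """Return a list of problems; an empty list means the machine looks fine.

    gpu_mem_gb holds one entry per visible GPU.
    """
    problems = []
    usable = [m for m in gpu_mem_gb if m >= MIN_GPU_MEM_GB - GPU_TOLERANCE_GB]
    if len(usable) < MIN_GPUS:
        problems.append(
            f"need {MIN_GPUS} GPUs with >= {MIN_GPU_MEM_GB}GB, found {len(usable)}"
        )
    if ram_gb < MIN_RAM_GB[task]:
        problems.append(
            f"need {MIN_RAM_GB[task]}GB memory for {task}, found {ram_gb:.0f}GB"
        )
    if disk_gb < MIN_DISK_GB:
        problems.append(
            f"need {MIN_DISK_GB}GB free disk space, found {disk_gb:.0f}GB"
        )
    return problems


def probe_machine(path="."):
    """Best-effort measurement of GPU memory, RAM, and free disk space."""
    try:
        import torch  # optional here; without it no GPUs are reported

        gpu_mem_gb = [
            torch.cuda.get_device_properties(i).total_memory / 2**30
            for i in range(torch.cuda.device_count())
        ]
    except ImportError:
        gpu_mem_gb = []
    # Total physical RAM through POSIX sysconf (assumes Linux).
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    disk_gb = shutil.disk_usage(path).free / 2**30
    return gpu_mem_gb, ram_gb, disk_gb


if __name__ == "__main__":
    issues = check_requirements("vqa", *probe_machine())
    print("\n".join(issues) if issues else "All requirements met for vqa.")
```

Change `"vqa"` to `"vgd"` or `"itm"` to check the other tasks, and pass the dataset directory to `probe_machine` if the data will live on a different drive.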
First, install the necessary packages.