# Episodic Transformer (E.T.)

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. This code reproduces the results obtained with E.T. on the ALFRED benchmark. To learn more about the benchmark and the original code, please refer to the ALFRED repository.

## Quickstart

Clone the repo and set up the environment variables:

```bash
$ git clone https://github.com/alexpashevich/E.T..git ET
$ export ET_ROOT=$(pwd)/ET
$ export ET_LOGS=$ET_ROOT/logs
$ export ET_DATA=$ET_ROOT/data
$ export […]
```
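To illustrate the core idea of the architecture described above (this is a simplified, hypothetical sketch, not the E.T. implementation): language tokens, the full history of visual frames, and past actions are each embedded, tagged with a modality embedding, concatenated into one sequence, and processed jointly by transformer self-attention. The sketch below uses NumPy with random placeholder embeddings and a single attention head for brevity; all dimensions and names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d):
    # Single-head self-attention over the joint multimodal sequence.
    # Random projection weights stand in for learned parameters.
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # every token attends to every other
    return attn @ v

d = 8  # illustrative embedding size
rng = np.random.default_rng(1)
lang = rng.standard_normal((5, d))     # 5 language-token embeddings
frames = rng.standard_normal((3, d))   # 3 visual-frame embeddings (episode history)
actions = rng.standard_normal((3, d))  # 3 past-action embeddings

# Learned modality embeddings distinguish the three input types.
mod = rng.standard_normal((3, d))
seq = np.concatenate([lang + mod[0], frames + mod[1], actions + mod[2]])

# Joint encoding: attention mixes information across all modalities.
out = self_attention(seq, d)
print(out.shape)  # (11, 8): one contextualized vector per input token
```

The key property this sketch captures is that visual observations and actions from the *entire* episode sit in the same attention window as the instruction tokens, so the encoder can relate any observation to any instruction word.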