Is Space-Time Attention All You Need for Video Understanding?
This is an official PyTorch implementation of our ICML 2021 paper, Is Space-Time Attention All You Need for Video Understanding? This repository provides PyTorch code for training and testing our proposed TimeSformer model. TimeSformer is an efficient video classification framework that achieves state-of-the-art results on several video action recognition benchmarks, such as Kinetics-400.
If you find TimeSformer useful in your research, please cite it using the following BibTeX entry:
@inproceedings{gberta_2021_ICML,
author = {Gedas Bertasius