Enhancing Unsupervised Video Representation Learning

DSM

The source code for the paper "Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion".

1. Introduction (scene-dominated to motion-dominated)

[Figure: ppl]

Video datasets are usually scene-dominated. We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays more attention to motion information.

The generated triplet is shown below:
[Figure: triplet_visualization]
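
This section does not say how the triplet is consumed during training. One common way to learn from (anchor, positive, negative) triplets is a triplet margin loss, sketched below in plain Python. The function names, the Euclidean distance, and the margin value are all assumptions made for illustration, not taken from this repository.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (an assumption, not necessarily DSM's loss).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D features standing in for clip embeddings (hypothetical values).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_margin_loss(anchor, positive, negative)
# Swapping the roles puts the "positive" farther away, so the loss is non-zero.
loss_hard = triplet_margin_loss(anchor, negative, positive)
print(loss_easy, loss_hard)
```

In a PyTorch training loop the same idea would normally be computed on batched tensors rather than Python lists.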

What does DSM learn?

With DSM pretraining, the model learns to focus on the motion region (not necessarily the actor) without any labels.

[Figure: where_to_look]

2. Installation

Dataset

Please refer to dataset.md for details.

Requirements

Python 3
PyTorch 1.1+
PIL
Intel (on the fly decode)
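
A quick way to check the Python-side requirements before running anything is sketched below. It is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the import names `torch` and `PIL`. The on-the-fly decoding dependency is left out because its import name is not given here.

```python
import importlib.util
import sys


def missing_requirements(modules=("torch", "PIL")):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_ok(min_version=(3, 0)):
    """True if the running interpreter is at least `min_version`."""
    return sys.version_info >= min_version


# Report the state of the current environment.
print("Python 3 interpreter:", python_ok())
print("Missing packages:", missing_requirements() or "none")
```

This only confirms that the packages can be located. It does not check that the installed PyTorch version is 1.1 or newer.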

3. Structure

datasets
- list
  - hmdb51: the train/val lists of HMDB51
  - ucf101: the train/val
    