# Preliminary code for Representation Learning with Generalized Similarity Functions

Code for GSF learning in offline Procgen.

Note: the repo is under construction; some experiments may still be changed or added.

Because the dataset is very large (it consists of pixel observations), we provide a way to generate it from pre-trained PPO checkpoints instead of hosting 1 TB+ of data.
## Instructions
- Clone the repo
- Either train a PPO agent from scratch on 200 levels (see here), or download the provided PPO checkpoints (same repo link). TL;DR, you can run

  ```
  python train_ppo.py --env_name=bigfish
  ```

  in the current repo to do so.
- Run
  ```
  python evaluate_ppo.py --dataset_dir=. --shards --timesteps --obs_type rgb --model_dir=
  ```
  This will generate `obs_X.npy`, `action_X.npy`, `reward_X.npy`, and `done_X.npy` arrays, where `X` goes from 1 to `n_shards`.
- You can then work with these NumPy arrays in the classical offline-RL fashion.