# Feedback Transformer and Expire-Span

This repo contains the Python code for two papers:
- Feedback Transformer
- Expire-Span
The training code is designed for long-sequence modeling with Transformer-like architectures.
## Requirements
You will need a CUDA-enabled GPU to run the code.
## Setup

Run the following:

```bash
pip install -r requirements.txt
```
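Before launching training, it can help to confirm the environment is usable. The snippet below is a minimal sanity check, assuming PyTorch is installed by `requirements.txt` (an assumption, since the dependency list is not shown here):

```python
# Minimal environment check (assumes PyTorch is among the requirements).
import torch

# The training scripts expect a CUDA-enabled GPU (see Requirements above).
if not torch.cuda.is_available():
    raise SystemExit("No CUDA-enabled GPU detected; training requires one.")

print(f"PyTorch {torch.__version__} using {torch.cuda.get_device_name(0)}")
```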
## Feedback Transformer

Introduced in *Addressing Some Limitations of Transformers with Feedback Memory*.
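The core idea of the paper is that every layer attends to a single shared memory in which each past time step is summarized by merging the hidden states of all layers. The sketch below illustrates only that merge step; the class and names (`FeedbackMemory`, `layer_weights`) are illustrative and not the repo's actual API:

```python
# Illustrative sketch of feedback memory: all layer outputs at a time step
# are merged (softmax-weighted sum) into one vector that every layer can
# attend to at later steps. Not the repo's actual implementation.
import torch
import torch.nn as nn


class FeedbackMemory(nn.Module):
    def __init__(self, n_layers: int):
        super().__init__()
        # One learnable mixing weight per layer output (plus the embedding).
        self.layer_weights = nn.Parameter(torch.zeros(n_layers + 1))

    def merge(self, layer_states: list) -> torch.Tensor:
        # layer_states: (n_layers + 1) tensors of shape (batch, d_model)
        # for the current time step.
        w = torch.softmax(self.layer_weights, dim=0)    # (L+1,)
        stacked = torch.stack(layer_states, dim=0)      # (L+1, batch, d)
        return (w[:, None, None] * stacked).sum(dim=0)  # (batch, d)
```

At step `t`, the merged vector is appended to the memory, and attention at step `t + 1` in every layer reads from this shared memory rather than from that layer's own history.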
### Running Experiments from the Paper

#### enwik8
| Model | Params | Valid | Test |
|---|---|---|---|
| Feedback Transformer | 77M | 0.984 | 0.962 |

Numbers are bits-per-character (BPC).

```bash
bash experiments/feedback/enwik8.sh
```
#### Algorithmic

| Model | 3 Variables | 5 Variables |
|---|---|---|
| Transformer | 33.7 | 37.5 |
| Feedback Transformer | 99.1 | 92.6 |

Numbers are % accuracy on the test set.

```bash
bash experiments/feedback/algorithmic_3var.sh
bash experiments/feedback/algorithmic_5var.sh
```
## Expire-Span

Introduced in *Not All Memories are Created Equal: Learning to Expire*.
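The paper's central mechanism is that each stored memory predicts how long it should be retained, and memories past their span are softly masked out so the span stays differentiable. The sketch below is a rough illustration of that masking; the names (`ExpireSpanMask`, `ramp`) are illustrative, not the repo's actual API:

```python
# Rough sketch of expire-span masking: each memory predicts a retention
# span from its hidden state, and memories older than their span are
# faded out with a linear ramp so training stays differentiable.
# Illustrative only; not the repo's actual implementation.
import torch
import torch.nn as nn


class ExpireSpanMask(nn.Module):
    def __init__(self, d_model: int, max_span: float, ramp: float = 128.0):
        super().__init__()
        self.span_predictor = nn.Linear(d_model, 1)
        self.max_span = max_span
        self.ramp = ramp

    def forward(self, memory_h: torch.Tensor, ages: torch.Tensor) -> torch.Tensor:
        # memory_h: (batch, mem_len, d_model) hidden states of stored memories.
        # ages:     (batch, mem_len) steps elapsed since each memory was written.
        spans = torch.sigmoid(self.span_predictor(memory_h)).squeeze(-1) * self.max_span
        # 1 while a memory is within its span, fading linearly to 0 afterwards.
        return torch.clamp((spans - ages) / self.ramp + 1.0, min=0.0, max=1.0)
```

The returned mask multiplies the attention weights over the stored memories (which are then renormalized), and memories whose mask reaches zero can be dropped to save memory and compute.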
### Running Experiments from the Paper