Performer – Pytorch

An implementation of Performer, a linear attention-based transformer variant with a Fast Attention Via positive Orthogonal Random features approach (FAVOR+).
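
In brief, FAVOR+ approximates the softmax attention kernel with positive random features, so attention can be computed as phi(Q) (phi(K)^T V) by associativity, in time linear in the sequence length rather than quadratic. The standalone sketch below is not part of this package's API; it is a minimal illustration in plain PyTorch (using Gaussian rather than orthogonal projections, and ignoring causal masking), with the helper name softmax_kernel_features chosen for the example:

import torch

def softmax_kernel_features(x, projection, eps = 1e-4):
    # Positive random features for the softmax kernel:
    # phi(x) = exp(w @ x - ||x||^2 / 2) / sqrt(m) for each random row w
    m = projection.shape[0]
    x_proj = x @ projection.T                            # (n, m)
    sq_norm = (x ** 2).sum(dim = -1, keepdim = True) / 2
    return torch.exp(x_proj - sq_norm) / (m ** 0.5) + eps

n, d, m = 1024, 64, 256
scale = d ** -0.25                    # fold the 1/sqrt(d) softmax scaling into q and k
q = torch.randn(n, d) * scale
k = torch.randn(n, d) * scale
v = torch.randn(n, d)

projection = torch.randn(m, d)        # FAVOR+ uses orthogonal rows; plain Gaussian here for brevity
q_prime = softmax_kernel_features(q, projection)    # (n, m)
k_prime = softmax_kernel_features(k, projection)    # (n, m)

# Associativity gives O(n * m * d) cost instead of the O(n^2 * d) of exact attention
context = k_prime.T @ v                             # (m, d)
normalizer = q_prime @ k_prime.sum(dim = 0)         # (n,)
out = (q_prime @ context) / normalizer[:, None]     # (n, d), approximates softmax(Q K^T / sqrt(d)) V

The library itself adds the orthogonal projections, causal variants, and the periodic redrawing of the projection matrix exposed through the options shown under Usage.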

Install

$ pip install performer-pytorch

Then, if you plan on training an autoregressive model, you must also run:

$ pip install -r requirements.txt

Usage

Performer Language Model

import torch
from performer_pytorch import PerformerLM

model = PerformerLM(
    num_tokens = 20000,
    max_seq_len = 2048,               # max sequence length
    dim = 512,                        # dimension
    depth = 12,                       # layers
    heads = 8,                        # heads
    causal = False,                   # auto-regressive or not
    nb_features = 256,                # number of random features; if not set, will default to (d * log(d)), where d is the dimension of each head
    feature_redraw_interval = 1000    # how frequently to redraw the projection matrix; the more frequent, the slower the training
)
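
A forward pass takes token ids and an optional boolean mask, and returns logits over the vocabulary:

x = torch.randint(0, 20000, (1, 2048))
mask = torch.ones_like(x).bool()

logits = model(x, mask = mask)    # (1, 2048, 20000)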
