Machine Translation Weekly 71: Explaining Random Feature Attention
Transformers are the neural architecture that underlies most of the current state of the art in machine translation and natural language processing in general. One of their major drawbacks is the quadratic complexity of the underlying self-attention mechanism, which in practice limits the sequence length that can be processed by Transformers. Several tricks already exist to deal with this. One of them is locality-sensitive hashing, which was used in the Reformer architecture (see MT Weekly 27). The main idea was computing the […]
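To make the quadratic cost concrete, here is a minimal sketch (not from the post; plain NumPy, all function names illustrative) of standard scaled dot-product self-attention. The n × n score matrix is the bottleneck: doubling the sequence length quadruples both the time and the memory spent on it, which is exactly what random feature attention tries to avoid.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """Vanilla attention: O(n^2 * d) time, O(n^2) memory for the score matrix."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # shape (n, n) -- the quadratic bottleneck
    return softmax(scores, axis=-1) @ V

# Toy usage: the (n, n) score matrix grows with the square of sequence length n.
n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = self_attention(Q, K, V)       # shape (n, d)
```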