The Reformer – Pushing the limits of language modeling
How the Reformer uses less than 8GB of RAM to train on sequences of half a million tokens

The Reformer model, introduced by Kitaev, Kaiser et al. (2020), is one of the most memory-efficient transformer models for long sequence modeling to date. Long sequence modeling has recently seen a surge of interest, as shown by the many submissions from this year alone: Beltagy et al. (2020), Roy et al. (2020), Tay et al., Wang et […]