The NLP Cypher | 07.11.21
Welcome back! Hope you had a great week. We have a new leader on the SuperGLUE benchmark: a new ERNIE model from Baidu comprising 10 billion parameters trained on a 4TB corpus. FYI, the human baseline was already beaten by Microsoft’s DeBERTa model at the beginning of the year… time for a new SuperSuperGLUE benchmark???
BTW, if you are still interested in GitHub’s Copilot, I stumbled upon the Codex paper this week:
DeepMind’s Perceiver transformer can take a variety of modalities (vision, audio, text) as input and achieves competitive benchmark performance across them. Usually a model architecture is specialized for a specific domain; the Perceiver instead attempts to generalize to any domain with a single architecture. 😎
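To make the idea concrete, here is a toy PyTorch sketch of the Perceiver's core trick: a small, fixed-size learned latent array cross-attends to an arbitrary flattened input, so the heavy computation never scales quadratically with the raw input size. The dimensions, single cross-attend + self-attend stack, and class names are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PerceiverSketch(nn.Module):
    def __init__(self, input_dim, latent_len=64, latent_dim=128, num_classes=10):
        super().__init__()
        # Learned latent array: its length is independent of the input length.
        self.latents = nn.Parameter(torch.randn(latent_len, latent_dim))
        self.cross_attn = nn.MultiheadAttention(
            latent_dim, num_heads=4, kdim=input_dim, vdim=input_dim, batch_first=True
        )
        self.self_attn = nn.TransformerEncoderLayer(
            latent_dim, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.head = nn.Linear(latent_dim, num_classes)

    def forward(self, x):                    # x: (batch, seq_len, input_dim)
        B = x.shape[0]
        z = self.latents.expand(B, -1, -1)   # (batch, latent_len, latent_dim)
        # Latents (queries) attend to the raw input (keys/values):
        # cost is O(latent_len * seq_len), not O(seq_len^2).
        z, _ = self.cross_attn(z, x, x)
        z = self.self_attn(z)                # cheap self-attention over latents only
        return self.head(z.mean(dim=1))      # pool latents for a prediction

# The same module handles different modalities once they are flattened to (seq, dim):
image_like = torch.randn(2, 32 * 32, 3)     # e.g. flattened RGB pixels
print(PerceiverSketch(input_dim=3)(image_like).shape)   # torch.Size([2, 10])
```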
Adding to the list of efficient transformers comes the LS-Transformer, which can be used for both autoregressive and bi-directional models, in both the language and vision domains. The model obtains SOTA results on the Long Range Arena, character-level language modeling, and ImageNet classification.
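Below is a minimal, single-head sketch of the long-short attention idea: short-range context is handled with a sliding-window mask, while long-range context is compressed with a learned ("dynamic") projection of the keys/values down to a handful of landmark positions. The hyperparameters, class name, and the omission of the paper's dual-normalization and multi-head details are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LongShortAttentionSketch(nn.Module):
    def __init__(self, d_model=64, window=8, rank=16):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.window = window
        # Dynamic low-rank projection: compresses the length dimension of K/V
        # to `rank` landmark positions, with weights computed from K itself.
        self.to_proj = nn.Linear(d_model, rank)
        self.scale = d_model ** -0.5

    def forward(self, x):                                # x: (batch, seq, d_model)
        B, N, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)

        # Long-range branch: project K/V down to `rank` positions.
        p = torch.softmax(self.to_proj(k), dim=1)        # (B, N, r)
        k_long = torch.einsum("bnr,bnd->brd", p, k)      # (B, r, D)
        v_long = torch.einsum("bnr,bnd->brd", p, v)      # (B, r, D)

        # Short-range branch: an explicit band mask limits each query to a
        # local window (the paper uses a faster segment-wise implementation).
        idx = torch.arange(N)
        band = (idx[None, :] - idx[:, None]).abs() <= self.window   # (N, N)

        # Concatenate compressed long-range K/V with the full K/V, masking the
        # full part so each query only sees its local window there.
        k_all = torch.cat([k_long, k], dim=1)            # (B, r+N, D)
        v_all = torch.cat([v_long, v], dim=1)            # (B, r+N, D)
        scores = torch.einsum("bnd,bmd->bnm", q, k_all) * self.scale
        mask = torch.cat(
            [torch.ones(N, k_long.shape[1], dtype=torch.bool), band], dim=1
        )                                                # (N, r+N)
        scores = scores.masked_fill(~mask[None], float("-inf"))
        attn = F.softmax(scores, dim=-1)
        return torch.einsum("bnm,bmd->bnd", attn, v_all)

x = torch.randn(2, 32, 64)
print(LongShortAttentionSketch()(x).shape)               # torch.Size([2, 32, 64])
```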
Paper:
170 video lectures from Sebastian Raschka, recorded in 2021, using PyTorch.
Table of Contents
Jupyter notebooks implementing the code samples found in the book Deep Learning with Python, 2nd Edition.
A conceptual intro to model parallelism touching on several of the techniques listed below. HF also notes which of these techniques are currently implemented in their library.
- DataParallel (DP) — the same setup is replicated multiple times, and each replica is fed a slice of the data. The processing is done in parallel and all setups are synchronized at the end of each training step (see the sketch after this list).
-
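As a quick illustration of the DataParallel idea, the sketch below replicates a toy model across the visible GPUs so each replica processes a slice of the batch, with gradients synchronized at the end of the step. The model, batch size, and optimizer are placeholders; `torch.nn.DataParallel` is the simple single-process flavor, while the HF docs also cover DistributedDataParallel and the more advanced schemes above.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

if torch.cuda.is_available():
    model = model.cuda()
    # Wrapping in DataParallel splits each input batch across all visible GPUs.
    model = nn.DataParallel(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(64, 512)           # a batch of 64 gets sliced across replicas
labels = torch.randint(0, 10, (64,))
if torch.cuda.is_available():
    inputs, labels = inputs.cuda(), labels.cuda()

loss = loss_fn(model(inputs), labels)   # forward runs in parallel, one slice per GPU
loss.backward()                         # gradients are gathered back onto one device
optimizer.step()                        # replicas are synchronized for the next step
```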