The NLP Cypher | 12.12.21
Here’s a collection of papers and posts from your favorite big tech companies and research institutions.
“The Generalist Language Model (GLaM), a trillion weight model that can be trained and served efficiently (in terms of computation and energy use) thanks to sparsity, and achieves competitive performance on multiple few-shot learning tasks. GLaM’s performance compares favorably to a dense language model, GPT-3 (175B) with significantly improved learning efficiency across 29 public NLP benchmarks in seven categories, spanning language completion, open-domain question answering, and natural language inference tasks.”
GLaM vs. GPT-3 on NLG and NLU Tasks
Awesome Takeaway:
This large sparse model is competitive with dense counterparts while training on much less data and consuming less energy.
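To make the sparsity idea concrete, here is a minimal, toy sketch of a top-2 gated mixture-of-experts feed-forward layer in PyTorch. The expert count, hidden sizes, and the naive dense-loop dispatch are illustrative assumptions, not GLaM’s actual implementation (which also uses capacity limits, load-balancing losses, and distributed expert parallelism).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Toy mixture-of-experts feed-forward layer with top-2 gating.

    Each token is routed to only 2 of `num_experts` expert MLPs, so total
    parameters grow with the number of experts while per-token compute stays
    roughly constant -- the rough intuition behind GLaM-style sparsity.
    """

    def __init__(self, d_model=512, d_ff=2048, num_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                    # x: (batch, seq, d_model)
        scores = F.softmax(self.gate(x), dim=-1)             # routing probabilities per token
        top_p, top_idx = scores.topk(2, dim=-1)              # keep only the best 2 experts
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)      # renormalize the two gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(2):                               # dense loop: fine for a toy example
                mask = top_idx[..., k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += top_p[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = Top2MoE()
tokens = torch.randn(2, 16, 512)
print(layer(tokens).shape)   # torch.Size([2, 16, 512])
```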
An OCR demo with LayoutLM fine-tuned for information extraction on receipt data.
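For orientation, here is a rough sketch of how LayoutLM consumes OCR output for token classification with the Hugging Face API. The words, bounding-box coordinates, label count, and the base (not receipt-fine-tuned) checkpoint are all stand-in assumptions; the actual demo uses its own fine-tuned weights and label set.

```python
import torch
from transformers import LayoutLMTokenizerFast, LayoutLMForTokenClassification

# Words and boxes would normally come from an OCR engine (e.g. Tesseract);
# boxes must be normalized to LayoutLM's 0-1000 coordinate scale.
words = ["TOTAL", "$12.50"]
boxes = [[70, 620, 180, 650], [600, 620, 720, 650]]   # made-up coordinates

# Stand-in checkpoint: the demo uses weights fine-tuned on receipt fields,
# so the randomly initialized classifier head here produces untrained labels.
tokenizer = LayoutLMTokenizerFast.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=5
)

encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Repeat each word's box for every word-piece it was split into;
# special tokens ([CLS]/[SEP]) get a dummy box.
token_boxes = [
    boxes[idx] if idx is not None else [0, 0, 0, 0]
    for idx in encoding.word_ids(batch_index=0)
]
bbox = torch.tensor([token_boxes])

with torch.no_grad():
    logits = model(
        input_ids=encoding["input_ids"],
        attention_mask=encoding["attention_mask"],
        bbox=bbox,
    ).logits

print(logits.argmax(-1))   # one label id per word-piece (e.g. total / date / other)
```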
http://www.pwc.com/us/en/tech-effect/ai-analytics/ai-predictions.html
“I procrastinated a deep dive into transformers for a few years. Finally the discomfort of not knowing what makes them tick grew too great for me. Here is that …”
https://e2eml.school/transformers.html
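As a companion to the tutorial, here is a minimal scaled dot-product attention function in PyTorch, the operation the walkthrough builds up to. The shapes and masking convention are generic and not taken from the tutorial’s own code.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Each query mixes the values, weighted by how well it matches each key."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5            # (..., seq_q, seq_k) similarity matrix
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # e.g. causal masking
    weights = F.softmax(scores, dim=-1)                       # attention weights sum to 1 per query
    return weights @ v                                        # weighted sum of values

q = k = v = torch.randn(1, 8, 64)                             # 8 tokens, 64-dim per head
print(scaled_dot_product_attention(q, k, v).shape)            # torch.Size([1, 8, 64])
```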
“When trying to predict how PyTorch would itself get disrupted, we used to joke a bit about the next version of PyTorch being written in Julia. This was not very serious: a huge factor in moving PyTorch from Lua to Python was to tap into Python’s immense ecosystem (an ecosystem that shows no signs of going away) and even today it is still hard to imagine how a new language can overcome the network effects of Python.”
One of the most intuitive tutorials out there.