The NLP Cypher | 10.31.21
The Localization Problem (LP) is a glaring dark cloud hanging over the current state of applied deep learning. Acknowledging this problem, I believe, will help us make better use of applied AI and sharpen our understanding of how the business market will take shape.
Defining LP: There is a limit to how well large, centralized language models can generalize at scale, because 1) different users inherently have varying definitions of ground truth, shaped by dependencies on their unique real-world environments, and 2) whether this matters depends on whether model performance is mission-critical. In other words, when a model can’t afford to be wrong too often, optimizing its accuracy for a given user requires “localizing” it to that user’s ground truth in their data.
Example: Imagine a kazillion-parameter encoder transformer called Hal9000. This AGI model knows everything there is to know. Hal9000 has two big customers: John, who works for Meta, and Jane, who works for CyberDyne Systems. John and Jane don’t know each other, but both are active commodity traders in their spare time who depend on Hal9000 to classify finance-related tweets for sentiment analysis (positive, negative, neutral). They are trading in real time when a tweet hits the wire: “gold is up 150% in after-hours trading.”
Some background: John is bullish on gold (he owns gold call options and makes money if the price goes up), and Jane is bearish on gold (she owns gold put options and makes money if the price goes down).
It’s time for Hal9000 to work its magic and classify this tweet so John and Jane can execute their trades. But Hal has a big problem: it can’t generalize to both John’s and Jane’s definitions of ‘positive’ and ‘negative’. Given the same input text, the model needs to classify the tweet as ‘positive’ for John and ‘negative’ for Jane.
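To make the conflict concrete, here is a tiny hypothetical sketch (the tweet, users, and labels are illustrative, not from any real system or dataset) of why a single global mapping from text to label can’t satisfy both users, while a per-user mapping can:

```python
# Hypothetical per-user ground truths for the exact same input text.
tweet = "gold is up 150% in after-hours trading"

ground_truth = {
    "john": "positive",  # bullish: a gold rally is good news for his calls
    "jane": "negative",  # bearish: the same rally loses money on her puts
}

def global_classifier(text: str) -> str:
    """A single shared model returns one label per text."""
    return "positive"  # whichever label it picks, it is wrong for one user

def localized_classifier(text: str, user: str) -> str:
    """A localized model conditions on (or is fine-tuned for) each user."""
    return ground_truth[user]

for user in ("john", "jane"):
    print(user, "wants", ground_truth[user],
          "| global:", global_classifier(tweet),
          "| localized:", localized_classifier(tweet, user))
```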
This is the LP manifesting itself in the real world. Hal needs to localize itself to John’s ground truth of sentiment and to Jane’s. Currently, the way we localize models is by fine-tuning them. And fine-tuning isn’t a hindrance to AI performance (as some zero-shot devotees may suggest); in actuality, it’s a prerequisite. All the software and hardware improvements in the world can’t improve a model’s accuracy if it isn’t localized to its user. However, not all use cases encounter LP.
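As a rough illustration of localization by fine-tuning, the sketch below fine-tunes a separate copy of the same pretrained model on each user’s own labels with the Hugging Face transformers Trainer. The model name, label set, one-tweet training sets, and hyperparameters are assumptions made for illustration, not a production recipe:

```python
# A minimal sketch of "localization by fine-tuning": each user gets their own
# copy of the same pretrained model, fine-tuned on that user's ground truth.
# The model name, labels, data, and hyperparameters are illustrative assumptions.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "distilbert-base-uncased"  # stand-in for "Hal9000"
LABELS = ["negative", "neutral", "positive"]
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

class TweetDataset(torch.utils.data.Dataset):
    """Wraps one user's (text, label) pairs for the Trainer."""
    def __init__(self, pairs):
        texts = [text for text, _ in pairs]
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = [LABELS.index(label) for _, label in pairs]
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

def localize(user_pairs, output_dir):
    """Fine-tune a fresh copy of the shared model on one user's labels."""
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=len(LABELS))
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=8)
    Trainer(model=model, args=args,
            train_dataset=TweetDataset(user_pairs)).train()
    return model

# The same tweet carries opposite ground truths for the two users, so each
# user ends up with a differently localized copy of the model.
tweet = "gold is up 150% in after-hours trading"
john_model = localize([(tweet, "positive")], "hal9000-john")  # bullish labels
jane_model = localize([(tweet, "negative")], "hal9000-jane")  # bearish labels
```

In practice each user would bring far more than one labeled tweet; the point is only that localization means a per-user training signal, not a bigger shared model.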
There is still a market where non-local language models can thrive: one where users leverage a community-accepted NLP task for which model error is not mission-critical. That kind of task, in a non-mission-critical environment, isn’t affected by LP.