The NLP Cypher | 10.03.21
RAFT is a few-shot classification benchmark that tests language models:
– across multiple domains (lit reviews, medical data, tweets, customer interactions, etc.)
– on economically valuable classification tasks (someone inherently cares about the task)
– with evaluation that mirrors deployment (50 labeled examples per task, information retrieval allowed, hidden test set)
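
To make the "50 labeled examples per task" setup concrete, here is a minimal sketch of how a few-shot prompt might be assembled from a small labeled set. It is not RAFT's own tooling: the function name `build_few_shot_prompt`, the `Text:`/`Label:` layout, and the toy sentiment data are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_instruction: str,
    labeled_examples: List[Tuple[str, str]],
    query: str,
    max_examples: int = 50,  # RAFT provides 50 labeled examples per task
) -> str:
    """Assemble a plain-text few-shot classification prompt (hypothetical layout)."""
    # Use at most `max_examples` labeled examples as in-context shots.
    shots = labeled_examples[:max_examples]

    lines = [task_instruction, ""]
    for text, label in shots:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")

    # The unlabeled query goes last; the model is expected to complete the label.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data, invented for this sketch; not taken from any RAFT task.
    toy_examples = [
        ("great product", "positive"),
        ("broke in a day", "negative"),
    ]
    print(build_few_shot_prompt("Classify the sentiment.", toy_examples, "works fine"))
```

The resulting string would then be sent to whatever language model is being evaluated. Since the test set is hidden, predictions can only be scored by submitting them to the benchmark, not locally.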