Eureka: Evaluating and understanding progress in AI
In the fast-paced progress of AI, the question of how to evaluate and understand capabilities of state-of-the-art models is timelier than ever. New and capable models are being released frequently, and each release promises the next big leap in frontiers of intelligence. Yet, as researchers and developers, often we ask ourselves: Are these models all comparable, if not