Efficient Extractive Question Answering on CPU using QUIP
TLDR: Extractive question answering is an important task for providing a good user experience in many applications. The popular Retriever-Reader framework for QA using BERT can be difficult to scale because it requires re-processing candidate documents in the context of each question in real time. By using phrase embeddings, we can process the question and the context independently, which drastically reduces runtime demands. In a limited experiment, I found QUIP to be 4x faster than a comparable QA model on