A Gentle Introduction to Calculating the BLEU Score for Text in Python
Last Updated on December 19, 2019
BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate translation of text to one or more reference translations.
Although developed for translation, it can be used to evaluate text generated for a suite of natural language processing tasks.
In this tutorial, you will discover the BLEU score for evaluating and scoring candidate text using the NLTK library in Python.
After completing this tutorial, you will know:
- What the BLEU score is and an intuition for what is being calculated.
- How to calculate BLEU scores in Python using the NLTK library for sentences and documents.
- How to use a suite of small examples to develop an intuition for how differences between a candidate and a reference text affect the final BLEU score.
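As a quick preview of the kind of calculation covered here, the following is a minimal sketch that scores a candidate sentence against a single reference using NLTK's `sentence_bleu()` function. The sentences are made up for illustration, and the whitespace tokenization is an assumption chosen to keep the example short.

```python
# Minimal sketch: sentence-level BLEU with NLTK.
from nltk.translate.bleu_score import sentence_bleu

# Illustrative sentences (not from any dataset), tokenized on whitespace.
reference = "the quick brown fox jumped over the lazy dog".split()
references = [reference]  # sentence_bleu expects a list of reference token lists

# A candidate identical to the reference.
perfect_candidate = "the quick brown fox jumped over the lazy dog".split()
# A candidate with one word changed ("quick" -> "fast").
changed_candidate = "the fast brown fox jumped over the lazy dog".split()

perfect_score = sentence_bleu(references, perfect_candidate)
changed_score = sentence_bleu(references, changed_candidate)

print("identical candidate:", perfect_score)
print("one word changed:  ", changed_score)
```

An identical candidate receives a perfect score, and changing a single word lowers the score because fewer n-grams in the candidate match the reference.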
Let’s get started.
- Update May/2019: Revised to reflect changes to the API in NLTK 3.4.1+.