How to Prepare News Articles for Text Summarization
Last Updated on August 7, 2019
Text summarization is the task of creating a short, accurate, and fluent summary of an article.
The CNN News story dataset is a popular, freely available dataset for text summarization experiments with deep learning methods.
In this tutorial, you will discover how to prepare the CNN News Dataset for text summarization.
After completing this tutorial, you will know:
- About the CNN News dataset and how to download the story data to your workstation.
- How to load the dataset and split each article into story text and highlights.
- How to clean the dataset so it is ready for modeling and save the cleaned data to file for later use.
Let’s get started.
Tutorial Overview
This tutorial is divided into five parts: