A Web scraping library and command-line tool for text discovery and extraction
![](https://www.deeplearningdaily.com/wp-content/uploads/2021/09/a-web-scraping-library-and-command-line-tool-for-text-discovery-and-extraction_613fcc1b7a4d6-375x210.jpeg)
Description
Trafilatura is a Python package and command-line tool that downloads, parses, and scrapes web pages. It extracts metadata, main body text, and comments while preserving parts of the text formatting and page structure. The output can be converted to several formats, such as plain text, CSV, JSON, and XML.
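As a rough illustration of the download, parse, and extract steps described above, here is a minimal sketch in Python. It relies on the package's `fetch_url` and `extract` functions. The helper names `extract_from_html` and `scrape` are hypothetical wrappers introduced only for this example, and the defaults chosen here are assumptions rather than recommended settings.

```python
from typing import Optional


def extract_from_html(html: str, output_format: str = "txt") -> Optional[str]:
    """Extract main text (and comments) from an HTML string.

    Hypothetical helper: returns None when nothing could be extracted.
    """
    # Third-party dependency (pip install trafilatura), imported inside the
    # function so this module can be loaded even where it is not installed.
    import trafilatura

    return trafilatura.extract(
        html,
        include_comments=True,        # keep the comment section if one is found
        output_format=output_format,  # e.g. "txt", "json" or "xml"
    )


def scrape(url: str, output_format: str = "txt") -> Optional[str]:
    """Download a page and extract its main content (hypothetical helper)."""
    import trafilatura

    downloaded = trafilatura.fetch_url(url)  # None if the download failed
    if downloaded is None:
        return None
    return extract_from_html(downloaded, output_format=output_format)
```

Calling `scrape("https://example.org", output_format="xml")` would then return the extracted content as an XML string, or `None` if the page could not be fetched or yielded no text. Keeping the download separate from the extraction makes it possible to run `extract_from_html` on pages that are already stored locally.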
Distinguishing between a whole page and the page’s essential parts can help alleviate many quality problems in web text processing by dealing with