A webmining CLI tool & library for Python
minet
minet is a webmining command-line tool & library for Python (>= 3.6) that can be used to collect and extract data from a large variety of web sources such as raw webpages, Facebook, CrowdTangle, YouTube, Twitter, Media Cloud, etc.
It adopts a very simple approach to various webmining problems by letting you perform a variety of actions from the comfort of the command line. No database needed: raw CSV files should be sufficient to do most of the work.
minet also exposes its high-level programmatic interface as a Python library, so you can tweak its behavior at will.
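To give a feel for the CSV-in, CSV-out style of work described above, here is a minimal sketch that uses only the Python standard library. It does not use minet's own API: the `URL_RE` pattern, the `extract_urls` and `url_report` helpers, and the `id`/`text` column names are all hypothetical and only serve the illustration.

```python
import csv
import io
import re
from urllib.parse import urlsplit

# Hypothetical pattern: a deliberately naive URL matcher, not minet's own logic.
URL_RE = re.compile(r"https?://[^\s<>]+")


def extract_urls(rows, column):
    """Yield (row, url) pairs for every URL found in the given column."""
    for row in rows:
        for url in URL_RE.findall(row[column]):
            # Drop punctuation that commonly trails a URL in running text.
            yield row, url.rstrip(".,;:!?)")


def url_report(csv_text, column):
    """Parse CSV text and return one dict per URL found, with its hostname."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        # Assumes the input has an "id" column identifying each row.
        {"id": row["id"], "url": url, "hostname": urlsplit(url).hostname}
        for row, url in extract_urls(reader, column)
    ]


# Small in-memory stand-in for a CSV file on disk.
SAMPLE = (
    "id,text\n"
    '1,"See https://www.youtube.com/watch?v=abc and http://example.org."\n'
    "2,no link here\n"
)

for line in url_report(SAMPLE, "text"):
    print(line)
```

Each input row can produce zero, one or several output rows, which is why the report is flat and carries the original `id` along with every URL.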
What it does
minet can single-handedly:
- Extract URLs from a text file (or a table)
- Parse URLs (extract useful information, including Facebook- and YouTube-specific details)
- Join