A pandas DataFrame processing CLI tool
PdpCLI
PdpCLI is a pandas DataFrame processing CLI tool that lets you build a pandas pipeline, powered by pdpipe, from a configuration file. You can also extend pipeline stages and data readers / writers with your own Python scripts.
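The idea of a config-driven pipeline can be illustrated with a small, dependency-free sketch: a registry maps stage type names to stage factories, a parsed config lists the stages in order, and the built pipeline applies them one after another. This is not PdpCLI's implementation or config schema. PdpCLI builds pdpipe stages that operate on real DataFrames, whereas here the stage names (`drop_columns`, `filter_rows`), the config keys (`stages`, `type`), and the helpers (`register`, `build_pipeline`) are hypothetical, and rows are plain dicts standing in for a DataFrame.

```python
import json
from typing import Callable, Dict, List

# Plain-dict rows stand in for a DataFrame in this hypothetical sketch.
Row = Dict[str, object]
Stage = Callable[[List[Row]], List[Row]]

# Hypothetical registry: stage "type" name -> factory that builds a stage.
STAGE_REGISTRY: Dict[str, Callable[..., Stage]] = {}


def register(name: str):
    """Decorator that adds a stage factory to the registry under `name`."""
    def wrap(factory):
        STAGE_REGISTRY[name] = factory
        return factory
    return wrap


@register("drop_columns")
def drop_columns(columns: List[str]) -> Stage:
    """Build a stage that removes the given columns from every row."""
    def apply(rows: List[Row]) -> List[Row]:
        return [{k: v for k, v in row.items() if k not in columns} for row in rows]
    return apply


@register("filter_rows")
def filter_rows(column: str, min_value: float) -> Stage:
    """Build a stage that keeps rows whose `column` is at least `min_value`."""
    def apply(rows: List[Row]) -> List[Row]:
        return [row for row in rows if row[column] >= min_value]
    return apply


def build_pipeline(config: dict) -> Stage:
    """Instantiate each configured stage and chain them in config order."""
    stages = []
    for spec in config["stages"]:
        params = dict(spec)                      # copy so the config is not mutated
        factory = STAGE_REGISTRY[params.pop("type")]
        stages.append(factory(**params))         # remaining keys become arguments

    def run(rows: List[Row]) -> List[Row]:
        for stage in stages:
            rows = stage(rows)
        return rows
    return run


# A JSON config (one of the formats PdpCLI accepts); the keys are hypothetical.
CONFIG = json.loads("""
{
  "stages": [
    {"type": "filter_rows", "column": "age", "min_value": 18},
    {"type": "drop_columns", "columns": ["name"]}
  ]
}
""")

data = [
    {"name": "alice", "age": 30, "city": "Tokyo"},
    {"name": "bob", "age": 12, "city": "Osaka"},
]
pipeline = build_pipeline(CONFIG)
print(pipeline(data))  # [{'age': 30, 'city': 'Tokyo'}]
```

Keeping stage construction behind a name-keyed registry is what lets a plain config file refer to stages by name, and it is also what makes user-supplied stages pluggable: registering a new factory is enough for the config to use it.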
Features
- Process pandas DataFrames from the CLI without writing Python scripts
- Support multiple configuration file formats: YAML, JSON, Jsonnet
- Read / write data files in the following formats: CSV, TSV, JSON, JSONL, pickled DataFrame
- Import / export data via multiple protocols: S3, databases (MySQL, Postgres, SQLite, …), HTTP(S)
- Extensible pipeline and data readers / writers
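Extensible readers are commonly organized as a registry keyed by file suffix, so that supporting a new format means registering one more function. The stdlib-only sketch below shows that pattern for a few of the formats listed above (CSV, TSV, JSON, JSONL). It does not reproduce PdpCLI's actual reader classes or registration mechanism: the `reader` decorator, the `READERS` table, and `read_path` are hypothetical, and they return lists of dicts rather than DataFrames. Writers could follow the same pattern.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, TextIO

# Hypothetical registry: lowercase file suffix -> function that parses an open text file.
READERS: Dict[str, Callable[[TextIO], List[dict]]] = {}


def reader(*suffixes: str):
    """Decorator registering a reader function for one or more file suffixes."""
    def wrap(func):
        for suffix in suffixes:
            READERS[suffix] = func
        return func
    return wrap


@reader(".csv")
def read_csv(f: TextIO) -> List[dict]:
    return list(csv.DictReader(f))


@reader(".tsv")
def read_tsv(f: TextIO) -> List[dict]:
    return list(csv.DictReader(f, delimiter="\t"))


@reader(".json")
def read_json(f: TextIO) -> List[dict]:
    return json.load(f)  # expects a top-level JSON array of objects


@reader(".jsonl")
def read_jsonl(f: TextIO) -> List[dict]:
    # One JSON object per line; blank lines are skipped.
    return [json.loads(line) for line in f if line.strip()]


def read_path(path: Path) -> List[dict]:
    """Pick a reader from the file suffix and parse the file with it."""
    path = Path(path)
    try:
        func = READERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no reader registered for {path.suffix!r}") from None
    with path.open(newline="") as f:  # newline="" as recommended for the csv module
        return func(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "people.tsv"
        src.write_text("name\tage\nalice\t30\n")
        print(read_path(src))  # [{'name': 'alice', 'age': '30'}]
```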
Installation
Install the library with pip:
$ pip install "pdpcli[all]"
Tutorial
Basic Usage
- Write a pipeline config file