A pandas DataFrame processing CLI tool
PdpCLI
PdpCLI is a pandas DataFrame processing CLI tool that lets you build a pandas pipeline, powered by pdpipe, from a configuration file. You can also extend the pipeline stages and the data readers / writers with your own Python scripts.
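To make the idea of a config-driven pipeline concrete, here is a minimal, dependency-free sketch. It is not PdpCLI's or pdpipe's implementation: the stage names (`drop_column`, `rename_column`), the `STAGES` registry, and the config layout are all hypothetical, and plain lists of dicts stand in for a DataFrame so the example runs with the standard library alone.

```python
import json

# Hypothetical stages for illustration only; these are NOT PdpCLI or pdpipe APIs.
def drop_column(rows, column):
    """Return the rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

def rename_column(rows, old, new):
    """Return the rows with column `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]

# Hypothetical registry mapping a config "type" string to a stage function.
STAGES = {"drop_column": drop_column, "rename_column": rename_column}

def build_pipeline(config):
    """Turn a parsed config into one callable that applies each stage in order."""
    steps = [(STAGES[s["type"]], s.get("params", {})) for s in config["stages"]]

    def run(rows):
        for func, params in steps:
            rows = func(rows, **params)
        return rows

    return run

# Assumed config layout (JSON here; the same structure could come from YAML).
CONFIG_TEXT = """
{
  "stages": [
    {"type": "drop_column", "params": {"column": "id"}},
    {"type": "rename_column", "params": {"old": "nm", "new": "name"}}
  ]
}
"""

pipeline = build_pipeline(json.loads(CONFIG_TEXT))
result = pipeline([{"id": 1, "nm": "Alice"}, {"id": 2, "nm": "Bob"}])
print(result)  # [{'name': 'Alice'}, {'name': 'Bob'}]
```

The point of the sketch is the separation of concerns: the config file only names stages and their parameters, while the registry decides which code runs. Registering an extra function in such a registry is roughly what extending the pipeline with your own script amounts to.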
Features
- Process pandas DataFrames from the CLI without writing Python scripts
- Support multiple configuration file formats: YAML, JSON, Jsonnet
- Read / write data files in the following formats: CSV, TSV, JSON, JSONL, pickled DataFrame
- Import / export data with multiple protocols: S3 / Database (MySQL, Postgres, SQLite, …) / HTTP(S)
- Extensible pipeline and data readers / writers
Installation
Install the library with pip:
$ pip install "pdpcli[all]"
Tutorial
Basic Usage
- Write a pipeline config file