PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams

Motivation
- When dataset freshness is critical, annotating high-speed unlabelled data streams becomes essential, yet it remains an open problem.
- We propose PLStream, a novel Apache Flink-based framework for fast polarity labelling of massive data streams, such as tweets from Twitter or online product reviews.
Environment Requirements
Required Python packages are listed in requirements.txt.
- Flink v1.13
- Python 3.7
- Java 8
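Before installing, a quick check like the following can confirm that the interpreter and JVM match the versions above. This is a minimal sketch, not part of PLStream: the helper names `parse_java_major` and `java_major_version` are hypothetical, and it assumes `java` is on the `PATH` and prints its version string to stderr, as Java 8 does. Flink v1.13 is not checked here.

```python
import re
import shutil
import subprocess
import sys

# Versions taken from the Environment Requirements list.
REQUIRED_PYTHON = (3, 7)
REQUIRED_JAVA_MAJOR = 8


def parse_java_major(version_output):
    """Extract the Java major version from `java -version` output.

    Java 8 reports legacy versions such as "1.8.0_292", while Java 9+
    reports "11.0.11" or "17-ea". Returns None if nothing is recognized.
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        return None
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        # Legacy scheme: the major version is the second component.
        return int(parts[1])
    head = re.match(r"\d+", parts[0])
    return int(head.group(0)) if head else None


def java_major_version():
    """Run `java -version` (which writes to stderr) and parse its major version."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the version text is captured
        universal_newlines=True,
    )
    return parse_java_major(result.stdout)


if __name__ == "__main__":
    py = sys.version_info[:2]
    py_status = "ok" if py == REQUIRED_PYTHON else "expected 3.7"
    print(f"Python {py[0]}.{py[1]}: {py_status}")

    java = java_major_version()
    java_status = "ok" if java == REQUIRED_JAVA_MAJOR else "expected 8"
    print(f"Java major version {java}: {java_status}")
```

Saved as, for example, a hypothetical `check_env.py`, it can be run with `python check_env.py`. Java 8 still uses the legacy `1.8.x` version form, which is why the parser special-cases a leading `1.`.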
Data Sources
- Tweets
- Yelp Reviews
- Amazon Reviews
Quick Start
Try PLStream quickly on the Yelp review polarity dataset.
Data Preparation
cd PLStream
wget https://s3.amazonaws.com/fast-ai-nlp/yelp_review_polarity_csv.tgz
tar zxvf yelp_review_polarity_csv.tgz
mv yelp_review_polarity_csv/train.csv train.csv
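After the commands above, `train.csv` can be inspected row by row before it is fed to PLStream. The sketch below assumes the usual layout of the fast.ai `yelp_review_polarity_csv` files (no header row, a label column where `1` is negative and `2` is positive, then the review text with newlines escaped as the two characters `\n`); the functions `iter_reviews` and `label_counts` are hypothetical helpers, not part of PLStream.

```python
import csv
from collections import Counter

# Assumed layout of train.csv (no header row):
#   column 0: polarity label, "1" = negative, "2" = positive
#   column 1: review text, with newlines escaped as the two characters "\n"
LABELS = {"1": "negative", "2": "positive"}


def iter_reviews(path, limit=None):
    """Yield (polarity, text) pairs one row at a time, like a simple stream source."""
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f)):
            if limit is not None and i >= limit:
                break
            label, text = row[0], row[1]
            # Map the numeric label to a name and restore escaped newlines.
            yield LABELS.get(label, "unknown"), text.replace("\\n", "\n")


def label_counts(path, limit=None):
    """Count how many reviews (up to `limit`) carry each polarity label."""
    return Counter(polarity for polarity, _ in iter_reviews(path, limit))
```

For example, `label_counts("train.csv", limit=1000)` tallies the polarity of the first 1,000 reviews. Reading through a generator keeps memory use flat, which loosely mirrors how a streaming source consumes records one at a time.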
1. Install the required environment for PLStream
- Make sure the environment requirements listed above are satisfied.
pip install