The Rich Get Richer: Disparate Impact of Semi-Supervised Learning

Preprocess file of the dataset used in implicit sub-populations:
(Demographic groups: race and gender)
The following code will pre-process the jigsaw dataset and return train/test dataset files including demographic groups information.
Step-1:
Download the jigsaw dataset: identity_individual_annotations.csv from
https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data.
Step-2:
python preprocecss_jiasaw_toxicity_gender_and_race_balanced.py
Implementation of SSL methods
Please follow the official implementations of MixMatch, MixText, and UDA.
[1] https://github.com/google-research/mixmatch
[2] https://github.com/GT-SALT/MixText
[3] https://github.com/google-research/uda
GitHub – UCSC-REAL/Disparate-SSL at pythonawesome.com
Contribute to UCSC-REAL/Disparate-SSL development by creating an account on GitHub.