Code for our article, "A Pitfall of Learning from User-generated Dataset", which examines the Subjective Class Issue, a type of class noise specific to user-generated datasets (e.g. customer reviews). We used datasets generously provided by Donorschoose.org, Yelp Review, and Amazon Fine Food. By following the usage steps below, you can replicate the results shown in our paper.
- python3
- jupyter notebook
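
As a quick sanity check of the two requirements above, a minimal sketch along the following lines may help. The function name `check_prerequisites` is our own illustration and not part of this repo; it only assumes that the `jupyter` launcher should be discoverable on your `PATH`.

```python
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to whether it was found."""
    return {
        # The notebooks target Python 3.
        "python3": sys.version_info.major >= 3,
        # The `jupyter` launcher must be on PATH to start the notebook server.
        "jupyter": shutil.which("jupyter") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

If `jupyter` is reported as missing before you have installed the requirements, that is expected when it is only installed inside the virtual environment.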
If you would like to run our notebooks, please follow the steps below:
- Download "Project Essays" provided by Donorschoose.org
- `git clone` this repo
- `cd` into the repo
- open a Python virtual environment
- run `pip install -r requirements.txt`
- in the Python virtual environment, open `jupyter notebook`
- open the ipynb files; each file name indicates the task its notebook performs (note: make sure you train doc2vec before you run the t-SNE plots)
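
Because the t-SNE plots depend on a trained doc2vec model, it can be useful to guard against running the notebooks out of order. The sketch below is one possible way to do that. Both the path `models/doc2vec.model` and the helper `require_trained_model` are hypothetical and are not taken from this repo, so adjust them to wherever the doc2vec notebook actually saves its model.

```python
from pathlib import Path

# Hypothetical location; replace with the path the doc2vec notebook saves to.
DOC2VEC_MODEL_PATH = Path("models/doc2vec.model")


def require_trained_model(model_path=DOC2VEC_MODEL_PATH):
    """Return the model path, or raise a clear error if the file is missing."""
    model_path = Path(model_path)
    if not model_path.is_file():
        raise FileNotFoundError(
            f"{model_path} not found: run the doc2vec training notebook first."
        )
    return model_path
```

Calling `require_trained_model()` in the first cell of a t-SNE notebook would then fail early with a readable message instead of an error deep inside the plotting code.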