A data science and big data project that classifies Amazon reviews by sentiment, as positive or negative. The project applies natural language processing and compares a deep learning model, a Convolutional Neural Network (CNN), with a traditional machine learning model, Naive Bayes. Apache Spark ran in standalone mode, with a master and a worker executing jobs. PySpark was used to explore and pre-process the data, Spark MLlib to build the Naive Bayes model, and the Keras API within TensorFlow to build the CNN model. The dataset is large, containing 3.6 million records.
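To illustrate what the Naive Bayes baseline does, here is a minimal pure-Python sketch of multinomial Naive Bayes with Laplace smoothing. Spark MLlib's `NaiveBayes` defaults to this multinomial variant. This is not the project's code: it skips Spark entirely, and the tokenizer, the helper names `train_nb` and `predict_nb`, and the toy reviews are hypothetical stand-ins chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase and split on whitespace
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing alpha."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token frequency counts
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    # log P(class): fraction of training documents in each class
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    # log P(word | class), smoothed so unseen (word, class) pairs are not zero
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, text):
    """Return the class with the highest log-posterior score."""
    log_prior, log_lik = model
    tokens = tokenize(text)
    scores = {}
    for c in log_prior:
        # Tokens outside the training vocabulary are ignored
        scores[c] = log_prior[c] + sum(log_lik[c][t] for t in tokens if t in log_lik[c])
    return max(scores, key=scores.get)


# Toy reviews (hypothetical, not drawn from the project's dataset)
docs = [
    "great product works well",
    "love it great value",
    "terrible broke after a day",
    "waste of money terrible",
]
labels = ["positive", "positive", "negative", "negative"]

model = train_nb(docs, labels)
print(predict_nb(model, "great value"))  # positive
```

Working in log space avoids numerical underflow when multiplying many small word probabilities. On 3.6 million reviews, the same computation would instead be distributed with Spark MLlib, as the project describes.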
dsxyash/Amazon_Reviews