Amazon Review Analysis

Overview

Python project analysing a large Amazon reviews dataset to understand customer sentiment and the phrases that drive it.

The data was cleaned, pre-processed, and tokenised before sentiment analysis was performed. Visualisations were then used to identify sentiment skews and distributions.

Stop-words and non-string inputs are filtered out first. High-signal bigrams are then extracted using pointwise mutual information (PMI), and domain phrases linked to each sentiment classification are extracted using co-occurrence counts. These extractions are used to identify the top product-specific phrases associated with each sentiment. Optional POS-tagging is also implemented to produce more meaningful extractions.
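As a rough illustration of the PMI step described above, the sketch below scores adjacent word pairs using only the standard library. It is not the code from Amazon_Review_Analysis.py: the function names, the tiny stop-word set, and the min_count cut-off are all assumptions made for the example, and the project itself relies on NLTK for tokenising and stop-words.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word set (the project uses NLTK's list).
STOP_WORDS = {"the", "a", "is", "and", "it", "this", "was"}


def tokenise(review):
    """Lower-case and split a review, dropping stop-words; non-strings give no tokens."""
    if not isinstance(review, str):
        return []
    words = (w.strip(".,!?").lower() for w in review.split())
    return [w for w in words if w and w not in STOP_WORDS]


def pmi_bigrams(reviews, min_count=2):
    """Score adjacent word pairs by PMI = log2(p(x, y) / (p(x) * p(y)))."""
    unigrams, bigrams = Counter(), Counter()
    for review in reviews:
        tokens = tokenise(review)
        unigrams.update(tokens)
        # Pairs are formed after stop-word removal, so words either side
        # of a removed stop-word count as adjacent.
        bigrams.update(zip(tokens, tokens[1:]))

    n_words = sum(unigrams.values())
    n_pairs = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:  # rare pairs get inflated PMI, so skip them
            continue
        p_xy = count / n_pairs
        p_x = unigrams[x] / n_words
        p_y = unigrams[y] / n_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring pairs first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


sample_reviews = [
    "The battery life is great",
    "Great battery life and a great price",
    "Battery life was poor",
    None,  # non-string input is ignored
]
top_bigrams = pmi_bigrams(sample_reviews)
```

The min_count filter matters because PMI tends to favour pairs that appear only once; raising it trades recall for more reliable phrases.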

Because of its size, the dataset is not included in this repository; you can download it from Kaggle.

Features

Preprocessing: Cleaning and preparing the dataset for analysis.
Sentiment Analysis: Classifying reviews as positive, negative, or neutral.
Collocation Extraction: Identifying frequently co-occurring words using PMI and co-occurrence methods.
Visualisations: Generating visualisations of the analysis results.
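The three-way sentiment classification listed above could be derived from a polarity score along these lines. This is a minimal sketch, not the project's own logic: the label_sentiment name and the 0.05 threshold are assumptions, and the scores are hard-coded so the example runs without TextBlob installed.

```python
def label_sentiment(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is an assumed value, not taken from the project.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Hypothetical polarity scores; in the project these would come from TextBlob,
# e.g. TextBlob(review_text).sentiment.polarity
example_scores = {"Loved it": 0.7, "It arrived": 0.0, "Broke in a day": -0.4}
labels = {text: label_sentiment(score) for text, score in example_scores.items()}
```

A small dead zone around zero keeps near-neutral scores from being forced into the positive or negative class.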

Project Structure

Key Files

Amazon_Review_Analysis.py: The main script for performing analysis on the dataset.
Reviews.csv: The original dataset containing Amazon product reviews.
sampled_reviews.csv: A sampled subset of the dataset for testing and development purposes.
README.md: Documentation for the project.
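A file like sampled_reviews.csv can be produced by drawing a reproducible random subset of rows. The sketch below uses only the standard library so it runs without dependencies; the sample_rows helper, the seed, and the column names are assumptions, and the project may well use pandas (for example DataFrame.sample) instead.

```python
import csv
import io
import random


def sample_rows(source, n, seed=42):
    """Return the header and a reproducible random sample of n data rows from a CSV stream."""
    reader = csv.reader(source)
    header = next(reader)
    rows = list(reader)  # loads every row; fine for a sketch, costly for a very large file
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return header, rng.sample(rows, min(n, len(rows)))


# Tiny in-memory stand-in for Reviews.csv; the column names are assumptions.
fake_csv = io.StringIO("Id,Score,Text\n1,5,Great\n2,1,Awful\n3,4,Good\n4,3,Fine\n")
header, sampled = sample_rows(fake_csv, n=2)
```

Fixing the seed means repeated runs analyse the same subset, which makes results comparable during development.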

Branches Overview

preprocessing (merged): Reformatting and cleaning the dataset in preparation for analysis.
sentiment-analysis (merged): Implementing TextBlob scoring and sentiment labelling.
performance-optimisation (merged): Introducing dataset sampling to reduce run time.
collocation-extraction-pmi (merged): Extracting collocations using the Pointwise Mutual Information (PMI) approach.
collocation-extraction-co-occurrence (merged): Extracting collocations using the Co-Occurrence approach.
sentiment-visualisations (merged): Adding visualisations to display sentiment patterns.
visualisation-improvements (merged): Improving the visualisation of collocation extraction results.
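The co-occurrence approach named in the branch list can be pictured as counting word pairs separately for each sentiment label. The sketch below assumes that co-occurrence means adjacent words within a review; the actual script may use a wider window or different filtering, and phrases_by_sentiment and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def phrases_by_sentiment(labelled_reviews, top_n=3):
    """Count adjacent word pairs separately for each sentiment label."""
    counts = defaultdict(Counter)
    for tokens, label in labelled_reviews:
        counts[label].update(zip(tokens, tokens[1:]))
    # Keep only the top_n most frequent pairs per label
    return {label: counter.most_common(top_n) for label, counter in counts.items()}


# Hypothetical pre-tokenised reviews paired with sentiment labels.
labelled = [
    (["fast", "delivery", "great", "taste"], "positive"),
    (["great", "taste", "good", "value"], "positive"),
    (["stale", "taste", "poor", "packaging"], "negative"),
]
top_phrases = phrases_by_sentiment(labelled)
```

Unlike PMI, raw counts favour frequent pairs, so the two methods tend to surface different phrases from the same reviews.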

Setup Instructions

Prerequisites

Python 3.8 or higher
Required Python libraries:
- pandas
- numpy
- matplotlib
- seaborn
- nltk
- textblob

Installation

Clone the repository:
```
git clone https://github.com/your-username/Amazon-Review-Analysis.git
```

Navigate to the project directory:
```
cd Amazon-Review-Analysis
```

Download the necessary NLTK resources:
```
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
```

Usage

Download the dataset from Kaggle and place it in the project directory.
Rename the CSV file to Reviews.csv.
Run the script: python Amazon_Review_Analysis.py
Customise the collocation extraction filtering options to your needs (refer to the inline comments in the script).

Contributing

Contributions are welcome!
Please fork the repository and submit a pull request with your changes.

Contact

For any questions or feedback, feel free to reach out:

Email: [email protected]
GitHub: Mattytomo365
