
devanshu-bharti/NLP_Women_ECommerce_Clothing_Review_Sentiment_Analysis


NLP for Customer Insights: A Deep Dive into Women's Clothing Reviews



This project presents a comprehensive case study on Women's E-Commerce Clothing Reviews, focusing on two key areas: Sentiment Analysis and Predictive Modeling. The primary objective was to extract actionable insights from unstructured text data to understand customer sentiment and predict how a product might be rated based on the language of the review.

Sentiment Analysis

Conducted a comparative sentiment analysis using multiple popular NLP libraries:

  • spaCy: Used for text preprocessing (tokenization and lemmatization) to prepare the reviews for analysis.

  • TextBlob: Employed for its straightforward and accessible API to get polarity and subjectivity scores.

  • NLTK's VADER (Valence Aware Dictionary and sEntiment Reasoner): Utilized for its effectiveness on social media and short, informal text, which is common in reviews.
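VADER reduces each review to a single compound score in [-1, 1]. The sketch below illustrates only its last two steps and is not the NLTK implementation: `TOY_LEXICON` and its valences are invented for this example, and the real analyzer also applies rules for negation, intensifiers such as "very", capitalization, and punctuation. The normalization x / sqrt(x^2 + 15) and the ±0.05 label cut-offs follow VADER's defaults.

```python
import math
import re

# Hypothetical mini-lexicon: word -> valence on VADER's -4..+4 scale.
# Values are invented for illustration, NOT taken from the real VADER lexicon.
TOY_LEXICON = {
    "love": 3.0, "perfect": 3.0, "great": 3.0, "soft": 1.0, "beautiful": 3.0,
    "tight": -1.0, "cheap": -1.5, "disappointed": -2.0,
}


def compound_score(text: str, alpha: float = 15.0) -> float:
    """Sum word valences, then squash into [-1, 1] with x / sqrt(x^2 + alpha)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(TOY_LEXICON.get(tok, 0.0) for tok in tokens)
    score = total / math.sqrt(total * total + alpha)
    return max(-1.0, min(1.0, score))  # clamp, as VADER does


def label(compound: float) -> str:
    """Map a compound score to a label using VADER's suggested +/-0.05 cut-offs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


for review in ["I love this dress, the fabric is soft and beautiful",
               "Runs tight and feels cheap, very disappointed",
               "Arrived on Tuesday"]:
    c = compound_score(review)
    print(f"{c:+.3f} {label(c):8s} {review}")
```

With NLTK, the real score comes from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, after downloading the `vader_lexicon` resource.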


WordCloud Analysis

  • The Positive WordCloud showed that fit, comfort, color, and quality were the attributes customers praised most often in highly rated reviews, with words such as 'love', 'perfect', 'great', 'soft', and 'beautiful' dominating.

  • In stark contrast, the Negative WordCloud highlighted the main customer pain points, with 'small', 'tight', 'cheap', 'return', and 'disappointed' among the most prominent words.
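A word cloud sizes each word by its frequency, so the Positive and Negative clouds amount to word counts over two subsets of reviews. A minimal sketch, assuming reviews are split by star rating (4 to 5 stars as positive, 1 to 2 as negative, 3 skipped); that cut-off, the short stopword list, and the toy reviews are illustrative assumptions, not taken from the project. The resulting counters are the kind of word-to-frequency mapping that the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts.

```python
import re
from collections import Counter

# Illustrative stopword list; a real pipeline would use spaCy's or NLTK's list.
STOPWORDS = {"the", "a", "and", "is", "it", "this", "i", "so", "to", "in",
             "too", "very", "had"}


def tokens(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def frequencies_by_sentiment(reviews):
    """Split (text, rating) pairs by rating and count words in each group.

    Assumed cut-off: ratings 4-5 are positive, 1-2 negative, 3 is skipped.
    """
    positive, negative = Counter(), Counter()
    for text, rating in reviews:
        if rating >= 4:
            positive.update(tokens(text))
        elif rating <= 2:
            negative.update(tokens(text))
    return positive, negative


# Toy data for illustration only.
reviews = [
    ("Love the fit, so soft and the color is perfect", 5),
    ("Great quality, love it", 4),
    ("Too small and tight, had to return it", 1),
    ("Fabric feels cheap, disappointed", 2),
    ("It is okay", 3),
]
pos, neg = frequencies_by_sentiment(reviews)
print("positive:", pos.most_common(3))
print("negative:", neg.most_common(3))
```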


Predictive Analysis

  1. Text Preprocessing & Feature Engineering: Cleaned and prepared the text data, then used a TF-IDF Vectorizer to convert the corpus of reviews into a matrix of numerical features. TF-IDF weights each word by how often it appears in a review, discounted by how many reviews contain it, so words distinctive to a review count for more than words that appear in nearly every review.

  2. Data Visualization: Generated WordClouds for different sentiment categories to visually identify the most frequent and prominent words associated with positive and negative feedback.

  3. Model Building: Implemented a Logistic Regression model, a robust and interpretable classification algorithm, to predict the sentiment or star rating of a review. The model was trained on the TF-IDF features to learn the patterns in the language that correlate with customer satisfaction.
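The TF-IDF weighting in step 1 can be made concrete with a small example. The sketch below aims to reproduce scikit-learn's `TfidfVectorizer` defaults in plain Python (lowercasing, the default token pattern of two or more word characters, raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2-normalized rows); the three toy reviews are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Same pattern as TfidfVectorizer's default token_pattern (2+ word characters).
    return re.findall(r"(?u)\b\w\w+\b", text.lower())


def tfidf(docs):
    """Return (vocabulary, rows) with smoothed idf and L2-normalized rows."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many reviews does each term appear?
    df = {t: sum(t in toks for toks in tokenized) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in row])
    return vocab, rows


docs = ["Love this dress", "This dress runs small", "Dress fabric feels cheap"]
vocab, rows = tfidf(docs)
# Weights for the second review: "dress" occurs in every review, so it scores lowest.
for term, weight in zip(vocab, rows[1]):
    if weight:
        print(f"{term:8s} {weight:.3f}")
```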


🛠️ Tech Stack


Core Technologies

  • Language: Python
  • Environment: Jupyter Notebook

Data Handling & Analysis

  • Pandas: Used for loading, cleaning, and structuring the e-commerce review dataset.

Natural Language Processing (NLP)

  • spaCy: Leveraged for efficient text preprocessing, including tokenization and lemmatization.
  • NLTK (VADER): Used for its lexicon- and rule-based sentiment scoring, which performs well on short, informal review text.
  • TextBlob: Used for comparative sentiment scoring to validate results.
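Using TextBlob to validate results can be made measurable by mapping both scorers' outputs onto the same labels and counting how often they agree. A minimal sketch: the score lists below are made-up placeholders standing in for TextBlob's `polarity` and VADER's `compound` values (both range over [-1, 1]), and the ±0.05 neutral band is borrowed from VADER's guidance and applied to TextBlob as an assumption.

```python
def to_label(score, band=0.05):
    """Map a score in [-1, 1] to a three-way label with a neutral band."""
    if score >= band:
        return "positive"
    if score <= -band:
        return "negative"
    return "neutral"


def agreement(scores_a, scores_b, band=0.05):
    """Return the fraction of reviews on which two scorers assign the same
    label, plus the indices where they disagree (worth reading by hand)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be the same length")
    if not scores_a:
        raise ValueError("need at least one score")
    labels_a = [to_label(s, band) for s in scores_a]
    labels_b = [to_label(s, band) for s in scores_b]
    disagreements = [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]
    rate = 1 - len(disagreements) / len(labels_a)
    return rate, disagreements


# Placeholder scores for four reviews (not real TextBlob/VADER output).
textblob_polarity = [0.62, -0.30, 0.02, 0.40]
vader_compound = [0.91, -0.55, 0.34, -0.10]
rate, diff = agreement(textblob_polarity, vader_compound)
print(f"agreement: {rate:.0%}, disagreements at {diff}")
```

Reviews where the two scorers disagree are often the ambiguous or sarcastic ones, so the returned indices are a practical starting point for manual inspection.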

Machine Learning & Predictive Analysis

  • Scikit-learn: Employed for:
    • TfidfVectorizer: To convert text data into numerical features.
    • LogisticRegression: To build and train the predictive model.
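In scikit-learn, the two pieces listed above are typically chained, e.g. `make_pipeline(TfidfVectorizer(), LogisticRegression())` followed by `.fit(texts, labels)`. To show what the classifier actually learns, the sketch below fits a binary logistic regression by plain batch gradient descent. It is a simplified stand-in, not scikit-learn's implementation: it uses raw bag-of-words counts instead of TF-IDF, applies no regularization (scikit-learn's `LogisticRegression` uses an L2 penalty by default), uses a learning rate and epoch count chosen only for the toy data, and assumes a binary positive/negative target, since the README leaves open whether the project predicted sentiment or star rating.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    x = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # unseen words are ignored
            x[vocab[tok]] += 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(texts, labels, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on log-loss."""
    vocab = {t: i for i, t in enumerate(sorted({t for s in texts for t in tokenize(s)}))}
    X = [featurize(s, vocab) for s in texts]
    w, b = [0.0] * len(vocab), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(vocab), 0.0
        for x, y in zip(X, labels):
            # Gradient of log-loss w.r.t. the logit is (prediction - target).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return vocab, w, b


def predict_proba(text, vocab, w, b):
    """Probability that a review is positive."""
    x = featurize(text, vocab)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy training data: 1 = positive review, 0 = negative review.
texts = ["love the fit, perfect and soft", "great quality, love it",
         "too small and tight", "cheap fabric, disappointed, will return"]
labels = [1, 1, 0, 0]
vocab, w, b = train_logreg(texts, labels)
print(round(predict_proba("love how soft it is", vocab, w, b), 3))
print(round(predict_proba("tight and cheap", vocab, w, b), 3))
```

Because each vocabulary word gets a single weight, sorting the words by their entry in `w` shows which terms push a review toward positive or negative, which is what makes logistic regression interpretable.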

Data Visualization

  • WordCloud: To create insightful visualizations of the most frequent positive and negative words.
  • Matplotlib & Seaborn: For generating plots and charts during exploratory data analysis (EDA).

That's all folks
