Ecommerce Product Categorization

This repository contains the solution to the Ecommerce Product Categorization challenge. The project focuses on developing an efficient text classification model to categorize eCommerce products based on their descriptions.

Resume Highlights (XYZ Storytelling)

Improved eCommerce product categorization accuracy to 98.44% (F1 Score: 98.43%) by developing and fine-tuning deep learning models (Sequential Model, Random Classifier) on real-world product data.
Enhanced class balance and model robustness by implementing stratified sampling and SMOTE oversampling, resulting in high precision and recall across all product categories.
Automated text preprocessing pipeline using NLTK, TF-IDF, and Word2Vec, enabling scalable and efficient feature extraction from raw product descriptions.
Identified key business insights (e.g., top-selling categories, price correlations) through exploratory data analysis, supporting data-driven decision-making for eCommerce strategy.
Reduced search friction and improved customer experience by deploying a high-accuracy categorization model, increasing product discoverability and reducing abandoned sessions.

Problem Statement

In the fast-paced eCommerce industry, accurate product categorization is critical for improving customer experience, reducing search friction, and enhancing product discoverability. The challenge lies in handling ambiguities, unconventional naming conventions, and multi-language data. This project addresses these challenges with an innovative and scalable solution.

Approach

Data Handling:
- Loaded and explored the dataset for valuable insights.
- Addressed class imbalance using stratified sampling and SMOTE oversampling.
Text Preprocessing:
- Used NLTK for natural text preprocessing.
- Converted textual descriptions to numerical features using techniques like TF-IDF and Word2Vec.
Model Development:
- Performed train-test split using stratified sampling.
- Trained and evaluated various deep learning models:
  - Sequential models
  - Random classifiers
- Fine-tuned the models using hyperparameter optimization to improve performance.
Evaluation:
- Compared models based on their train and test accuracy.
- Selected Sequential Model as the best-performing model with a test accuracy of 98%.

Key Insights

Price Correlation: Higher retail prices correspond to higher discount prices.
Top Sellers: Clothing and Jewelry are the highest-selling categories.
Low Sellers: Bags, Wallets & Belts, and Baby Care show the lowest sales.
Jewelry Pricing: Jewelry has the highest average retail and discounted price.

Results

Model	Train Accuracy	Test Accuracy
Sequential Model	97.65%	98.08%
Random Classifier	99.99%	96.04%

Combined Model Performance

The combined model (Sequential Model and Random Classifier) achieved:
- Test Accuracy: 98.44%
- F1 Score: 98.43%
The classification report highlights:
- High precision and recall across most classes.
- Weighted average F1-score of 98%, demonstrating balanced performance.
The ensemble model effectively handles class imbalances while maintaining high predictive accuracy.

Benefits of Accurate Categorization

Improved Customer Experience: Enhanced browsing and navigation for seamless product discovery.
Increased Product Discoverability: Broader exposure to relevant products, boosting sales and satisfaction.
Reduced Search Friction: Faster and frustration-free searches, reducing abandoned sessions.

Conclusion and Future Work

The accurate categorization model developed in this project is a crucial tool for improving eCommerce platforms. Future enhancements may include:

Exploring advanced deep learning models.
Integrating visual features, such as product images, for better accuracy.

You can now review and save the README file in your repository.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
Dataset/Dataset		Dataset/Dataset
data		data
.gitignore		.gitignore
E-commerce-Product-Categorization-Using-ML-and-Deep-Learning.pdf		E-commerce-Product-Categorization-Using-ML-and-Deep-Learning.pdf
E-commerce-Product-Categorization-Using-ML-and-Deep-Learning.pptx		E-commerce-Product-Categorization-Using-ML-and-Deep-Learning.pptx
LICENSE		LICENSE
README.md		README.md
newdataset2.ipynb		newdataset2.ipynb
video_summary_link		video_summary_link

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ecommerce Product Categorization

Resume Highlights (XYZ Storytelling)

Problem Statement

Approach

Key Insights

Results

Combined Model Performance

Benefits of Accurate Categorization

Conclusion and Future Work

About

Uh oh!

Releases

Packages

Languages

License

realranjan/Flipkart_product_categorization

Folders and files

Latest commit

History

Repository files navigation

Ecommerce Product Categorization

Resume Highlights (XYZ Storytelling)

Problem Statement

Approach

Key Insights

Results

Combined Model Performance

Benefits of Accurate Categorization

Conclusion and Future Work

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages