MaedehArab/Sentiment-analysis_comments-of-Digikala-users


📱 Samsung Mobile Reviews – Sentiment Analysis Using Digikala API | Persian Text Processing | Machine Learning

โญ About the Project

This repository contains a complete end-to-end sentiment analysis pipeline on Persian user reviews of Samsung mobile phones, collected directly from the Digikala API.

The project includes:

  • Automated data collection from Digikala API (products + comments)
  • Full data cleaning & preprocessing for Persian text
  • TF-IDF vectorization optimized for short Persian reviews
  • Handling imbalanced classes using targeted oversampling
  • Training multiple ML models (Naive Bayes, Logistic Regression, Linear SVM)
  • Evaluation using Accuracy, Precision, Recall, F1-score, and Confusion Matrix
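The oversampling step in the list above could be sketched as follows. The README does not say how the "targeted oversampling" is done, so this example assumes plain random oversampling with replacement, bringing every minority class up to the size of the largest class. The function name `oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample(texts, labels, seed=42):
    """Randomly duplicate minority-class samples until all classes match
    the size of the largest class (an assumed strategy, not the repo's)."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, imbalanced data for illustration only.
texts = ["p1", "p2", "p3", "p4", "n1", "n2", "neg1"]
labels = ["positive"] * 4 + ["neutral"] * 2 + ["negative"]

balanced_texts, balanced_labels = oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated reviews do not leak into the test set.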

📊 Dataset Overview

The dataset contains user reviews for Samsung mobile phones:

| Column | Description |
|---|---|
| body | Original user comment |
| sentiment | Label assigned based on Digikala rating |
| comment | Preprocessed Persian text |
| label | Encoded label |

(+ extracted product features from API in raw files)

🧠 Sentiment Classes

Reviews were categorized into:

  • Positive
  • Neutral
  • Negative

Each class is assigned based on the numeric rating provided by users on Digikala.
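As an illustration of this labeling step, the sketch below maps a numeric rating to one of the three classes and then to an integer label. The README does not state the rating scale or the cutoffs, so the 1 to 5 scale, the thresholds (`neg_max`, `pos_min`), and the integer encoding in `LABEL_IDS` are all assumptions.

```python
def rating_to_sentiment(rating, neg_max=2.0, pos_min=4.0):
    """Map a numeric rating to a sentiment class.

    The cutoffs are illustrative assumptions, not the repository's values.
    """
    if rating <= neg_max:
        return "negative"
    if rating >= pos_min:
        return "positive"
    return "neutral"


# Assumed integer encoding for the `label` column.
LABEL_IDS = {"negative": 0, "neutral": 1, "positive": 2}

ratings = [1, 3, 5, 4, 2]
sentiments = [rating_to_sentiment(r) for r in ratings]
labels = [LABEL_IDS[s] for s in sentiments]
print(sentiments)
print(labels)
```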

🛠 Machine Learning Models Used

We trained and compared:

  • Naive Bayes (Best Performance)
  • Logistic Regression
  • Linear SVM

The best model was selected based on weighted F1-score and overall metrics.
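A comparison of this kind could be set up with scikit-learn roughly as shown below. This is a sketch under assumptions: the README only says "Naive Bayes", so `MultinomialNB` is a guess at the variant, the hyperparameters are library defaults, and the toy sentences are hand-written for illustration rather than taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, hand-written examples (not from the Digikala dataset).
texts = [
    "گوشی عالی و خوب", "کیفیت عالی", "خیلی خوب بود",
    "معمولی بود", "نه خوب نه بد", "کیفیت معمولی",
    "خیلی بد بود", "کیفیت ضعیف", "باتری ضعیف و بد",
]
labels = ["positive"] * 3 + ["neutral"] * 3 + ["negative"] * 3

models = {
    "Naive Bayes": MultinomialNB(),  # assumed variant
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
}

scores = {}
for name, clf in models.items():
    # TF-IDF over unigrams and bigrams, followed by the classifier.
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    pipeline.fit(texts, labels)
    # Scored on the training data only because this is a toy example;
    # a real comparison needs a held-out test split.
    predictions = pipeline.predict(texts)
    scores[name] = f1_score(labels, predictions, average="weighted")

best = max(scores, key=scores.get)
print(scores)
print("best:", best)
```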

📈 Evaluation Metrics

All models were evaluated using:

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • Confusion Matrix
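To make the relationship between these metrics concrete, here is a small standard-library sketch that builds a confusion matrix and derives accuracy and weighted F1 from it. The helper names and the toy predictions are hypothetical; the repository most likely relies on a library for this.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_scores(matrix, i):
    """Precision, recall, and F1 for class index i."""
    tp = matrix[i][i]
    fp = sum(row[i] for row in matrix) - tp
    fn = sum(matrix[i]) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def weighted_f1(matrix):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = sum(sum(row) for row in matrix)
    return sum(
        per_class_scores(matrix, i)[2] * sum(matrix[i])
        for i in range(len(matrix))
    ) / total


classes = ["negative", "neutral", "positive"]
y_true = ["positive", "positive", "positive", "positive", "neutral", "negative"]
y_pred = ["positive", "positive", "positive", "neutral", "positive", "negative"]

cm = confusion_matrix(y_true, y_pred, classes)
print(cm)
print(round(accuracy(cm), 3), round(weighted_f1(cm), 3))
```

Because weighted F1 accounts for per-class precision and recall, it can rank models differently from accuracy when the classes are imbalanced.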

Final results (example):

| Model | Accuracy | F1-score |
|---|---|---|
| Naive Bayes | 0.78 | 0.71 |
| Logistic Regression | 0.69 | 0.70 |
| Linear SVM | 0.73 | 0.71 |

🚀 How to Run the Project

  1. Install dependencies: pip install -r requirements.txt
  2. Run data collection (optional): python fetch_data.py
  3. Open the notebook: jupyter notebook sentiment_analysis.ipynb

⚙ Preprocessing Techniques

  • Persian text normalization
  • Removing non-Persian characters
  • Tokenization
  • Stopword removal
  • No stemming (better performance for short reviews)
  • TF-IDF with n-grams (1,2)

About

Supervised ML task: sentiment analysis of comments from Digikala users (Samsung mobiles)
