Skip to content

RohanDebnath24/Retail_Classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Retail_Classification

This project focuses on classifying retail product categories based on textual descriptions. The workflow involves text preprocessing using spaCy, converting text into vectors using scikit-learn, and employing scikit-learn's classification algorithms for training and evaluation.

Table of Contents:

1.Introduction 2.Installation 3.Usage 4.Workflow 5.File Structure 6.Results 7.Contributing 8.License

Introduction:

This project aims to categorize retail products into specific categories based on their textual descriptions. It employs natural language processing (NLP) techniques to preprocess text data and scikit-learn's machine learning algorithms for classification.

Installation:

To run this project, ensure you have Python 3 installed along with the following libraries: 1.spaCy 2.scikit-learn 3.pandas 4.numpy You can install the required libraries using pip: pip install spacy scikit-learn pandas numpy Additionally, download the spaCy English model: python -m spacy download en_core_web_sm

Usage:

1.Clone this repository to your local machine: git clone https://github.com/your-username/retail-category-classification.git 2.Navigate to the project directory: cd retail-category-classification 3.Prepare your dataset in CSV format, where one column contains the textual descriptions, and another column contains corresponding category labels. 4.Modify the config.py file to specify the file paths, input columns, etc. 5.Run the main.py script: python main.py

Workflow:

1.Data Preprocessing using spaCy: The textual data is preprocessed using spaCy to perform tokenization, lemmatization, and removal of stop words and punctuation. 2.Vectorization using scikit-learn: Text data is converted into numerical vectors using scikit-learn's TfidfVectorizer. 3.Train-Test Split: The dataset is split into training and testing sets. 4.Classification Algorithms:

  • KNeighbours Classification: Using scikit-learn's KNeighborsClassifier to classify retail product categories.
  • Multinomial Naive Bayes: Employing scikit-learn's MultinomialNB classifier for category classification.

File Structure:

retail-category-classification/ │ ├── data/ │ └── your_dataset.csv │ ├── src/ │ ├── config.py │ ├── data_preprocessing.py │ ├── classification_models.py │ └── main.py │ ├── README.md └── requirements.txt

Results:

The results of the classification models will be displayed in the console or stored in output files, depending on the implementation in main.py. Evaluation metrics such as accuracy, precision, recall, and F1-score will be provided for each model.

Contributing:

Contributions to improve this project are welcome! Please fork the repository, make changes, and submit a pull request.

License:

This project is licensed under the MIT License.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published