Marathi Text Classification

Marathi Text Classification is a Natural Language Processing (NLP) project designed to automatically classify text written in the Marathi language into predefined categories.

The project combines stopword removal, TF-IDF feature extraction, and machine learning models including Naive Bayes, Logistic Regression, SVM, and Random Forest. It also includes a Streamlit user interface that serves real-time predictions from the best-performing model.

This project demonstrates an NLP workflow built with scikit-learn and shows how to build and deploy a full text classification pipeline for the Marathi language.

Features

Marathi stopword removal and text preprocessing
Feature extraction using TF-IDF Vectorizer
Multiple models tested: Naive Bayes, Logistic Regression, SVM, Random Forest
Final model: Naive Bayes (best performing)
Frontend: Streamlit app for live text classification
Models and vectorizer saved using pickle
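
As a rough illustration of the stopword removal and text preprocessing feature listed above, the sketch below cleans a Marathi sentence using only the Python standard library. The stopword set and the `clean_text` helper are hypothetical and chosen for the example; the project itself uses NLTK for this step, and its actual stopword list and cleaning rules are not shown in this README.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only.
# The project's real stopword list is not reproduced here.
MARATHI_STOPWORDS = {"आणि", "आहे", "हा", "ही", "हे", "तो", "ती", "ते", "या", "व"}


def clean_text(text: str) -> str:
    """Keep Devanagari text, drop punctuation and stopwords."""
    # Replace every character outside the Devanagari block (U+0900 to U+097F)
    # that is not whitespace with a space. This removes Latin letters,
    # ASCII digits, and ASCII punctuation.
    text = re.sub(r"[^\u0900-\u097F\s]", " ", text)
    # The danda and double danda (U+0964, U+0965) sit inside the Devanagari
    # block but act as sentence punctuation, so remove them explicitly.
    text = re.sub(r"[\u0964\u0965]", " ", text)
    # Split on whitespace and filter out stopwords.
    tokens = [token for token in text.split() if token not in MARATHI_STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_text("हा सामना आज मुंबईत आहे!"))
```

Whitespace tokenization is used here because it keeps whole Devanagari words intact; a real pipeline may need additional normalization beyond this.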

How It Works

User inputs Marathi text
Text is cleaned and stopwords are removed
TF-IDF vectorization is applied
Trained model predicts the category
Label encoder returns the final category name
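
The steps above can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions, not the project's actual code: the training sentences, the category names (`sports`, `politics`), and the `predict_category` helper are all made up for illustration, and the pickle round trip happens in memory rather than through the project's saved files. The `token_pattern=r"\S+"` setting is also an assumption: the default pattern relies on `\w`, which may split Devanagari words at vowel signs, so whitespace tokenization is used instead.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Made-up training data for illustration; not the project's dataset.
texts = [
    "भारताने क्रिकेट सामना जिंकला",
    "संघाने फुटबॉल सामना गमावला",
    "खेळाडूने शतक केले",
    "सरकारने नवीन अर्थसंकल्प सादर केला",
    "निवडणूक निकाल जाहीर झाले",
    "मंत्र्यांनी नवीन योजना जाहीर केली",
]
labels = ["sports", "sports", "sports", "politics", "politics", "politics"]

# Map category names to integer class ids and back.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# Whitespace tokenization (assumption, see the note above the block).
vectorizer = TfidfVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(texts)

# Naive Bayes is the model the README reports as best performing.
model = MultinomialNB()
model.fit(X, y)

# Serialize and restore the fitted objects with pickle, in memory.
blob = pickle.dumps(
    {"vectorizer": vectorizer, "model": model, "label_encoder": label_encoder}
)
restored = pickle.loads(blob)


def predict_category(text: str) -> str:
    """Vectorize the text, predict a class id, and decode it to a name."""
    features = restored["vectorizer"].transform([text])
    class_id = restored["model"].predict(features)
    return str(restored["label_encoder"].inverse_transform(class_id)[0])


if __name__ == "__main__":
    print(predict_category("क्रिकेट सामना"))
```

In the real application the text would first pass through cleaning and stopword removal, and the Streamlit frontend would call something like `predict_category` on the user's input.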

Dataset

Note: The file marathi_sample.csv contains 20 synthetic sample rows.
It is intended for demo/testing only and is not the actual dataset used during model training.

Tech Stack

Python 3.13
Streamlit
Scikit-Learn
Pandas
NLTK (for stopword removal)
TF-IDF Vectorizer
Pickle (for model saving/loading)

Contributors

This project was originally developed as part of a group academic project.

Original Contributors:

Amil Gauri (Maintainer)
Vikas Pandit
Ajay Chaurasiya

Project restructured, documented, and maintained by Amil Gauri for public release.

License

This project is open-source under the MIT License.
You are free to use, modify, and distribute it, provided the copyright and license notice is retained.

See the LICENSE file for full details.
