9eek9/Burmese-News-Classification

📰 Burmese News Classification with BiLSTM & Streamlit

This project classifies Burmese news articles with two models: a traditional machine-learning baseline (Naïve Bayes) and a deep-learning model (BiLSTM).
It includes a ready-to-run Streamlit web app for real-time predictions on Burmese text.


🧩 Project Overview

  • Classifies Burmese news articles into categories such as Politics, Business, Sports, and Entertainment.
  • Implements custom Burmese NLP preprocessing: tokenization, stopword removal, and sequence padding.
  • Compares performance between Naïve Bayes (TF-IDF) and BiLSTM (Keras).
  • Provides an interactive Streamlit UI for user testing and visualization.
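
The preprocessing steps listed above (tokenization, stopword removal, sequence padding) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `whitespace_tokenize` is a hypothetical stand-in for the Pyidaungsu tokenizer, the helper names are invented for this example, and the padding here is appended at the end of the sequence, which may differ from the settings used in training.

```python
from typing import Callable, Iterable

PAD_ID = 0  # assumed id reserved for padding
OOV_ID = 1  # assumed id for tokens not seen when the vocabulary was built


def whitespace_tokenize(text: str) -> list[str]:
    """Hypothetical stand-in tokenizer; the project uses Pyidaungsu instead."""
    return text.split()


def preprocess(
    text: str,
    stopwords: set[str],
    tokenize: Callable[[str], list[str]] = whitespace_tokenize,
) -> list[str]:
    """Tokenize the text and drop every token found in the stopword set."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


def build_vocab(token_lists: Iterable[list[str]]) -> dict[str, int]:
    """Assign each new token the next free integer id, starting at 2."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 2)
    return vocab


def encode_and_pad(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))
```

In the real pipeline, the stopword set would presumably be read from `stopwords.txt` and the Pyidaungsu tokenizer passed in as `tokenize`.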

📊 Model Highlights

| Model | Type | Accuracy | Description |
| --- | --- | --- | --- |
| Naïve Bayes | ML baseline | ~80% | TF-IDF with sklearn |
| BiLSTM | Deep Learning | ~90% | Sequence model using TensorFlow/Keras |
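
To make the two model families concrete, here is a hedged sketch of how each might be defined. The README does not state the actual hyperparameters or layer sizes, so `embed_dim`, `lstm_units`, and the function names are assumptions, and the toy English tokens merely stand in for pre-tokenized Burmese text joined by spaces.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_nb_baseline():
    """TF-IDF + multinomial Naive Bayes over space-joined, pre-tokenized text."""
    return make_pipeline(
        # Split on whitespace only: the text is assumed to be tokenized already.
        TfidfVectorizer(tokenizer=str.split, token_pattern=None, lowercase=False),
        MultinomialNB(),
    )


def build_bilstm(vocab_size, max_len, num_classes, embed_dim=128, lstm_units=64):
    """Assumed architecture: embedding -> bidirectional LSTM -> softmax."""
    # Imported lazily so the baseline above runs without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(max_len,)),
        keras.layers.Embedding(vocab_size, embed_dim, mask_zero=True),
        keras.layers.Bidirectional(keras.layers.LSTM(lstm_units)),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model


# Toy data for illustration only; not taken from the project's dataset.
train_texts = [
    "election parliament vote minister",
    "parliament minister policy vote",
    "match goal team league",
    "team player goal win",
]
train_labels = ["Politics", "Politics", "Sports", "Sports"]

nb = build_nb_baseline().fit(train_texts, train_labels)
print(nb.predict(["goal team match"])[0])
```

The accuracy figures in the table come from the project's own dataset and will not be reproduced by this toy example.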

📁 Project Structure

├── app.py                        # Streamlit web app
├── Burmese_News_Classification.ipynb  # Training notebook
├── requirements.txt              # Dependencies
├── stopwords.txt                 # Burmese stopword list
├── models/                       # Trained models
│   ├── bilstm_mynews.keras
│   ├── nb_tfidf.joblib
│   ├── keras_tokenizer.pkl
│   ├── label_encoder.pkl
│   └── config.json
└── README_streamlit.md           # Detailed Streamlit setup
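
Since the app depends on the files under `models/`, a quick check before launching can save a confusing startup error. The sketch below is an assumption about how one might verify the layout shown above; `missing_artifacts` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path

# File names copied from the project structure above.
EXPECTED_ARTIFACTS = (
    "bilstm_mynews.keras",
    "nb_tfidf.joblib",
    "keras_tokenizer.pkl",
    "label_encoder.pkl",
    "config.json",
)


def missing_artifacts(models_dir="models"):
    """Return the expected artifact names that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```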

🚀 Running the Streamlit App

pip install -r requirements.txt
streamlit run app.py

Then open the provided local URL (e.g. http://localhost:8501) in your browser.


🧠 Example Predictions

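| Input Text | English Gloss | Predicted Category |
| --- | --- | --- |
| "အဆိုတော်အသစ်တစ်ဦး ပြိုင်ပွဲတွင် ပထမဆုရရှိခဲ့သည်။" | "A new singer won first prize in the competition." | Entertainment |
| "အစိုးရသည် စီးပွားရေးဖွံ့ဖြိုးရေးအတွက် ငွေထုတ်ပေးမည်။" | "The government will release funds for economic development." | Business |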
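
A predicted category like those above is typically obtained by taking the highest-probability class from the model's output. The following is a minimal sketch of that step, assuming the model returns one probability per class in the same order as the label list; `top_category` is a hypothetical helper and the probabilities shown are made up for illustration.

```python
def top_category(probabilities, labels):
    """Return the (label, probability) pair with the highest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Illustrative values only; the label order is an assumption.
labels = ["Business", "Entertainment", "Politics", "Sports"]
probs = [0.10, 0.75, 0.10, 0.05]
print(top_category(probs, labels))
```

In the app, the label list would presumably come from the saved `label_encoder.pkl` rather than being hard-coded.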

🧰 Technologies Used

  • Python 3.10
  • TensorFlow / Keras
  • Scikit-learn
  • Pyidaungsu Tokenizer
  • Streamlit for deployment

🔮 Future Enhancements

  • Integrate Multilingual BERT (mBERT) for improved contextual understanding.
  • Add a Gradio or Flask API endpoint for backend usage.
  • Extend to multi-label classification (e.g., News + Sentiment).
