This project explores sentiment analysis of Saraiki poetry, a low-resource regional language, using classical NLP preprocessing and a lightweight deep learning model.
The focus is on:
- Manual data collection and labeling
- Emotion normalization and sentiment abstraction
- End-to-end NLP pipeline using Python and TensorFlow
- Preparing the model for potential mobile (Android) deployment via TensorFlow Lite
- Language: Saraiki
- Domain: Poetry
- Total samples: 958 (full dataset kept private)
- Public release: Partial sample only
Each poem was manually labeled with an emotion:
- happy, sad, surprise, fear, anger, disgust
Emotions were mapped to binary sentiment:
- Positive (p): happy, surprise
- Negative (n): sad, fear, anger, disgust
Only a subset of the dataset is uploaded to illustrate:
- Data format
- Labeling approach
- Reproducibility
- The complete dataset is retained for possible future academic use.
Before training, the dataset was cleaned and analyzed to ensure consistency:
- Normalized inconsistent emotion labels (case differences, spelling variants)
- Verified emotion and sentiment distributions
- Analyzed emotion-to-sentiment relationships
The following visualizations summarize the dataset structure:
- Emotion distribution:
- Sentiment (positive / negative) distribution:
- Emotion → sentiment mapping:
- Data loaded from Excel files using Pandas
- Text tokenized using Keras Tokenizer
- Sequences padded/truncated to fixed length
- Out-of-vocabulary handling enabled
A lightweight neural network was used due to dataset size:
- Embedding layer
- Global Average Pooling
- Dense (ReLU)
- Output layer (Sigmoid)
The model performs binary sentiment classification, not multi-class emotion prediction.
- Train / test split: 80% / 20%
- Loss function: Binary Crossentropy
- Optimizer: Adam
- Epochs: 30
The model learns general sentiment patterns from poetic text rather than fine-grained emotional nuance.
After training, the model is converted to TensorFlow Lite (TFLite). This enables:
- On-device inference
- Low latency predictions
- Future integration with Android applications
- Small dataset size
- Possible class imbalance
- Binary sentiment only
- Simple model architecture
Results should be interpreted as exploratory and baseline-level, not production-ready.
- Python
- Pandas
- Matplotlib
- NLTK
- TensorFlow / Keras
- TensorFlow Lite
This project emphasizes data preparation, language challenges, and applied ML workflow in a low-resource setting rather than optimized model performance.


