This repository contains a Jupyter notebook that compares the performance of different language models for sentiment analysis on the IMDb movie reviews dataset.
The notebook evaluates the following models:
- DistilBERT (base model)
- Fine-tuned DistilBERT
- GPT-2 (base model)
- Logistic Regression with TF-IDF features (traditional ML approach)
The models are compared based on their accuracy, precision, recall, F1-score, and confusion matrices.
The project uses the IMDb movie reviews dataset, which consists of 50,000 movie reviews labeled as either positive or negative sentiment.
- Text cleaning: Removing HTML tags and special characters
- Lowercasing text
- Splitting into train/test sets (80/20 split)
- Tokenization for transformer models
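The cleaning and splitting steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the `clean_text` helper and the two-review sample are hypothetical, and transformer tokenization is only noted in a comment since it requires downloading a pretrained tokenizer.

```python
import re
from sklearn.model_selection import train_test_split

def clean_text(text):
    """Strip HTML tags and special characters, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)           # remove HTML tags like <br />
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)    # remove special characters
    return re.sub(r"\s+", " ", text).strip().lower()

# Hypothetical stand-ins for the 50,000 IMDb reviews
reviews = ["This movie was <br />GREAT!!!", "Worst film ever... <i>avoid</i>"]
labels = [1, 0]

cleaned = [clean_text(r) for r in reviews]

# 80/20 train/test split, as used in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.2, random_state=42
)

# For the transformer models, the cleaned text would then be tokenized, e.g.:
# tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
# encodings = tokenizer(X_train, truncation=True, padding=True)
```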
- DistilBERT Base Model: Using the pre-trained model without fine-tuning
- Fine-tuned DistilBERT: Fine-tuning the pre-trained model on the IMDb dataset
- GPT-2 Base Model: Using the pre-trained model for sentiment classification
- Logistic Regression: Traditional ML approach using TF-IDF vectorization
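The traditional ML baseline in the last bullet can be reproduced in a few lines with scikit-learn. This is a minimal sketch on a tiny made-up corpus, not the notebook's actual training run; the four example sentences and the bigram setting are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus standing in for the IMDb reviews (1 = positive)
texts = [
    "a wonderful heartfelt film",
    "terrible plot and awful acting",
    "great performances throughout",
    "boring and bad, total waste",
]
labels = [1, 0, 1, 0]

# TF-IDF unigrams + bigrams feeding a logistic regression classifier
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["terrible plot and awful acting"]))
```

On the real dataset, the same pipeline is fit on the 40,000 training reviews and scored on the held-out 10,000.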
- Accuracy
- Precision, Recall, F1-score
- Confusion matrices
- Training time comparison
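The metrics above all come directly from scikit-learn. A minimal sketch with made-up predictions (the `y_true`/`y_pred` arrays are illustrative, not model output):

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical ground-truth labels and model predictions (1 = positive)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
print(cm)
```

Training time is simply measured by timing each model's fit/fine-tuning step.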
The analysis shows that:
- Fine-tuned DistilBERT performs best with approximately 90% accuracy
- Logistic Regression with TF-IDF features performs surprisingly well (about 89% accuracy)
- Base models (DistilBERT and GPT-2) without fine-tuning perform poorly (around 50% accuracy)
- Traditional machine learning approaches can still be competitive for sentiment analysis tasks
The notebook requires the following libraries:
- PyTorch
- Transformers (Hugging Face)
- Scikit-learn
- Pandas
- NumPy
- Matplotlib
- Seaborn
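All of the above can be installed with pip (package names shown are the standard PyPI names; pin versions as needed):

```shell
pip install torch transformers scikit-learn pandas numpy matplotlib seaborn
```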
- Open `LLM_Comparison_Project.ipynb` in Google Colab or Jupyter Notebook
- Execute the cells sequentially to reproduce the analysis
The notebook includes several visualizations:
- Confusion matrices for all models
- Performance comparison charts
- Accuracy and F1-score comparisons