MovieSentimentLab is a project focused on sentiment analysis of movie reviews. It builds deep learning models based on labeled movie review datasets to automatically identify emotional tendencies (positive/negative) in text. The project includes preprocessed annotated data, a model training framework, and prediction interfaces, making it suitable for NLP learners and sentiment analysis research.
Movie reviews carry rich emotional signal from users. Analyzing that sentiment automatically helps audiences gauge a film's reception quickly and gives the film industry market feedback. This project classifies the sentiment of movie review text automatically using machine learning, with the goal of providing an efficient and accurate sentiment analysis tool.
The core dataset includes manually annotated movie review texts, categorized by sentiment tendency and usage:
- Training Set (train)
  - pos/: Positive sentiment reviews (e.g., praise, recommendations for films)
  - neg/: Negative sentiment reviews (e.g., criticism, complaints about films)
  - unsup/: Unlabeled reviews (usable for unsupervised learning or extended training)
- Test Set (test)
  - pos/: Positive sentiment reviews for model evaluation
  - neg/: Negative sentiment reviews for model evaluation
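The directory layout above maps naturally onto labeled samples. The following is a minimal sketch of a loader, not the project's own code: the function name `load_split`, the `root` argument, and the 1/0 label encoding are assumptions made for illustration. It skips `unsup/` because those reviews carry no label.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed label encoding (hypothetical, not taken from the project).
LABELS = {"pos": 1, "neg": 0}


def load_split(root: str, split: str) -> List[Tuple[str, int]]:
    """Read every .txt review under <root>/<split>/pos and <root>/<split>/neg."""
    samples = []
    for folder, label in LABELS.items():
        directory = Path(root) / split / folder
        # Sort the paths so the sample order is reproducible across runs.
        for path in sorted(directory.glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), label))
    return samples
```

Calling `load_split(root, "train")` and `load_split(root, "test")` would then give the training and evaluation samples, where `root` is wherever the dataset has been unpacked.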
Example reviews:
- Positive review: "these things have been floating around in my head for damn near 10 years now. Some pieces of this work were really memorable..." (from train/pos/7912_8.txt)
- Negative review: "What a script, what a story, what a mess!" (from test/neg/7035_2.txt)
- Data Preprocessing: Supports basic NLP preprocessing operations such as text cleaning, tokenization, and vocabulary building
- Model Training: Provides a PyTorch-based training framework for sentiment classification models (default: BiRNN)
- Sentiment Prediction: Returns the sentiment (positive/negative) of input text through an HTTP API
- Model Evaluation: Automatically calculates evaluation metrics such as accuracy and recall using the test set
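To make the evaluation metrics concrete, here is a small pure-Python sketch of accuracy and recall for binary labels. It is illustrative only and not the project's evaluation code. The helper names and the choice of 1 as the positive label are assumptions. With scikit-learn installed, `sklearn.metrics.accuracy_score` and `recall_score` compute the same quantities.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive samples that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    false_neg = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Toy labels: 1 = positive review, 0 = negative review (assumed encoding).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
print(accuracy(y_true, y_pred))  # 3 of 5 predictions are correct
print(recall(y_true, y_pred))    # 2 of 3 positive reviews are recovered
```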
- Python 3.7+
- Dependencies: torch, flask, nltk, scikit-learn (see requirements.txt for details)
To retrain the model, run the training script (ensure the dataset is prepared in advance):
python train.py --epochs 10 --batch-size 32 --model-name birnn
- Optional parameters: --epochs (number of training epochs), --batch-size (batch size), --model-name (model type)
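For readers curious how such flags are typically wired up, the sketch below parses the three documented options with `argparse`. It is not the contents of train.py. The default values for `--epochs` and `--batch-size` simply mirror the example command and are assumptions. Only the BiRNN default model is stated in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented training flags."""
    parser = argparse.ArgumentParser(description="Illustrative training CLI sketch.")
    # Assumed default: mirrors the example command, not confirmed by the project.
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    # Assumed default: mirrors the example command, not confirmed by the project.
    parser.add_argument("--batch-size", type=int, default=32,
                        help="number of samples per batch")
    # BiRNN is documented as the default model type.
    parser.add_argument("--model-name", default="birnn",
                        help="model type to train")
    return parser


# argparse exposes --batch-size as args.batch_size and --model-name as args.model_name.
args = build_parser().parse_args(["--epochs", "10", "--batch-size", "32", "--model-name", "birnn"])
print(args.epochs, args.batch_size, args.model_name)
```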
Launch the Flask API service to receive text and return sentiment prediction results:
python api.py
The service runs by default at http://localhost:5000
Endpoint: http://localhost:5000/sentiment
Parameter: sentence, the movie review text to be analyzed (required)
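The endpoint can also be called from Python. The following is a minimal client sketch using only the standard library. The helper names `build_url` and `get_sentiment` are hypothetical. It also assumes that the service answers GET requests with a JSON body, and that an error payload may arrive with a non-2xx HTTP status.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000/sentiment"


def build_url(sentence: str, base_url: str = BASE_URL) -> str:
    """Percent-encode the review text into the `sentence` query parameter."""
    return f"{base_url}?{urlencode({'sentence': sentence})}"


def get_sentiment(sentence: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the decoded JSON body."""
    try:
        with urlopen(build_url(sentence, base_url), timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as err:
        # Assumption: error responses still carry a JSON body.
        return json.loads(err.read().decode("utf-8"))
```

With the service running, `get_sentiment("This movie is amazing!")` would return the parsed JSON response. Encoding the text through `urlencode` avoids malformed URLs when a review contains spaces or punctuation.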
- Positive sentiment response:
{
"data": "positive",
"status_code": 200
}
- Negative sentiment response:
{
"data": "negative",
"status_code": 200
}
- Error response (missing parameters):
{
"error": "No text provided",
"status_code": 400
}
Example request:
curl -G "http://localhost:5000/sentiment" --data-urlencode "sentence=This movie is amazing! I love it so much."
- Dataset files are encoded in UTF-8 and contain HTML tags (e.g., <br />), which are handled by the preprocessing module
- The vocabulary will be built automatically on first run, which may take some time
- To improve model performance, try increasing training data volume or adjusting model parameters
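The HTML handling and vocabulary building mentioned in the notes above could look roughly like the sketch below. It is not the project's preprocessing module. The regex tokenizer stands in for whatever tokenizer the project actually uses (nltk is listed as a dependency), and the `<pad>`/`<unk>` entries and the `min_freq` parameter are assumptions.

```python
import html
import re
from collections import Counter
from typing import Dict, Iterable, List

TAG_RE = re.compile(r"<[^>]+>")        # matches HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z0-9']+")   # simple stand-in tokenizer


def clean(text: str) -> str:
    """Replace HTML tags with spaces, decode entities, and lowercase."""
    return html.unescape(TAG_RE.sub(" ", text)).lower()


def tokenize(text: str) -> List[str]:
    """Split cleaned text into lowercase word tokens."""
    return TOKEN_RE.findall(clean(text))


def build_vocab(texts: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(token for text in texts for token in tokenize(text))
    vocab = {"<pad>": 0, "<unk>": 1}  # assumed special tokens
    # Most frequent tokens get the lowest ids; ties are broken alphabetically.
    for token, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[token] = len(vocab)
    return vocab


tokens = tokenize("What a script,<br />what a mess!")
print(tokens)
```

Replacing tags with a space rather than deleting them keeps the words on either side of `<br />` from being fused into a single token.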