Dianayoung77/MovieSentimentLab

MovieSentimentLab

MovieSentimentLab is a sentiment analysis project for movie reviews. It trains deep learning models on labeled movie review datasets to classify review text as positive or negative. The project includes preprocessed, annotated data, a model training framework, and a prediction API, and is aimed at NLP learners and sentiment analysis research.

Project Background

Movie reviews carry rich information about how viewers feel. Analyzing that sentiment automatically helps audiences gauge a film's reputation quickly and gives the film industry market feedback. This project classifies the sentiment of movie review text with machine learning, with the goal of providing an efficient and accurate sentiment analysis tool.

Dataset Introduction

The core dataset consists of manually annotated movie review texts, organized by split (training or test) and by sentiment label:

  • Training Set (train)
    • pos/: Positive sentiment reviews (e.g., praise, recommendations for films)
    • neg/: Negative sentiment reviews (e.g., criticism, complaints about films)
    • unsup/: Unlabeled reviews (usable for unsupervised learning or extended training)
  • Test Set (test)
    • pos/: Positive sentiment reviews for model evaluation
    • neg/: Negative sentiment reviews for model evaluation
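
As an illustration of the layout above, here is a minimal sketch of reading one split into (text, label) pairs. The function name `load_split`, the `LABELS` mapping (neg = 0, pos = 1), and the assumption that each review is a separate `.txt` file are not taken from the project code; they are only one plausible way to consume these folders.

```python
from pathlib import Path

# Assumed mapping from folder name to class label (not confirmed by the project).
LABELS = {"neg": 0, "pos": 1}


def load_split(root, split):
    """Return (text, label) pairs for one split, e.g. 'train' or 'test'."""
    samples = []
    for folder, label in LABELS.items():
        # Sorted for a reproducible order; unsup/ is skipped because it has no labels.
        for path in sorted(Path(root, split, folder).glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), label))
    return samples
```

Calling `load_split("data", "train")` would return every labeled training review, where `"data"` stands in for wherever the dataset actually lives.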

Data Examples

  • Positive review:
    "these things have been floating around in my head for damn near 10 years now. Some pieces of this work were really memorable..." (from train/pos/7912_8.txt)

  • Negative review:
    "What a script, what a story, what a mess!" (from test/neg/7035_2.txt)

Core Features

  1. Data Preprocessing: Supports basic NLP preprocessing operations such as text cleaning, tokenization, and vocabulary building
  2. Model Training: Provides a PyTorch-based training framework for sentiment classification models (default: BiRNN)
  3. Sentiment Prediction: Returns the sentiment (positive/negative) of input text through an HTTP API
  4. Model Evaluation: Automatically calculates evaluation metrics such as accuracy and recall using the test set
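
To make the evaluation metrics in item 4 concrete, the sketch below computes accuracy and recall from lists of true and predicted labels in plain Python. It is not the project's evaluation code (the dependency list suggests scikit-learn may be used for that), and the choice of 1 as the positive label is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive.

    `positive=1` is an assumed label encoding, not the project's.
    """
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0
```

Accuracy alone can hide a model that rarely predicts one class, which is why recall is usually reported alongside it.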

Quick Start

Environment Requirements

  • Python 3.7+
  • Dependencies: torch, flask, nltk, scikit-learn (see requirements.txt for details)

Model Training (Optional)

To retrain the model, run the training script (ensure the dataset is prepared in advance):

python train.py --epochs 10 --batch-size 32 --model-name birnn
  • Optional parameters: --epochs (number of training epochs), --batch-size (samples per batch), --model-name (model type, e.g. birnn)
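
For readers curious how such flags are typically wired up, here is a minimal `argparse` sketch that accepts the three options shown above. It is not the contents of `train.py`; the function name `build_parser` and the default values (copied from the example command) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the three documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Train a sentiment classifier (sketch).")
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=32, help="samples per batch")
    parser.add_argument("--model-name", default="birnn", help="model type")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --batch-size as args.batch_size and --model-name as args.model_name.
    print(args.epochs, args.batch_size, args.model_name)
```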

Start Prediction Service

Launch the Flask API service to receive text and return sentiment prediction results:

python api.py

By default, the service runs at http://localhost:5000

API Usage Instructions

Sentiment Prediction Interface

Request URL

http://localhost:5000/sentiment

Request Parameters

  • sentence: The movie review text to analyze (required), passed as a URL query parameter

Response Examples

  • Positive sentiment response:
{
  "data": "positive",
  "status_code": 200
}
  • Negative sentiment response:
{
  "data": "negative",
  "status_code": 200
}
  • Error response (missing sentence parameter):
{
  "error": "No text provided",
  "status_code": 400
}
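
The three response shapes above can be summarized in a small helper. This is a sketch of the response logic only, not the code in `api.py`: `build_response` and the `predict` callable are hypothetical names, and treating an empty string the same as a missing parameter is an assumption.

```python
def build_response(sentence, predict):
    """Return a dict shaped like the documented responses.

    `predict` is a hypothetical callable that maps text to
    "positive" or "negative"; the real model interface may differ.
    """
    if not sentence:
        # Missing (None) or empty text: mirror the documented error payload.
        return {"error": "No text provided", "status_code": 400}
    return {"data": predict(sentence), "status_code": 200}
```

Keeping this logic separate from the web framework makes it easy to test without starting the service.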

Call Example (curl)

curl -G "http://localhost:5000/sentiment" --data-urlencode "sentence=This movie is amazing! I love it so much."
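
The same call can be made from Python with only the standard library. The sketch below assumes the endpoint accepts GET requests with a `sentence` query parameter, as the curl example suggests; `build_url` and `get_sentiment` are illustrative names, not part of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000/sentiment"


def build_url(sentence, base_url=BASE_URL):
    """Return the request URL with the sentence safely percent-encoded."""
    return f"{base_url}?{urlencode({'sentence': sentence})}"


def get_sentiment(sentence, base_url=BASE_URL):
    """Call the running service and return the decoded JSON response."""
    with urlopen(build_url(sentence, base_url)) as resp:
        return json.load(resp)
```

Encoding the query string matters because review text usually contains spaces and punctuation that are not valid in a raw URL. If the service answers errors with a non-2xx HTTP status, `urlopen` raises `urllib.error.HTTPError` instead of returning the JSON body.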

Notes

  1. Dataset files are encoded in UTF-8 and contain HTML tags (e.g., <br />), which are handled by the preprocessing module
  2. The vocabulary will be built automatically on first run, which may take some time
  3. To improve model performance, try increasing training data volume or adjusting model parameters
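
Notes 1 and 2 describe HTML stripping and vocabulary building. A minimal sketch of both steps follows. It uses regular expressions in place of the nltk tokenizer the project depends on, and the names `clean`, `tokenize`, `build_vocab`, the `<pad>`/`<unk>` entries, and the `min_freq` parameter are all assumptions rather than the project's actual preprocessing module.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")        # matches HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z0-9']+")   # simple word pattern, not nltk's tokenizer


def clean(text):
    """Replace HTML tags with spaces and lowercase the text."""
    return TAG_RE.sub(" ", text).lower()


def tokenize(text):
    """Split cleaned text into word tokens."""
    return TOKEN_RE.findall(clean(text))


def build_vocab(texts, min_freq=1):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": 0, "<unk>": 1}  # assumed reserved entries
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab
```

Building the vocabulary requires a full pass over the training texts, which is why the first run can be slow; caching the result to disk is a common way to avoid repeating it.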
