This Python application performs various analyses on IMDb movie data and includes a machine learning model for rating prediction.
- Average rating analysis by movie genre
- Movie rating analysis by year
- Correlation analysis between variables
- Simple linear regression analysis
- Revenue analysis and patterns
- Runtime vs rating relationship analysis
- Machine learning model for rating prediction
  - Random Forest Regressor
  - Feature importance analysis
  - Model performance evaluation
- Install the required Python packages: `pip install -r requirements.txt`
- Download the IMDb dataset from Kaggle:
  - Download from IMDB Movie Dataset
  - Copy the downloaded `IMDB-Movie-Data.csv` file to the project directory
To run the analysis:
`python movie_analysis.py`

The program will generate the following visualizations:
- `genre_ratings.png`: Average ratings by movie genre
- `year_ratings.png`: Average ratings by year
- `correlation_matrix.png`: Correlation matrix between variables
- `regression_analysis.png`: Relationship between Metascore and IMDb Rating
- `rating_revenue.png`: Relationship between movie ratings and revenue
- `runtime_rating.png`: Relationship between movie runtime and rating
- `feature_importance.png`: Most important features for rating prediction
- `prediction_performance.png`: Actual vs predicted ratings comparison
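
The per-genre averages behind `genre_ratings.png` can be sketched roughly as follows. This is a minimal illustration, not the code in `movie_analysis.py`: it assumes the CSV has a comma-separated `Genre` column and a numeric `Rating` column, and the helper name `average_rating_by_genre` and the toy rows are hypothetical.

```python
import pandas as pd


def average_rating_by_genre(df: pd.DataFrame) -> pd.Series:
    """Return the mean rating per genre, highest first.

    Assumes a comma-separated "Genre" column and a numeric "Rating" column
    (an assumption about the dataset layout). A movie with several genres
    contributes its rating once to each of them.
    """
    # Split "Action,Sci-Fi" into ["Action", "Sci-Fi"], then one row per genre.
    per_genre = df.assign(Genre=df["Genre"].str.split(",")).explode("Genre")
    per_genre["Genre"] = per_genre["Genre"].str.strip()
    return per_genre.groupby("Genre")["Rating"].mean().sort_values(ascending=False)


# Toy rows for illustration only; they are not taken from the real dataset.
toy = pd.DataFrame(
    {
        "Genre": ["Action,Sci-Fi", "Drama", "Action,Drama"],
        "Rating": [8.0, 7.0, 6.0],
    }
)
genre_means = average_rating_by_genre(toy)
print(genre_means)
```

With the real file, the same helper could be called on `pd.read_csv("IMDB-Movie-Data.csv")`, provided the column names match the assumption above.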
When the program runs, it will create visualizations and print summary information to the console. These visualizations will show:
- Which movie genres receive higher ratings
- How movie ratings have changed over the years
- Relationships between different variables
- Correlation between Metascore and IMDb Rating
- How movie ratings relate to revenue
- How movie runtime relates to ratings
- Which features are most important for predicting movie ratings
- How well the machine learning model performs
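
As an illustration of the Metascore versus IMDb Rating item in the list above, a simple linear regression can be sketched with NumPy. This is an assumption about how such a fit might be done, not the project's actual implementation: the column names `Metascore` and `Rating`, the helper `fit_metascore_to_rating`, and the toy values are all hypothetical.

```python
import numpy as np
import pandas as pd


def fit_metascore_to_rating(df: pd.DataFrame) -> tuple[float, float, float]:
    """Fit Rating = slope * Metascore + intercept by least squares.

    Returns (slope, intercept, Pearson r). Rows with a missing value in
    either column are dropped first. Column names are assumed.
    """
    clean = df[["Metascore", "Rating"]].dropna()
    slope, intercept = np.polyfit(clean["Metascore"], clean["Rating"], deg=1)
    r = np.corrcoef(clean["Metascore"], clean["Rating"])[0, 1]
    return float(slope), float(intercept), float(r)


# Toy values for illustration only; the row with a missing Metascore is skipped.
toy = pd.DataFrame(
    {
        "Metascore": [40.0, 60.0, None, 80.0],
        "Rating": [5.0, 6.0, 9.0, 7.0],
    }
)
slope, intercept, r = fit_metascore_to_rating(toy)
print(f"Rating = {slope:.3f} * Metascore + {intercept:.3f} (r = {r:.3f})")
```

A fit like this describes association only; a strong correlation between the two scores does not by itself show that one drives the other.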
The application includes a Random Forest Regressor model that:
- Predicts movie ratings based on various features
- Uses genre, director, runtime, votes, revenue, and other features
- Provides feature importance analysis
- Shows model performance metrics (MSE and R-squared)
- Visualizes actual vs predicted ratings
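
The workflow described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions rather than the model in `movie_analysis.py`: it runs on synthetic data so that it works without the CSV, it uses only numeric features (encoding of genre and director is omitted), and the column names, hyperparameters, and the invented rating formula are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not real IMDb values) so the sketch is runnable.
rng = np.random.default_rng(0)
n = 300
movies = pd.DataFrame(
    {
        "Runtime (Minutes)": rng.integers(80, 180, n),
        "Votes": rng.integers(1_000, 1_000_000, n),
        "Revenue (Millions)": rng.uniform(0, 500, n),
        "Metascore": rng.uniform(20, 100, n),
    }
)
# Invented relationship, used only to give the model something to learn.
movies["Rating"] = 3.0 + 0.05 * movies["Metascore"] + rng.normal(0, 0.3, n)

X = movies.drop(columns="Rating")
y = movies["Rating"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters here are assumptions, not the project's settings.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Performance metrics on the held-out split.
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

# Impurity-based feature importances, largest first.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"MSE: {mse:.3f}  R-squared: {r2:.3f}")
print(importances)
```

Plotting `y_test` against `predictions` as a scatter plot would give an actual vs predicted comparison of the kind the application saves to `prediction_performance.png`.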