Unfolding the Actor-Genre Constellation in Cinema: Relationships, Sentiment, and the Rise of the Sympathetic Villain

You can find the data story on our website here

Abstract

This project explores the evolution of actors’ careers and the portrayal of antagonists in cinema, using a blend of network analysis, natural language processing (NLP), and sentiment analysis. We aim to uncover how actor career trajectories evolve across genres, the collaborative networks that shape successful film outcomes, and the rise of the "sympathetic villain" in popular cinema. Through the CMU Movie Summary Corpus and supplementary datasets, we will analyze genre shifts, actor collaboration clusters, and changing emotional tones associated with antagonists. The project intends to provide visualizations and insights into the key elements that drive cinematic success, while also offering an interactive component where users can simulate potential movie plots based on actor profiles. Our work will reveal trends in Hollywood's storytelling dynamics and demonstrate the interconnectedness of genre evolution, actor choices, and character portrayal.

Project Structure

The directory structure of the project looks like this:

├── data/                     <- Project data files #IGNORED
   ├── CMU_dataset/           <- Chosen dataset
   ├── TMDB_dataset/          <- TMDB local dataset to avoid API requests
   ├── TMDB_dataset_csv/      <- TMDB local dataset to avoid API requests
   ├── the_oscar_award.csv    <- Academy Awards: 1927 - 2024 nominees and winners dataset
   └── movie_data.csv         <- Directors, Actors, Genres, and Movies ratings
│
│
├── output_data/                                <- Processed data files
   └── actor_sentiment_popularity_scores.csv    <- tvtropes_pipeline.py output
│
│
├── src/                               <- Source code
   ├── results_P2.ipynb                <- Old results file, containing all Milestone 2 analysis
   ├── helpers_actors_analysis.py      <- Helper functions to Actors, Movies and Oscars analysis
   ├── helpers_villain_analysis.py     <- Helper functions for villain sentiment analysis
   ├── helpers_API.py                  <- TMDB database API GET functions
   │
   └── drafts/                         <- Separate data pipelines and plots
      ├── SP_plot.ipynb                <- Sentiment/Popularity score plot for actors
      ├── tvtropes_pipeline.py         <- Data pipeline that processes tvtropes file
      ├── sympathetic_villain.ipynb    <- Sentiment analysis pipeline on character_metadata
      └── oscars_movies_analysis.ipynb <- Actor/Genres constellations analysis and additional oscars implementations
│
│
│
├── results.ipynb               <- New main file, containing all Milestone 3 analysis
├── model_app.py                <- Application containing our machine learning model deployed with the website
│
├── notebook_merger.py          <- notebooks merger script
├── .gitignore                  <- List of files ignored by git
├── requirements.txt            <- List of used libraries
├── install_requirements.ipynb  <- Notebook to install or update python dependencies
└── README.md

⚠️ Important: Refer to install_requirements.ipynb to ensure all required libraries are installed.

Research Questions

1. How have genre preferences and trends evolved over the decades?

2. What is the impact of actor collaborations and director influence on movie success?

3. How has the portrayal of villains, particularly sympathetic antagonists, changed over time?

4. Can sentiment analysis reveal patterns in character portrayals, especially for antagonists?

5. How do combined factors (actors, genres, directors) predict movie success metrics (IMDb ratings, box office revenues)?

Proposed Additional Datasets

IMDb Collaborations Data
- Content: Insights into actor pairings, collaboration frequency, and success metrics (e.g., box office revenue, IMDb ratings).
- Processing Approach: Integration of IMDb collaboration data to analyze actor constellations. Using NetworkX, we map actor networks and calculate centrality metrics to identify influential nodes and clusters within Hollywood.

The Movie Database (TMDb) API
- Content: Metadata for films, including genres, keywords, and actor bios.
- Processing Approach: Using Python API requests to gather additional genre and character information for sentiment analysis. Pagination and API throttling are managed during requests.

Oscars Dataset
- Content: Oscar nomination and award data from 1927 to 2024, including details on award categories, nominees, and winners.
- Processing Approach: Analysis of Oscar data to correlate critically acclaimed performances with sentiment trends and genre shifts.

IMDb Ratings
- Content: IMDb ratings, votes, and reviews for movies.
- Processing Approach: Correlation of actor clusters and genres with box office success metrics, exploring how actor networks and genres contribute to success.
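
The pagination and throttling mentioned for the TMDb API can be sketched as follows. This is a minimal illustration, not the code in helpers_API.py: `iter_pages`, `fetch_page`, `max_pages`, and `delay_s` are hypothetical names, and the sketch assumes each response is a dict with a `results` list and a `total_pages` count, which is how TMDb list endpoints are usually shaped.

```python
import time
from typing import Callable, Iterator


def iter_pages(
    fetch_page: Callable[[int], dict],
    max_pages: int = 500,
    delay_s: float = 0.25,
) -> Iterator[dict]:
    """Yield every item from a paginated endpoint, pausing between requests.

    fetch_page(page) is assumed to return a dict with a "results" list
    and a "total_pages" integer (hypothetical wrapper around one HTTP GET).
    """
    page = 1
    while page <= max_pages:
        payload = fetch_page(page)
        yield from payload.get("results", [])
        # Stop once the last page reported by the server has been read.
        if page >= payload.get("total_pages", page):
            break
        page += 1
        # Simple throttle: a fixed pause between consecutive requests.
        time.sleep(delay_s)
```

Passing the fetch function in as an argument keeps the loop independent of the HTTP client, so it can be exercised with a stub before any real request is made.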

Methods

1. NLP and Sentiment Analysis

Text Tokenization: Using NLTK to tokenize plot summaries for sentiment analysis.
Sentiment Categorization: Employing VADER Sentiment Analysis to classify characters as sympathetic villains based on polarity scores.
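
As a rough sketch of the categorization step: the rule below (average the VADER compound score over the sentences that mention a character, and flag an antagonist as sympathetic when the average is at least 0.05) is an assumption made for illustration, not the rule used in helpers_villain_analysis.py. All function names are hypothetical. The 0.05 cut-off is the conventional VADER threshold for positive text, and the regex sentence split is a stand-in for NLTK's sentence tokenizer.

```python
import re
from statistics import mean
from typing import Callable, List, Optional


def vader_compound(text: str) -> float:
    """Compound polarity in [-1, 1] from NLTK's VADER.

    Requires nltk and its "vader_lexicon" resource. The analyzer is built
    on every call for brevity; reuse one instance when scoring many texts.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]


def character_sentences(summary: str, name: str) -> List[str]:
    """Return the sentences of a plot summary that mention the character."""
    # Naive split on ., ! or ? followed by whitespace (assumption; NLTK's
    # sent_tokenize could be used instead).
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [s for s in sentences if name.lower() in s.lower()]


def mean_polarity(
    summary: str, name: str, score_fn: Callable[[str], float]
) -> Optional[float]:
    """Average polarity of the character's sentences, or None if absent."""
    sentences = character_sentences(summary, name)
    return mean(score_fn(s) for s in sentences) if sentences else None


def is_sympathetic_villain(
    summary: str,
    name: str,
    score_fn: Callable[[str], float] = vader_compound,
    threshold: float = 0.05,
) -> bool:
    """Hypothetical rule: an antagonist described in net-positive terms."""
    score = mean_polarity(summary, name, score_fn)
    return score is not None and score >= threshold
```

The caller is expected to apply this only to characters already identified as antagonists; the function itself does not decide who the villain is.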

2. Data Manipulation and Integration

Data Cleaning: Extracting relevant features from character.metadata.tsv and plot_summaries.txt.
Character Analysis: Using pandas to isolate and analyze prominent characters and their portrayals.
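
The cleaning and joining steps might look like the sketch below. It uses the standard csv module so that it runs without extra dependencies; the same logic maps directly onto pandas (`read_csv` with `sep="\t"`, `header=None`, and `names=...`, followed by a merge on the movie ID). The column labels are our own names for the 13 header-less fields of character.metadata.tsv, and the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

# Labels chosen here for the header-less columns (assumption).
CHARACTER_COLUMNS = [
    "wiki_movie_id", "freebase_movie_id", "release_date", "character_name",
    "actor_dob", "actor_gender", "actor_height", "actor_ethnicity",
    "actor_name", "actor_age", "freebase_map_id", "freebase_character_id",
    "freebase_actor_id",
]


def read_characters(handle: Iterable[str]) -> List[Dict[str, str]]:
    """Parse character rows, keeping those with both a character and an actor name."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != len(CHARACTER_COLUMNS):
            continue  # skip malformed lines
        row = dict(zip(CHARACTER_COLUMNS, fields))
        if row["character_name"] and row["actor_name"]:
            rows.append(row)
    return rows


def read_summaries(handle: Iterable[str]) -> Dict[str, str]:
    """Map Wikipedia movie ID to plot summary (one 'id<TAB>text' per line)."""
    summaries = {}
    for line in handle:
        movie_id, _, text = line.rstrip("\n").partition("\t")
        if text:
            summaries[movie_id] = text
    return summaries


def attach_summaries(
    characters: List[Dict[str, str]], summaries: Dict[str, str]
) -> List[Dict[str, str]]:
    """Inner join: keep characters whose movie has a plot summary."""
    return [
        {**c, "summary": summaries[c["wiki_movie_id"]]}
        for c in characters
        if c["wiki_movie_id"] in summaries
    ]
```

Both readers accept any iterable of lines, so they work with an open file (for example `open("data/CMU_dataset/character.metadata.tsv", encoding="utf-8")`, assuming that is where the file lives) as well as with in-memory test data.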

3. Network Analysis of Actor Collaborations

Graph Construction: Using NetworkX to map actor collaborations and clusters.
Metrics: Evaluating Actor Collaboration Frequency, Genre Diversity Score, and Network Centrality.
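
The three metrics can be illustrated with a small dependency-free sketch. The source does not define the Genre Diversity Score, so Shannon entropy over an actor's genre counts is used here purely as an assumption; collaboration frequency is taken to be the number of films two actors share, and degree centrality uses the usual normalization (neighbours divided by n - 1), which matches NetworkX's `degree_centrality`. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def collaboration_counts(casts: Iterable[List[str]]) -> Counter:
    """Count how many films each unordered pair of actors shares."""
    pairs: Counter = Counter()
    for cast in casts:
        # Sorting makes (a, b) and (b, a) the same key.
        pairs.update(combinations(sorted(set(cast)), 2))
    return pairs


def degree_centrality(pairs: Iterable[Pair]) -> Dict[str, float]:
    """Fraction of the other actors each actor has worked with."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {actor: len(nb) / (n - 1) for actor, nb in neighbours.items()}


def genre_diversity(genres: Iterable[str]) -> float:
    """Shannon entropy (in bits) of an actor's genre distribution (assumed definition)."""
    counts = Counter(genres)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return sum(p * math.log2(1 / p) for p in probs)
```

The pair counts can be loaded into NetworkX as edge weights with `Graph.add_weighted_edges_from`, after which the library's own centrality and clustering routines apply.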

4. Predictive Modeling

Revenue and Ratings Prediction: Using XGBoost and sklearn to predict box office revenues and IMDb ratings based on actor collaborations, genres, and sentiment.

5. Data Visualization

Interactive Plots: Created with matplotlib, mplcursors, and matplotlib-venn.

Project Contributions

Visualizations

Genre Trends and Evolution: Visualizing how genres have evolved over time.
Top Directors: Highlighting directors and their most reviewed movies.
Actor Collaboration Networks: Exploring diversity and connectivity among actors.
Sentiment Trends: Analyzing antagonist portrayals and their emotional impact.

Machine Learning Models

Predictive Models: Developed models to forecast IMDb ratings and box office revenues based on:
- Director influence
- Actor networks
- Genre attributes

Interactive Features

Actor Collaboration Explorer: A tool to explore actor networks and their collaborations.
Movie Success Simulator: Simulates the probability of success for a potential movie.

Deliverables

Visualizations

Genre Trends and Evolution
Actor Collaboration Networks
Sentiment Trends of Antagonists

Machine Learning Models

Predictive models for IMDb ratings and box office revenues.

Interactive Features

Actor Collaboration Explorer
Movie Success Simulator

Final Deliverables

Data Story: Here.
Final Notebook: results.ipynb.
Supporting Scripts: For modular and clean implementation.

⚠️ Notebook Viewer Compatibility Issue: Interactive widgets may not render on static viewers like GitHub.

Contributions

Team Member	Contribution
Karine Rafla	Cult movies research and analysis
Mehdi Bouchoucha	Project layout and website creation
Mohamed Hedi Hidri	Interactive predictor implementation
Sami Amrouche	Genres and actors research and analysis
Tamara Antoun	Cult characters research and analysis

Conclusion

This project, completed as part of the CS-401 Applied Data Analysis course at EPFL (2024), explores the intricate relationships between actors, genres, and sentiment in cinema, providing unique insights into the evolving dynamics of storytelling in Hollywood. Through a combination of network analysis, sentiment analysis, and predictive modeling, we achieved:

Key Insights: Understanding the rise of sympathetic villains, the evolution of genre trends, and the role of actor collaborations in movie success.
Practical Tools: Interactive visualizations and predictive models for exploring cinematic success factors, such as IMDb ratings and box office revenues.

Despite challenges, including handling deprecated Freebase data and integrating unconventional CoreNLP datasets, our team implemented innovative solutions to deliver meaningful results.

Future Directions:

Expanding datasets to include global cinema, allowing for a more diverse and inclusive analysis.
Developing a generative model for movie plot creation, tailoring summaries and titles to actor profiles and sentiment trends.
Enhancing interactivity in visual tools, enabling deeper user engagement and exploration.

This work lays a foundation for further exploration of the connections between storytelling, character sentiment, and cinematic success, contributing to the broader understanding of the evolving film industry.
