Culture Analytics of Race Depiction in Films

Introduction

For background on why I chose to explore this topic, please see the Blog Post. This project aims to understand how race is depicted in films by analyzing the dialogue spoken by actors of different races and applying several text analysis techniques to find and interpret any differences.

Data

Data was scraped from the IMSDB website, a database of movie scripts that are freely available to the public. I scraped all of the scripts on the site (about 1,200 movies). For each movie, I scraped the Top Cast section from IMDB when an IMDB page was available. For each actor in the Top Cast section, I scraped their Wikipedia page when one was available; for actors whose Wikipedia page could not be found automatically, I selected the link manually as part of a semi-automated scraping process. From each Wikipedia page I scraped the actor's entire biography, which I then sent to the GPT-4-Turbo-Preview API to identify the actor's race.

Race is recorded as one of eight values: White, Black, LatinX, Middle Eastern, Southeast Asian, East Asian, Native American, and Pacific Islander. These categories come from the paper "Representation of Racial Minorities in Popular Movies" by Malik et al.
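Because the label comes back from a language model as free text, it can help to check it against the eight allowed values before storing it. The following is a minimal sketch of such a check, not the code used in this project; the names RACE_CATEGORIES and normalize_race_label are hypothetical and introduced only for illustration.

```python
from typing import Optional

# The eight categories listed above.
RACE_CATEGORIES = (
    "White", "Black", "LatinX", "Middle Eastern",
    "Southeast Asian", "East Asian", "Native American", "Pacific Islander",
)


def _key(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


# Lookup keyed on the simplified form, so casing, spacing, and
# punctuation differences in the model's answer still match.
_LOOKUP = {_key(category): category for category in RACE_CATEGORIES}


def normalize_race_label(raw: str) -> Optional[str]:
    """Return the matching category, or None if the label is not one of the eight."""
    return _LOOKUP.get(_key(raw))
```

A label that returns None could then be flagged for manual review rather than stored as is.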

Text Analysis

The text analysis methods I used are as follows:

- Word Count Analysis
  - Average number of words spoken by actors
  - Average length of a speech turn by actors
  - Movie dialogue composition by race
  - Total words spoken by race
- Sentiment Analysis
- Frequency Analysis
- Named Entity Recognition

These methods are discussed in greater detail in the linked blog post.
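To make the word count metrics concrete, here is a rough sketch of how they could be computed from a list of speech turns. It is an illustration under assumptions, not the code in the analytics directory: the function name word_count_metrics and the (actor, race, text) tuple shape are hypothetical, and words are counted by simple whitespace splitting.

```python
from collections import defaultdict


def word_count_metrics(turns):
    """Compute per-race word count metrics.

    turns: iterable of (actor, race, text) tuples, one per speech turn
    (an assumed input shape, chosen for this example).
    """
    total_words = defaultdict(int)   # race -> total words spoken
    turn_counts = defaultdict(int)   # race -> number of speech turns
    actors = defaultdict(set)        # race -> distinct actors seen

    for actor, race, text in turns:
        n_words = len(text.split())  # whitespace tokenization
        total_words[race] += n_words
        turn_counts[race] += 1
        actors[race].add(actor)

    grand_total = sum(total_words.values())
    return {
        race: {
            "total_words": total_words[race],
            "avg_words_per_actor": total_words[race] / len(actors[race]),
            "avg_turn_length": total_words[race] / turn_counts[race],
            # Share of all words in the input that were spoken by this race.
            "dialogue_share": total_words[race] / grand_total if grand_total else 0.0,
        }
        for race in total_words
    }
```

Passing the turns of a single movie gives that movie's dialogue composition by race; passing the turns of every movie gives the totals across the whole dataset.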

The code for the analytics can be found in the analytics directory. It is written in Python and uses Django to store and manage the data in SQL. The code for scraping the data can be found in the scraping directory, and the data processing and cleaning code can be found in the processing directory. Dependencies are listed in requirements.txt and can be installed with pip install -r requirements.txt. Documentation for using Django can be found here.

I can provide a dump of the data that I scraped if you reach out to me at austinwheeler1112@gmail.com.
