This project explores and analyzes the Netflix titles dataset to uncover content trends through data cleaning, visualization, and machine learning.
In the final stage, a classification model predicts whether a title is a Movie or a TV Show from its duration, rating, and release year.
- Introduction
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Insights and Visualizations
- Machine Learning - Predicting Movie vs TV Show
- Dataset
- Tools & Libraries
- How to Run
- Author
Netflix offers a large catalog of titles that includes both Movies and TV Shows.
The goal of this project is to:
- Understand the structure and trends in Netflix's content library
- Build a predictive model that classifies a title as a Movie or TV Show based on its duration, rating, and release year.
- Removed null values and duplicate rows
- Converted the duration column to a numeric value (duration_num)
- Encoded the categorical columns (type, rating) for modeling
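The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch under stated assumptions, not the notebook's code: it drops nulls and duplicates only on the four modeling columns, takes the leading integer of the duration string, and label-encodes categoricals with pandas category codes. The type_encoded name and the sample rows are hypothetical.

```python
import pandas as pd


def clean_titles(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps; the exact rules are assumptions."""
    # Keep only the modeling columns, then drop rows with nulls and exact duplicates
    out = (
        df[["type", "duration", "rating", "release_year"]]
        .dropna()
        .drop_duplicates()
        .copy()
    )
    # "90 min" -> 90 and "2 Seasons" -> 2: take the leading integer of the string
    out["duration_num"] = out["duration"].str.extract(r"(\d+)", expand=False).astype(int)
    # Label-encode categoricals via pandas category codes (alphabetical order)
    out["type_encoded"] = out["type"].astype("category").cat.codes
    out["rating_encoded"] = out["rating"].astype("category").cat.codes
    return out.reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical rows in the dataset's format, not real Netflix records
    raw = pd.DataFrame({
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "duration": ["90 min", "2 Seasons", "90 min", None],
        "rating": ["PG-13", "TV-MA", "PG-13", "R"],
        "release_year": [2019, 2020, 2019, 2018],
    })
    print(clean_titles(raw))
```

With these sample rows, the duplicate Movie row and the row with a missing duration are dropped, leaving two titles.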
- Analyzed distribution of Movies vs TV Shows
- Examined content release trends over time
- Investigated duration patterns and rating types
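Most of the EDA steps above reduce to a few grouped counts that are then plotted. Here is a minimal pandas sketch. It assumes a cleaned frame with type, rating, release_year, and a numeric duration_num column, and the eda_tables helper and its sample rows are invented for illustration.

```python
import pandas as pd


def eda_tables(df: pd.DataFrame) -> dict:
    """Summary tables behind the EDA plots (sketch; column names are assumptions)."""
    return {
        # Movies vs TV Shows
        "type_counts": df["type"].value_counts(),
        # Titles per release year, one column per type (missing combinations -> 0)
        "per_year": df.groupby(["release_year", "type"]).size().unstack(fill_value=0),
        # Median duration per type (minutes for Movies, seasons for TV Shows)
        "median_duration": df.groupby("type")["duration_num"].median(),
        # Frequency of each rating
        "rating_counts": df["rating"].value_counts(),
    }


if __name__ == "__main__":
    # Invented rows for illustration only
    toy = pd.DataFrame({
        "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
        "release_year": [2018, 2019, 2019, 2019, 2020],
        "duration_num": [95, 120, 2, 88, 1],
        "rating": ["PG-13", "R", "TV-MA", "PG-13", "TV-14"],
    })
    for name, table in eda_tables(toy).items():
        print(name, table, sep="\n")
```

Each table can be plotted directly. For example, calling .plot() on the per_year table draws release trends by type with Matplotlib.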
- Plotted graphs to uncover patterns in ratings, content duration, and content type
- Explored trends across regions and release years
- Found correlations between release year, content duration, and type
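A correlation check like the one above can be sketched with pandas' DataFrame.corr. Because type is categorical, this sketch assumes a 0/1 flag (is_tv_show, a hypothetical name) so that type can enter a Pearson correlation alongside release year and duration. The sample rows are invented.

```python
import pandas as pd


def feature_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations between release year, duration and a type flag (sketch)."""
    numeric = pd.DataFrame({
        "release_year": df["release_year"],
        "duration_num": df["duration_num"],
        # Assumed encoding: 1 for TV Show, 0 for Movie
        "is_tv_show": (df["type"] == "TV Show").astype(int),
    })
    return numeric.corr()  # Pearson correlation by default


if __name__ == "__main__":
    # Invented rows for illustration only
    toy = pd.DataFrame({
        "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
        "release_year": [2018, 2019, 2019, 2019, 2020],
        "duration_num": [95, 120, 2, 88, 1],
    })
    print(feature_correlations(toy).round(2))
```

The resulting matrix can be visualized with seaborn.heatmap(corr, annot=True).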
A logistic regression model was trained on the following features:
- duration_num
- rating_encoded
- release_year
- Applied train/test split (80% training / 20% testing)
- Trained a Logistic Regression classifier
- Evaluated using Accuracy, Classification Report, and Confusion Matrix
Model Accuracy: 99.8%
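The split, training, and evaluation steps above can be sketched with scikit-learn. This is a minimal sketch, not the notebook's code. The target column type_encoded (1 = TV Show), the fixed random_state, stratification, max_iter, and the make_toy_frame generator are all assumptions for illustration, and the synthetic data is not the Kaggle dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

FEATURES = ["duration_num", "rating_encoded", "release_year"]


def make_toy_frame(n_per_class: int = 100, seed: int = 0) -> pd.DataFrame:
    """Hypothetical synthetic data: movies in minutes, shows in seasons."""
    rng = np.random.default_rng(seed)
    movies = pd.DataFrame({
        "duration_num": rng.integers(80, 150, n_per_class),
        "rating_encoded": rng.integers(0, 5, n_per_class),
        "release_year": rng.integers(2000, 2021, n_per_class),
        "type_encoded": 0,
    })
    shows = pd.DataFrame({
        "duration_num": rng.integers(1, 6, n_per_class),
        "rating_encoded": rng.integers(0, 5, n_per_class),
        "release_year": rng.integers(2000, 2021, n_per_class),
        "type_encoded": 1,
    })
    return pd.concat([movies, shows], ignore_index=True)


def train_and_evaluate(df: pd.DataFrame, target: str = "type_encoded", seed: int = 42):
    X, y = df[FEATURES], df[target]
    # 80% training / 20% testing; stratify keeps both classes in the test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # Features are unscaled, so give the default lbfgs solver extra iterations
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred, target_names=["Movie", "TV Show"])
    matrix = confusion_matrix(y_test, y_pred)
    return model, accuracy, report, matrix


if __name__ == "__main__":
    model, accuracy, report, matrix = train_and_evaluate(make_toy_frame())
    print(f"Accuracy: {accuracy:.3f}")
    print(report)
    print(matrix)
    # Classify one made-up title: 2 seasons, rating code 3, released 2021
    new_title = pd.DataFrame([[2, 3, 2021]], columns=FEATURES)
    print("TV Show" if model.predict(new_title)[0] == 1 else "Movie")
```

The toy accuracy says nothing about the real dataset. One likely reason such a high real score is reachable is that duration_num mixes units: Movie durations are given in minutes and TV Show durations in seasons, so that feature alone nearly separates the two classes.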
- 📦 Source: Netflix Titles Dataset on Kaggle
- Columns used: type, duration, rating, release_year
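Loading only the columns listed above can be sketched with the usecols option of pandas.read_csv. The default filename netflix_titles.csv is an assumption about the Kaggle download, and load_titles is a hypothetical helper name. Adjust the path to wherever the CSV is saved.

```python
import pandas as pd

# Columns the project uses (from the list above)
COLUMNS = ["type", "duration", "rating", "release_year"]


def load_titles(path: str = "netflix_titles.csv") -> pd.DataFrame:
    # usecols skips every other column while parsing; the final indexing
    # restores the order of COLUMNS, since usecols keeps the file's order
    return pd.read_csv(path, usecols=COLUMNS)[COLUMNS]
```

For example, df = load_titles() returns a frame with just these four columns, ready for the cleaning steps.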
- Python 3
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- Jupyter Notebook
- Clone the repository:
  git clone https://github.com/mumairrr/Netflix_Content_Type_Classifier_Umeir.git
  cd Netflix_Content_Type_Classifier_Umeir
- Install the required libraries:
  pip install -r requirements.txt
- Launch Jupyter Notebook:
  jupyter notebook
Then open 📄 Netflix_Content_Type_Classifier_Umeir.ipynb.
📊 Dataset: Netflix dataset from Kaggle (Netflix Shows)
📄 Requirements: all dependencies are listed in requirements.txt.
Umeir Mohamed
Master’s Student in Data Science – Milano Bicocca University
LinkedIn | GitHub