Jokes Clustering Project

This project aims to cluster jokes into different categories using unsupervised learning techniques.

Project Overview

In this project, I performed the following tasks:

Data Cleaning and Preprocessing
Feature Extraction using TF-IDF Vectorizer
Clustering using KMeans Algorithm
Visualization using Network Graph and Parallel Coordinates
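The cleaning, TF-IDF and KMeans steps above can be sketched from scratch in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code. The notebook most likely relies on a library vectorizer and KMeans implementation. Here, the cleaning rule (lowercase, keep alphabetic tokens), the smoothed idf log((1+n)/(1+df)) + 1 with L2 normalisation, the random-sample initialisation, and the helper names clean_text, tfidf_vectors and kmeans are all assumptions.

```python
import math
import random
import re
from collections import Counter


def clean_text(text):
    """Lowercase a joke and keep only alphabetic tokens (assumed cleaning rule)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (dict term -> weight) per document."""
    tokenised = [clean_text(d) for d in docs]
    n = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means seeded with k random documents; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: send each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its assigned vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the previous centroid if a cluster empties
            total = Counter()
            for v in members:
                total.update(v)
            centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels, centroids
```

For the 5 clusters reported in Results this would be called as kmeans(tfidf_vectors(jokes), k=5). In practice scikit-learn's TfidfVectorizer (whose default smoothed idf matches the formula above) and KMeans (which defaults to k-means++ seeding rather than random sampling) are the usual choices; the from-scratch version only shows what those steps compute.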

Dataset

The dataset used in this project is dad-a-base.csv in the root directory. It contains a list of jokes in plain text format. Thanks to Arya Shah for building this dataset.
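A minimal sketch for loading the jokes with Python's standard csv module. The CSV's column layout is not documented here, so the header name "joke" and the helper names parse_jokes and load_jokes are hypothetical; adjust the column argument to match the real file.

```python
import csv


def parse_jokes(lines, column="joke"):
    """Return the stripped, non-empty values of `column` from CSV lines.

    `column` is a hypothetical header name; change it to match the real file.
    """
    reader = csv.DictReader(lines)
    return [(row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()]


def load_jokes(path, column="joke"):
    """Read the jokes from a CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_jokes(f, column)
```

Splitting parsing from file access keeps parse_jokes usable on any iterable of lines, which makes it easy to check without touching the disk.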

Requirements

The project was implemented using Python 3.9.

How to Use

To run the project, open the your-dad-joked-once.ipynb notebook in the root directory and run all cells. This will preprocess the data, perform clustering and topic modeling, and generate visualizations.

Alternatively, open the notebook on Kaggle and run it there: https://www.kaggle.com/code/nihirshah/your-dad-joked-once. Upvote if you believe in god, and upvote if you don't.

Results

The KMeans algorithm grouped the jokes into 5 clusters, each with distinct attributes and features.
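One common way to see what distinguishes each cluster is to list the highest-weighted terms in its centroid. The sketch below assumes each centroid is stored as a term-to-weight dict (a hypothetical representation, and top_terms is a hypothetical helper). With scikit-learn one would instead read KMeans.cluster_centers_ alongside the vectorizer's get_feature_names_out().

```python
def top_terms(centroids, n=3):
    """Return the n highest-weighted terms of each centroid.

    Each centroid is assumed to be a dict mapping term -> weight.
    Ties are broken alphabetically so the output is deterministic.
    """
    return [
        [term for term, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for c in centroids
    ]


# Toy centroids for illustration only, not results from the project.
toy = [{"dog": 0.6, "cat": 0.3, "bird": 0.1}, {"fish": 0.5, "pun": 0.5}]
print(top_terms(toy, n=2))  # [['dog', 'cat'], ['fish', 'pun']]
```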

Future work:

Explore other clustering algorithms and compare their performance
Topic Modeling using Latent Dirichlet Allocation (LDA)
Expand the dataset to include more jokes and categories
