Unsupervised Machine Learning with Python

This repository showcases my process of applying KMeans clustering to a Spotify dataset of over 5,000 songs. The goal was to create musically meaningful clusters of songs (i.e., playlists) and ultimately generate actual playlists on Spotify. The notebook in the “Script” folder walks through every step of this project.

The main question I aimed to answer was: Can machine learning replace human-curated playlists?

Situation

For this project, I worked with Moosic, a small-scale startup that relies on music experts to curate playlists for their customers. Moosic is exploring machine learning as a scalable alternative to human-created playlists. My task was to determine whether machine learning could effectively replace human expertise in playlist curation, with the added constraint that each playlist should contain between 50 and 250 songs.

In the prototype, I applied KMeans clustering, using Quantile Transformer as the data scaler. I also compared the results from applying Principal Component Analysis (PCA) to limit the features.

Conclusion

Based on the prototype, I concluded that machine learning using KMeans clustering is a promising alternative for Moosic's scaling goals. However, further refinements and testing are necessary for optimization.

Files

Data

spotify_5000_songs: The dataset (in CSV format) containing over 5,000 songs from Spotify.

Script

KMeans_clustering_music_playlists_prototype: The notebook that walks you through the entire project, including data preprocessing, clustering, and playlist creation.

Documentation

Playlists_prototype_presentation: A slide deck used for a 5-minute presentation, summarizing the prototype and evaluating whether machine learning is a viable alternative for creating playlists.

Using the Files

Save the spotify_5000_songs dataset to your Google Drive or locally.
Update the file links within the notebook to point to the location of your saved dataset.
- If you're using Google Drive, ensure the links point to the correct file location. This also applies to the code blocks responsible for saving music clusters as CSV files and uploading them to Google Drive. Incorrect links may cause errors.
Enjoy exploring the music playlist samples created and shared in the notebook!

Languages and Libraries Used

Python (3.12.4)
Pandas (2.2.2)
Numpy (1.26.4)
Matplotlib (3.8.4)
Seaborn (0.13.2)
Plotly (5.22.0)
SKLearn (1.4.2)

Tools Used

Google Colab: Used for running the notebook.
Google Drive: Used to store data and save output files.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
Data		Data
Documentation		Documentation
Script		Script
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unsupervised Machine Learning with Python

Situation

Conclusion

Files

Data

Script

Documentation

Using the Files

Languages and Libraries Used

Tools Used

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Unsupervised Machine Learning with Python

Situation

Conclusion

Files

Data

Script

Documentation

Using the Files

Languages and Libraries Used

Tools Used

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages