In this project, I built a dataset of the best-selling English-language music artists, as listed on Wikipedia. I then used web scraping to collect all of their song lyrics from several lyric websites.
I cleaned the dataset and removed noise with several text preprocessing techniques, including stemming (stripping suffixes to reduce a word to a root form) and lemmatization (mapping a word to its dictionary form).
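
As a rough illustration of this kind of cleaning, here is a minimal sketch that uses only the Python standard library. It is not the project's actual code: `STOP_WORDS`, `SUFFIXES`, `toy_stem`, and `preprocess` are hypothetical names, the stop-word list is deliberately tiny, and the suffix stripper is a crude stand-in for a real stemmer or lemmatizer such as those provided by NLTK.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "i", "you", "is", "are", "in", "to", "of"}

# Suffixes removed by the toy stemmer, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip at most one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(lyrics):
    """Lowercase the text, tokenize it, drop stop words, and stem each token."""
    # Keep runs of letters and apostrophes; punctuation and digits are discarded.
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    # Remove stray leading/trailing apostrophes left over from quotes.
    tokens = [t.strip("'") for t in tokens]
    return [toy_stem(t) for t in tokens if t and t not in STOP_WORDS]


cleaned = preprocess("I am walking in the Rain, and you are Dancing!")
print(cleaned)
```

Note that the toy stemmer produces truncated roots such as `danc` for "dancing". Real stemmers behave similarly, which is one reason lemmatization is often applied when readable word forms are needed.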
Finally, I compared the vocabulary of the best-selling artists of all time by analyzing their lyrics and visualizing the results in various plots.
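
One simple way to compare vocabularies is to count total and unique words per artist and compute their ratio. The sketch below assumes the lyrics have already been cleaned into token lists. The function `vocabulary_stats`, the keys it returns, and the placeholder artists `artist_a` and `artist_b` with their tokens are all hypothetical and are not taken from the project's data.

```python
from collections import Counter


def vocabulary_stats(tokens):
    """Summarize the vocabulary of one artist's cleaned tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {
        "total_words": total,
        "unique_words": len(counts),
        # Type-token ratio: unique words divided by total words.
        "type_token_ratio": len(counts) / total if total else 0.0,
        "top_words": counts.most_common(3),
    }


# Placeholder data for illustration only (not real lyrics).
lyrics_by_artist = {
    "artist_a": ["love", "love", "night", "heart"],
    "artist_b": ["road", "rain", "night", "fire", "home"],
}

stats = {name: vocabulary_stats(tokens) for name, tokens in lyrics_by_artist.items()}

# Words that both placeholder artists use.
shared = set(lyrics_by_artist["artist_a"]) & set(lyrics_by_artist["artist_b"])

for name, summary in stats.items():
    print(name, summary["unique_words"], round(summary["type_token_ratio"], 2))
print(sorted(shared))
```

The per-artist numbers in `stats` could then be passed to a plotting library to produce bar charts or similar comparisons. Keep in mind that the type-token ratio tends to fall as the amount of text grows, so artists with very different catalog sizes are not directly comparable on that measure alone.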