Commit f6cc095

2 parents e3f7548 + 232730e commit f6cc095

14 files changed, +722 -2 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -7,10 +7,22 @@ We will use Kaggle's "Top Spotify Songs in 73 Countries (Daily Updated)" dataset
 - Do countries prefer explicit music or non explicit music?
 
 **Setup/usage instructions**
+<<<<<<< HEAD
 Follow these steps:
 1. clone the repository,
 2. activate the virtual environment (for Windows .venv2\Scripts\activate),
 3. install the requirements (pip install -r reqiurements.txt),
 4. copy your BigQuery key into the secrets file and save it (remember not to commit the secrets file),
 5. run the main dashboard page (streamlit run Spotify_Dashboard.py) with Streamlit.
 
+=======
+Please use Python 3.12.2 or older
+
+Follow these steps:
+1. clone the repository
+2. create and activate the virtual environment (depending on your computer and Windows version you can use a variation of: (1) python -m venv .venv for creation and (2) .venv\Scripts\activate for activation)
+3. Install the requirements (pip install -r requirements.txt). Please exercise patience. It takes a really long time
+5=4. run the main dashboard page (streamlit run Spotify_Dashboard.py) with Streamlit.
+
+Deployed App link: https://advanced-computing-alexa-giulio-rep-spotify-dashboard-zvzfpp.streamlit.app/
+>>>>>>> 232730eac4dea19caf36feb35cd5930b0b946871

Spotify_Dashboard.py

Lines changed: 4 additions & 2 deletions
@@ -73,9 +73,11 @@ def rain_emojis(emoji):
     "Select a country:",
     options=list(locations.keys()),
 )
-if st.button("Go to country"):
-    st.switch_page(locations[selection][2])
 
+# Automatically redirect if a selection is made
+if selection:
+    st.switch_page(locations[selection][2])
+
 #display map
 st.write("Check out this map to see which countries we feature on our app:")
 st_folium(map, width=700, height=500)
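Review note on the hunk above: `st.selectbox` returns its first option by default, so `if selection:` is likely true on every run and the page would redirect immediately. A minimal sketch of one way to guard the redirect follows. It assumes that recent Streamlit versions accept `index=None` to start the select box empty, and it uses a hypothetical `LOCATIONS` mapping and a hypothetical helper `page_for`, because the real `locations` dictionary is not part of this diff. The Streamlit calls are left as comments so the helper can be checked on its own.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical stand-in for the app's `locations` mapping, which this hunk does not show.
# The only detail taken from the diff is that index 2 holds the page path.
LOCATIONS = {
    "Italy": (41.9, 12.5, "pages/Italy.py"),
    "US": (38.9, -77.0, "pages/US.py"),
}


def page_for(selection: Optional[str], locations: Mapping[str, Sequence]) -> Optional[str]:
    """Return the page path for a selection, or None when nothing valid is selected."""
    if selection is None or selection not in locations:
        return None
    return locations[selection][2]


# Possible use inside the Streamlit script (not executed here):
#   selection = st.selectbox("Select a country:", options=list(locations.keys()), index=None)
#   target = page_for(selection, locations)
#   if target is not None:
#       st.switch_page(target)
```

Keeping the lookup in a plain function means the "no selection yet" case can be tested without running the app.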

old_code/2_Proposal.py

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+import streamlit as st
+
+# Page Title
+st.title("Spotify Streaming Analysis")
+
+# Dataset Section
+st.header("What dataset are you going to use?")
+st.write("We will use Kaggle's \"Top Spotify Songs in 73 Countries (Daily Updated)\" dataset. This dataset is updated daily to include the top songs and artists streamed across the world.")
+st.markdown("[Link to dataset](https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated?resource=download)")
+
+# Research Questions
+st.header("What are your research question(s)?")
+st.write("We are interested in answering the following questions:")
+st.markdown("""
+- Which genres of music and artists are most popular Italy and the US?
+- What is the relationship between song features (speechiness, explicitness, etc) and their popularity?
+- Do countries tend to listen to domestic music (i.e. music produced in the user's country or sung in the home language) more than foreign music?
+- Which songs/artists are most popular and when do they start to lose popularity?
+
+We integrated Roberto's feedback and eliminated "which country listens to the most music". We also slightly modified the wording when it comes to song features. Lastly, we decided to focus on just the US (Alexa's home) and Italy (Giulio's home)
+""")
+
+# Notebook Link
+st.header("What's the link to your notebook?")
+st.markdown("[Link to our notebook](https://colab.research.google.com/drive/1H0l4hN8gyangmzVwuEojwT8GAi1kykyc?usp=sharing)")
+
+# Target Visualization
+st.header("What's your target visualization?")
+st.write("Depending on which research question we choose, we have several ideas for how to graph this data (such as a choropleth map of artist popularity across the world or a word cluster chart that shows the name of the most popular artists globally).")
+st.write("If we choose to do a line graph of song/artist popularity over time (where each line represents one song or artist), then our target visualization could look like this:")
+
+# Known Unknowns
+st.header("What are your known unknowns?")
+st.markdown("""
+- The dataset specifies that it pulls data from 73 countries -- are these countries biased towards certain continents?
+- Are there inconsistencies with how the song/artist titles are written (meaning all upper case/lower case, special characters, etc.) that may make it difficult to properly aggregate the frequency for individual songs?
+""")
+
+# Anticipated Challenges
+st.header("What challenges do you anticipate?")
+st.markdown("""
+- If we choose to do a geographic representation of our data, then linking country codes could present a challenge
+- Making sure that the data is written uniformly may be time-consuming
+- Normalizing the rankings of each song/artist between differently sized countries to better understand a song's popularity
+""")
+
+
+st.markdown("<br><br>", unsafe_allow_html=True)
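Review note on the "known unknowns" in this file: the concern about inconsistent casing and special characters in song and artist titles can be addressed by normalizing names before counting them. The following is a minimal sketch, assuming Unicode NFKC normalization plus case folding is an acceptable grouping key. The function names and the sample values are made up for illustration and do not come from the dataset.

```python
import unicodedata
from collections import Counter


def normalize_name(name: str) -> str:
    """Build a grouping key: Unicode NFKC, collapsed whitespace, case-folded."""
    text = unicodedata.normalize("NFKC", name)
    text = " ".join(text.split())  # trims the ends and collapses inner runs of whitespace
    return text.casefold()


def count_by_normalized_name(names):
    """Count occurrences of each name after normalization."""
    return Counter(normalize_name(n) for n in names)


# Made-up sample values, for illustration only.
sample = ["Bad Bunny", "BAD BUNNY", "  bad  bunny ", "M\u00e5neskin", "Ma\u030aneskin"]
counts = count_by_normalized_name(sample)
```

Here the three spellings of the first artist collapse into one key, and the precomposed and combining-character forms of the second name are treated as the same string. Whether that is the right level of aggressiveness depends on the data, for example on titles that differ only by case.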
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+import pandas as pd
+import streamlit as st
+import matplotlib.pyplot as plt
+
+# title
+st.title("🌍 Population by Country in 2023")
+
+# load data
+file_path = "population_by_country.csv"
+
+# show only country and year
+df = pd.read_csv(file_path, usecols=["Country Name", "2023"])
+
+# make widget to choose number of countries to display
+num_countries = st.slider("Select number of countries to display", min_value=5, max_value=len(df), value=20)
+
+# select most populated countries
+df = df.sort_values(by="2023", ascending=False).head(num_countries)
+
+# make chart
+fig, ax = plt.subplots(figsize=(12, 6))
+ax.bar(df["Country Name"], df["2023"], color="skyblue")
+ax.set_xlabel("Country Name", fontsize=12)
+ax.set_ylabel("Population in 2023", fontsize=12)
+ax.set_title("Population by Country in 2023", fontsize=14)
+ax.set_xticklabels(df["Country Name"], rotation=90)
+
+# streamlit
+st.pyplot(fig)
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+import pandas as pd
+import streamlit as st
+
+#loading shortened version of dataset
+spotify_data = pd.read_csv("spotify_data_top_us.csv")
+
+# separate artists into individual categories in case they're grouped together (re. collabs)
+spotify_data["artists"] = spotify_data["artists"].str.split(", ")
+spotify_data = spotify_data.explode("artists")
+
+# group artists by average popularity
+artist_popularity = spotify_data.groupby("artists")["popularity"].mean()
+
+# select subset of artists to display for simplicity
+artist_popularity = artist_popularity.sort_values(ascending=False)
+popular_artists = artist_popularity.head(10)
+
+# create widget to choose how many artists you can see
+display_widget = st.slider("Number of Artists to Display", min_value=1, max_value=10, value=10, step=1)
+
+# apply widget to artist_popularity subset
+popular_artists = popular_artists.head(display_widget)
+
+# make bar chart
+st.title("Popular Spotify Artists")
+st.bar_chart(popular_artists)

old_code/5_Project_Part_1.py

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+import streamlit as st
+import pandas as pd
+
+# loading data as a csv instead of api for simplicity
+spotify_data = pd.read_csv("spotify_data_top_us.csv")
+
+# separate artists into individual categories in case they're grouped together (re. collabs)
+spotify_data["artists"] = spotify_data["artists"].str.split(", ")
+spotify_data = spotify_data.explode("artists")
+
+# group artists by average popularity
+artist_popularity = spotify_data.groupby("artists")["popularity"].mean()
+
+# create widget to choose how many artists you can see
+display_widget = st.slider("Number of Artists to Display", min_value=1, max_value=40, value=20, step=1)
+
+# apply widget to artist_popularity subset
+artist_popularity = artist_popularity.head(display_widget)
+
+# make bar chart
+st.title("Top 40 Spotify Artists by Alexa and Giulio")
+st.bar_chart(artist_popularity)

old_code/dash_w_bigquery.py

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
+import streamlit as st
+import pandas_gbq
+from google.oauth2 import service_account
+import plotly.express as px
+import pandas as pd
+#import warnings
+from helper_functions_notebook import rain_emojis
+
+# bigquery
+credentials = service_account.Credentials.from_service_account_info(
+    st.secrets["gcp_service_account"]
+)
+project_id = st.secrets["gcp_service_account"]["project_id"]
+dataset = "spotify"
+table = "universal_top_spotify_songs"
+query = f"""
+SELECT DISTINCT artists, country, name, is_explicit, speechiness, danceability, acousticness, liveness
+FROM `{project_id}.{dataset}.{table}`
+# LIMIT 18000
+"""
+
+# loading
+spotify_data = pandas_gbq.read_gbq(query, project_id=project_id, credentials=credentials)
+
+# cleaning
+spotify_data["artists"] = spotify_data["artists"].str.split(", ")
+spotify_data2 = spotify_data.explode("artists")
+
+df_italy = spotify_data2[spotify_data2["country"] == "IT"]
+df_us = spotify_data2[spotify_data2["country"] == "US"]
+
+top_artist_italy = df_italy["artists"].value_counts().idxmax()
+top_song_italy = df_italy["name"].value_counts().idxmax()
+
+top_artist_us = df_us["artists"].value_counts().idxmax()
+top_song_us = df_us["name"].value_counts().idxmax()
+
+# Pie Charts
+df_italy["is_explicit"] = df_italy["is_explicit"].astype("object").replace({True: "Yes", False: "No"})
+explicit_italy = df_italy.groupby("is_explicit").size().reset_index(name="count")
+
+italy_pie = px.pie(explicit_italy,
+                   names="is_explicit",
+                   values="count",
+                   hole=0.3,
+                   title="Explicit vs Non Explicit Songs in Italy",
+                   labels={"is_explicit": "Explicit?"})
+italy_pie.update_traces(marker=dict(colors=["red", "green"]))
+
+df_us["is_explicit"] = df_us["is_explicit"].astype("object").replace({True: " Yes", False: " No"})
+explicit_us = df_us.groupby("is_explicit").size().reset_index(name="count")
+
+us_pie = px.pie(explicit_us,
+                names="is_explicit",
+                values="count",
+                hole=0.3,
+                title="Explicit vs Non Explicit Songs in US",
+                labels={"is_explicit": "Explicit?"})
+us_pie.update_traces(marker=dict(colors=["red", "blue"]))
+
+
+# speechiness songs data
+it_speechiness = spotify_data[spotify_data["country"] == "IT"]["speechiness"].sum()
+us_speechiness = spotify_data[spotify_data["country"] == "US"]["speechiness"].sum()
+
+# merging us and it speechiness
+df_speechiness_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Speechiness": [it_speechiness, us_speechiness]
+})
+
+# plot speechiness on a bar chart
+speechiness_bar = px.bar(df_speechiness_sum, x="Country", y="Total Speechiness",
+                         title="Who prefers speechy songs?",
+                         labels={"Total Speechiness": "Speechiness Score"},
+                         color="Country")
+
+
+
+
+# danceability songs data
+it_danceability = spotify_data[spotify_data["country"] == "IT"]["danceability"].sum()
+us_danceability = spotify_data[spotify_data["country"] == "US"]["danceability"].sum()
+
+# merging us and it danceability
+df_danceability_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Danceability": [it_danceability, us_danceability]
+})
+
+# plot danceability on a bar chart
+danceability = px.bar(df_danceability_sum, x="Country", y="Total Danceability",
+                      title="Who prefers danceable songs?",
+                      labels={"Total Danceability": "Danceability Score"},
+                      color="Country")
+
+
+
+# Compute total acousticness for IT and US
+it_acousticness = spotify_data[spotify_data["country"] == "IT"]["acousticness"].sum()
+us_acousticness = spotify_data[spotify_data["country"] == "US"]["acousticness"].sum()
+
+# Merge into a DataFrame
+df_acousticness_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Acousticness": [it_acousticness, us_acousticness]
+})
+
+# Create a bar chart
+acousticness_chart = px.bar(df_acousticness_sum, x="Country", y="Total Acousticness",
+                            title="Which country has more acoustic songs?",
+                            labels={"Total Acousticness": "Acousticness Score"},
+                            color="Country")
+
+
+
+# Compute total liveness for IT and US
+it_liveness = spotify_data[spotify_data["country"] == "IT"]["liveness"].sum()
+us_liveness = spotify_data[spotify_data["country"] == "US"]["liveness"].sum()
+
+# Merge into a DataFrame
+df_liveness_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Liveness": [it_liveness, us_liveness]
+})
+
+# Create a bar chart
+liveness_chart = px.bar(df_liveness_sum, x="Country", y="Total Liveness",
+                        title="Which country has more live-feeling songs?",
+                        labels={"Total Liveness": "Liveness Score"},
+                        color="Country")
+
+#dashboard
+
+LOGO_URL_SMALL = "https://storage.googleapis.com/pr-newsroom-wp/1/2023/05/Spotify_Full_Logo_RGB_Green.png"
+st.logo(
+    LOGO_URL_SMALL,
+    link="https://storage.googleapis.com/pr-newsroom-wp/1/2023/05/Spotify_Full_Logo_RGB_Green.png",
+    icon_image=LOGO_URL_SMALL,
+)
+st.title("Spotify Streaming Analysis")
+st.header("by Alexa and Giulio")
+st.write("Thanks for stopping by our dashboard! This app uses Kaggle's \"Top Spotify Songs in 73 Countries (Daily Updated)\" dataset to analyze music trends in Italy (Giulio's patria) and the US (Alexa's home). Hope you enjoy!")
+st.markdown("[Link to dataset](https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated?resource=download)")
+
+#creating ability to choose which country to look at
+selection = st.selectbox("Country:", ("Both", "Italy", "US"))
+
+#individual pages
+if selection == "Both":
+    #welcome
+    rain_emojis("🇮🇹 🇺🇸")
+
+    #artist stats
+    st.write("#1 Trending Artist in 🇮🇹 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_italy}")
+
+    st.write("#1 Trending Artist in 🇺🇸 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_us}")
+
+    #song stats
+    st.write("#1 Trending Song in 🇮🇹 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_italy}")
+
+    #us
+    st.write("#1 Trending Song in 🇺🇸 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_us}")
+
+    #explicit songs
+    st.plotly_chart(italy_pie)
+    st.plotly_chart(us_pie)
+
+    #speechiness songs
+    st.plotly_chart(speechiness_bar)
+
+    #danceability songs
+    st.plotly_chart(danceability)
+
+    # acousticness songs
+    st.plotly_chart(acousticness_chart)
+
+    # liveness songs
+    st.plotly_chart(liveness_chart)
+
+elif selection == "Italy":
+    #welcome
+    rain_emojis("🇮🇹")
+
+    #artist stats
+    st.write("#1 Trending Artist 🎤 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_italy}")
+
+    #song stats
+    st.write("#1 Trending Song 🎵 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_italy}")
+
+    #explicit chart
+    st.plotly_chart(italy_pie)
+
+elif selection == "US":
+    #welcome
+    rain_emojis("🇺🇸")
+
+    #artist stats
+    st.write("#1 Trending Artist 🎤 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_us}")
+
+    #song stats
+    st.write("#1 Trending Song 🎵 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_us}")
+
+    #explicit chart
+    st.plotly_chart(us_pie)
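Review note on the feature charts in this file: each chart compares the summed score per country (`.sum()` on speechiness, danceability, acousticness, liveness). A sum grows with the number of rows, so if the two countries contribute different numbers of distinct songs, the totals may mostly reflect row counts. The sketch below illustrates that effect with plain Python and made-up rows. The helper `feature_by_country` is hypothetical, and whether the real BigQuery table is actually unbalanced between countries is an assumption that this diff does not confirm.

```python
from statistics import mean

# Made-up rows for illustration; the real data comes from BigQuery.
rows = [
    {"country": "IT", "speechiness": 0.10},
    {"country": "IT", "speechiness": 0.20},
    {"country": "US", "speechiness": 0.10},
    {"country": "US", "speechiness": 0.10},
    {"country": "US", "speechiness": 0.10},
    {"country": "US", "speechiness": 0.10},
]


def feature_by_country(rows, feature, agg):
    """Group one feature by country and reduce each group with `agg` (e.g. sum or mean)."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["country"], []).append(row[feature])
    return {country: agg(values) for country, values in grouped.items()}


totals = feature_by_country(rows, "speechiness", sum)
averages = feature_by_country(rows, "speechiness", mean)
```

In this toy data the US has the larger total only because it has more rows, while Italy has the larger average. In the pandas code above, the analogous change would be replacing `.sum()` with `.mean()`, if a per-song comparison is what the charts are meant to show.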

old_code/lab10_big_query.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Question: What type of data loading will you use? Why? Explain as Markdown in your repository.
+
+Answer: We are using truncate load because we are retrieving the entire dataset from BQ every time we load the data.
+Since our project is not comparing music trends over time (at this current version), we are not interested in keeping old versions of our data stored.
+Additionally, since we're interested in showing what's popular right now in each country, we only care about pulling the most recent data.
+Therefore, we decided to use truncate load because we want to replace our dataset with the newest data each time we load the app.
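Review note on the answer above: the difference between a truncate load and an incremental load can be sketched with in-memory lists. This is an illustration of the two strategies only, not BigQuery code. The function names, the `key` parameter, and the sample rows are made up for the example.

```python
def truncate_load(target: list, fresh_rows: list) -> list:
    """Truncate load: discard everything in `target` and replace it with the fresh snapshot."""
    target.clear()
    target.extend(fresh_rows)
    return target


def incremental_load(target: list, fresh_rows: list, key) -> list:
    """Incremental load: keep existing rows and append only rows with unseen keys."""
    seen = {key(row) for row in target}
    for row in fresh_rows:
        row_key = key(row)
        if row_key not in seen:
            target.append(row)
            seen.add(row_key)
    return target


# Made-up snapshots: yesterday's stored rows and today's full pull.
stored = [{"name": "Song A", "date": "day1"}, {"name": "Song B", "date": "day1"}]
today = [{"name": "Song B", "date": "day2"}, {"name": "Song C", "date": "day2"}]

truncated = truncate_load(list(stored), today)
appended = incremental_load(list(stored), today, key=lambda r: (r["name"], r["date"]))
```

After the truncate load only today's two rows remain, which matches the goal of showing what is popular right now. The incremental version keeps all four rows, which would only be useful if the project later compares trends over time.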
