advanced-computing
diff --git a/README.md b/README.md
Lines changed: 12 additions & 0 deletions
diff --git a/Spotify_Dashboard.py b/Spotify_Dashboard.py
Lines changed: 4 additions & 2 deletions
diff --git a/old_code/2_Proposal.py b/old_code/2_Proposal.py
Lines changed: 48 additions & 0 deletions
diff --git a/old_code/3_Project_Part_2_Countries.py b/old_code/3_Project_Part_2_Countries.py
Lines changed: 29 additions & 0 deletions
diff --git a/old_code/4_Project_Part_2_Musicians.py b/old_code/4_Project_Part_2_Musicians.py
Lines changed: 26 additions & 0 deletions
diff --git a/old_code/5_Project_Part_1.py b/old_code/5_Project_Part_1.py
Lines changed: 22 additions & 0 deletions
diff --git a/old_code/dash_w_bigquery.py b/old_code/dash_w_bigquery.py
Lines changed: 221 additions & 0 deletions
diff --git a/old_code/lab10_big_query.md b/old_code/lab10_big_query.md
Lines changed: 6 additions & 0 deletions
@@ -7,10 +7,22 @@ We will use Kaggle's "Top Spotify Songs in 73 Countries (Daily Updated)" dataset
 - Do countries prefer explicit music or non explicit music?
 
 **Setup/usage instructions**
+<<<<<<< HEAD
 Follow these steps: 
 1. clone the repository, 
 2. activate the virtual environment (for Windows .venv2\Scripts\activate), 
 3. install the requirements (pip install -r requirements.txt), 
 4. copy your BigQuery key into the secrets file and save it (remember not to commit the secrets file), 
 5. run the main dashboard page (streamlit run Spotify_Dashboard.py) with Streamlit.
 
+=======
+Please use Python 3.12.2 or older
+
+Follow these steps: 
+1. clone the repository 
+2. create and activate the virtual environment (on Windows, for example: python -m venv .venv to create it, then .venv\Scripts\activate to activate it; the exact commands may vary with your system)
+3. install the requirements (pip install -r requirements.txt); this step can take a long time, so please be patient
+4. run the main dashboard page (streamlit run Spotify_Dashboard.py) with Streamlit.
+
+Deployed App link: https://advanced-computing-alexa-giulio-rep-spotify-dashboard-zvzfpp.streamlit.app/
+>>>>>>> 232730eac4dea19caf36feb35cd5930b0b946871
@@ -73,9 +73,11 @@ def rain_emojis(emoji):
     "Select a country:",
     options=list(locations.keys()),
 )
-if st.button("Go to country"):
-    st.switch_page(locations[selection][2])
 
+# Automatically redirect if a selection is made
+if selection:
+    st.switch_page(locations[selection][2])
+    
 #display map
 st.write("Check out this map to see which countries we feature on our app:")
 st_folium(map, width=700, height=500)
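Review note on the hunk above: `st.selectbox` returns its first option by default, so `if selection:` is likely true on the first render and the page may redirect before the map is drawn. Below is a minimal sketch of one way to guard against that. It assumes `locations` maps a country name to a tuple whose third element is the page path (as `locations[selection][2]` suggests); the helper `resolve_target_page`, the sample entries, and the page paths are hypothetical.

```python
from typing import Optional

# Hypothetical stand-in for the app's `locations` dict; real values may differ.
locations = {
    "Italy": (41.9, 12.5, "pages/Italy.py"),
    "US": (38.9, -77.0, "pages/US.py"),
}


def resolve_target_page(selection: Optional[str], locations: dict) -> Optional[str]:
    """Return the page path for a selection, or None when nothing is chosen."""
    if selection is None or selection not in locations:
        return None
    return locations[selection][2]


# In the Streamlit script this could be wired up roughly as follows
# (assumes a Streamlit version where st.selectbox accepts index=None,
# which makes the widget start empty and return None):
#
#   selection = st.selectbox(
#       "Select a country:", options=list(locations.keys()), index=None
#   )
#   target = resolve_target_page(selection, locations)
#   if target is not None:
#       st.switch_page(target)
```

With an initially empty select box, the redirect only fires after the user actually picks a country, which keeps the one-click behavior this change is aiming for.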
 
@@ -0,0 +1,48 @@
+import streamlit as st
+
+# Page Title
+st.title("Spotify Streaming Analysis")
+
+# Dataset Section
+st.header("What dataset are you going to use?")
+st.write("We will use Kaggle's \"Top Spotify Songs in 73 Countries (Daily Updated)\" dataset. This dataset is updated daily to include the top songs and artists streamed across the world.")
+st.markdown("[Link to dataset](https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated?resource=download)")
+
+# Research Questions
+st.header("What are your research question(s)?")
+st.write("We are interested in answering the following questions:")
+st.markdown("""
+- Which genres of music and artists are most popular in Italy and the US? 
+- What is the relationship between song features (speechiness, explicitness, etc.) and their popularity?
+- Do countries tend to listen to domestic music (i.e. music produced in the user's country or sung in the home language) more than foreign music?
+- Which songs/artists are most popular and when do they start to lose popularity?
+            
+We integrated Roberto's feedback and eliminated "which country listens to the most music". We also slightly modified the wording of the question about song features. Lastly, we decided to focus on just the US (Alexa's home) and Italy (Giulio's home).
+""")
+
+# Notebook Link
+st.header("What's the link to your notebook?")
+st.markdown("[Link to our notebook](https://colab.research.google.com/drive/1H0l4hN8gyangmzVwuEojwT8GAi1kykyc?usp=sharing)")
+
+# Target Visualization
+st.header("What's your target visualization?")
+st.write("Depending on which research question we choose, we have several ideas for how to graph this data (such as a choropleth map of artist popularity across the world or a word cluster chart that shows the name of the most popular artists globally).")
+st.write("If we choose to do a line graph of song/artist popularity over time (where each line represents one song or artist), then our target visualization could look like this:")
+
+# Known Unknowns
+st.header("What are your known unknowns?")
+st.markdown("""
+- The dataset specifies that it pulls data from 73 countries -- are these countries biased towards certain continents?
+- Are there inconsistencies with how the song/artist titles are written (meaning all upper case/lower case, special characters, etc.) that may make it difficult to properly aggregate the frequency for individual songs?
+""")
+
+# Anticipated Challenges
+st.header("What challenges do you anticipate?")
+st.markdown("""
+- If we choose to do a geographic representation of our data, then linking country codes could present a challenge
+- Making sure that the data is written uniformly may be time-consuming
+- Normalizing the rankings of each song/artist between differently sized countries to better understand a song's popularity
+""")
+
+
+st.markdown("<br><br>", unsafe_allow_html=True)
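Review note on the "known unknowns" in the proposal above: inconsistent casing, spacing, or Unicode forms in song and artist titles would split one song into several groups when counting. A minimal sketch of normalizing titles before aggregating, using only the standard library. The function name `normalize_title` and the sample titles are illustrative assumptions, not taken from the dataset.

```python
import unicodedata
from collections import Counter


def normalize_title(title: str) -> str:
    """Fold Unicode form, whitespace, and case so variants compare equal."""
    text = unicodedata.normalize("NFKC", title)
    text = " ".join(text.split())  # collapse repeated, leading, trailing whitespace
    return text.casefold()


# Illustrative sample values only.
raw_titles = ["Espresso", "ESPRESSO", "  espresso ", "Tuta Gold", "tuta  gold"]

# Count occurrences per normalized title instead of per raw spelling.
counts = Counter(normalize_title(t) for t in raw_titles)
```

In a pandas pipeline the same function could be applied to the title column before `groupby` or `value_counts`; whether the dataset actually needs this depends on what the raw titles look like.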
@@ -0,0 +1,29 @@
+import pandas as pd
+import streamlit as st
+import matplotlib.pyplot as plt
+
+# title
+st.title("🌍 Population by Country in 2023")
+
+# load data
+file_path = "population_by_country.csv" 
+
+# show only country and year
+df = pd.read_csv(file_path, usecols=["Country Name", "2023"])
+
+# make widget to choose number of countries to display
+num_countries = st.slider("Select number of countries to display", min_value=5, max_value=len(df), value=20)
+
+# select most populated countries 
+df = df.sort_values(by="2023", ascending=False).head(num_countries)
+
+# make chart
+fig, ax = plt.subplots(figsize=(12, 6))
+ax.bar(df["Country Name"], df["2023"], color="skyblue")
+ax.set_xlabel("Country Name", fontsize=12)
+ax.set_ylabel("Population in 2023", fontsize=12)
+ax.set_title("Population by Country in 2023", fontsize=14)
+ax.tick_params(axis="x", labelrotation=90)  # rotate labels without resetting the tick labels
+
+# streamlit
+st.pyplot(fig)
@@ -0,0 +1,26 @@
+import pandas as pd
+import streamlit as st
+
+#loading shortened version of dataset
+spotify_data = pd.read_csv("spotify_data_top_us.csv")
+
+# separate artists into individual categories in case they're grouped together (re. collabs)
+spotify_data["artists"] = spotify_data["artists"].str.split(", ")
+spotify_data = spotify_data.explode("artists")
+
+# group artists by average popularity
+artist_popularity = spotify_data.groupby("artists")["popularity"].mean()
+
+# select subset of artists to display for simplicity
+artist_popularity = artist_popularity.sort_values(ascending=False)
+popular_artists = artist_popularity.head(10)
+
+# create widget to choose how many artists you can see
+display_widget = st.slider("Number of Artists to Display", min_value=1, max_value=10, value=10, step=1)
+
+# apply widget to artist_popularity subset
+popular_artists = popular_artists.head(display_widget)
+
+# make bar chart
+st.title("Popular Spotify Artists")
+st.bar_chart(popular_artists)
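Review note on the file above: the split, explode, group, and sort steps can be easier to reason about when written out by hand. The sketch below mirrors them in plain Python on a few made-up rows. The function names and the sample data are hypothetical, and the pandas version in the diff remains the actual implementation.

```python
from collections import defaultdict

# Made-up rows standing in for spotify_data_top_us.csv.
rows = [
    {"artists": "Artist A, Artist B", "popularity": 90},
    {"artists": "Artist A", "popularity": 80},
    {"artists": "Artist C", "popularity": 70},
]


def average_popularity_by_artist(rows):
    """Split collaborations on ', ' and average popularity per artist."""
    totals = defaultdict(lambda: [0, 0])  # artist -> [popularity sum, row count]
    for row in rows:
        for artist in row["artists"].split(", "):
            totals[artist][0] += row["popularity"]
            totals[artist][1] += 1
    return {artist: total / count for artist, (total, count) in totals.items()}


def top_artists(rows, limit):
    """Sort artists by average popularity, highest first, and keep `limit`."""
    averages = average_popularity_by_artist(rows)
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Note that the sort has to happen before the slice: taking the first `limit` entries of an unsorted grouping returns arbitrary artists rather than the most popular ones.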
@@ -0,0 +1,22 @@
+import streamlit as st
+import pandas as pd
+
+# loading data as a csv instead of api for simplicity
+spotify_data = pd.read_csv("spotify_data_top_us.csv")
+
+# separate artists into individual categories in case they're grouped together (re. collabs)
+spotify_data["artists"] = spotify_data["artists"].str.split(", ")
+spotify_data = spotify_data.explode("artists")
+
+# group artists by average popularity
+artist_popularity = spotify_data.groupby("artists")["popularity"].mean()
+
+# create widget to choose how many artists you can see
+display_widget = st.slider("Number of Artists to Display", min_value=1, max_value=40, value=20, step=1)
+
+# sort by average popularity (groupby orders by artist name), then apply the widget
+artist_popularity = artist_popularity.sort_values(ascending=False).head(display_widget)
+
+# make bar chart
+st.title("Top 40 Spotify Artists by Alexa and Giulio")
+st.bar_chart(artist_popularity)
@@ -0,0 +1,221 @@
+import streamlit as st
+import pandas_gbq
+from google.oauth2 import service_account
+import plotly.express as px
+import pandas as pd
+#import warnings
+from helper_functions_notebook import rain_emojis  
+
+# bigquery
+credentials = service_account.Credentials.from_service_account_info(
+    st.secrets["gcp_service_account"]
+)
+project_id = st.secrets["gcp_service_account"]["project_id"]
+dataset = "spotify"
+table = "universal_top_spotify_songs"
+query = f"""
+    SELECT DISTINCT artists, country, name, is_explicit, speechiness, danceability, acousticness, liveness
+    FROM `{project_id}.{dataset}.{table}` 
+ #  LIMIT 18000
+"""  
+
+# loading
+spotify_data = pandas_gbq.read_gbq(query, project_id=project_id, credentials=credentials)
+
+# cleaning
+spotify_data["artists"] = spotify_data["artists"].str.split(", ")
+spotify_data2 = spotify_data.explode("artists")
+
+df_italy = spotify_data2[spotify_data2["country"] == "IT"].copy()  # copy so later column edits do not write to a slice
+df_us = spotify_data2[spotify_data2["country"] == "US"].copy()
+
+top_artist_italy = df_italy["artists"].value_counts().idxmax()
+top_song_italy = df_italy["name"].value_counts().idxmax()
+
+top_artist_us = df_us["artists"].value_counts().idxmax()
+top_song_us = df_us["name"].value_counts().idxmax()
+
+# Pie Charts
+df_italy["is_explicit"] = df_italy["is_explicit"].astype("object").replace({True: "Yes", False: "No"})  
+explicit_italy = df_italy.groupby("is_explicit").size().reset_index(name="count") 
+
+italy_pie = px.pie(explicit_italy, 
+                names="is_explicit", 
+                values="count", 
+                hole=0.3, 
+                title="Explicit vs Non Explicit Songs in Italy",
+                labels={"is_explicit": "Explicit?"})
+italy_pie.update_traces(marker=dict(colors=["red", "green"]))
+
+df_us["is_explicit"] = df_us["is_explicit"].astype("object").replace({True: "Yes", False: "No"})
+explicit_us = df_us.groupby("is_explicit").size().reset_index(name="count")  
+
+us_pie = px.pie(explicit_us, 
+                names="is_explicit", 
+                values="count", 
+                hole=0.3, 
+                title="Explicit vs Non Explicit Songs in US",
+                labels={"is_explicit": "Explicit?"})
+us_pie.update_traces(marker=dict(colors=["red", "blue"]))
+
+
+# speechiness songs data
+it_speechiness = spotify_data[spotify_data["country"] == "IT"]["speechiness"].sum()
+us_speechiness = spotify_data[spotify_data["country"] == "US"]["speechiness"].sum()
+
+# merging us and it speechiness
+df_speechiness_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Speechiness": [it_speechiness, us_speechiness]
+})
+
+# plot speechiness on a bar chart
+speechiness_bar = px.bar(df_speechiness_sum, x="Country", y="Total Speechiness",
+                         title="Who prefers speechy songs?",
+                         labels={"Total Speechiness": "Speechiness Score"},
+                         color="Country")
+
+
+
+
+# danceability songs data
+it_danceability = spotify_data[spotify_data["country"] == "IT"]["danceability"].sum()
+us_danceability = spotify_data[spotify_data["country"] == "US"]["danceability"].sum()
+
+# merging us and it danceability
+df_danceability_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Danceability": [it_danceability, us_danceability]
+})
+
+# plot danceability on a bar chart
+danceability = px.bar(df_danceability_sum, x="Country", y="Total Danceability",
+                         title="Who prefers danceable songs?",
+                         labels={"Total Danceability": "Danceability Score"},
+                         color="Country")
+
+
+
+# Compute total acousticness for IT and US
+it_acousticness = spotify_data[spotify_data["country"] == "IT"]["acousticness"].sum()
+us_acousticness = spotify_data[spotify_data["country"] == "US"]["acousticness"].sum()
+
+# Merge into a DataFrame
+df_acousticness_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Acousticness": [it_acousticness, us_acousticness]
+})
+
+# Create a bar chart
+acousticness_chart = px.bar(df_acousticness_sum, x="Country", y="Total Acousticness",
+                            title="Which country has more acoustic songs?",
+                            labels={"Total Acousticness": "Acousticness Score"},
+                            color="Country")
+
+
+
+# Compute total liveness for IT and US
+it_liveness = spotify_data[spotify_data["country"] == "IT"]["liveness"].sum()
+us_liveness = spotify_data[spotify_data["country"] == "US"]["liveness"].sum()
+
+# Merge into a DataFrame
+df_liveness_sum = pd.DataFrame({
+    "Country": ["Italy (IT)", "United States (US)"],
+    "Total Liveness": [it_liveness, us_liveness]
+})
+
+# Create a bar chart
+liveness_chart = px.bar(df_liveness_sum, x="Country", y="Total Liveness",
+                        title="Which country has more live-feeling songs?",
+                        labels={"Total Liveness": "Liveness Score"},
+                        color="Country")
+
+#dashboard
+
+LOGO_URL_SMALL = "https://storage.googleapis.com/pr-newsroom-wp/1/2023/05/Spotify_Full_Logo_RGB_Green.png"
+st.logo(
+    LOGO_URL_SMALL,
+    link="https://storage.googleapis.com/pr-newsroom-wp/1/2023/05/Spotify_Full_Logo_RGB_Green.png",
+    icon_image=LOGO_URL_SMALL,
+)
+st.title("Spotify Streaming Analysis")
+st.header("by Alexa and Giulio")
+st.write("Thanks for stopping by our dashboard! This app uses Kaggle's \"Top Spotify Songs in 73 Countries (Daily Updated)\" dataset to analyze music trends in Italy (Giulio's patria) and the US (Alexa's home). Hope you enjoy!")
+st.markdown("[Link to dataset](https://www.kaggle.com/datasets/asaniczka/top-spotify-songs-in-73-countries-daily-updated?resource=download)")
+
+#creating ability to choose which country to look at
+selection = st.selectbox("Country:", ("Both", "Italy", "US"))
+
+#individual pages
+if selection == "Both":
+    #welcome
+    rain_emojis("🇮🇹 🇺🇸")
+
+    #artist stats
+    st.write("#1 Trending Artist in 🇮🇹 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_italy}")
+
+    st.write("#1 Trending Artist in 🇺🇸 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_us}")
+
+    #song stats    
+    st.write("#1 Trending Song in 🇮🇹 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_italy}")
+    
+    #us
+    st.write("#1 Trending Song in 🇺🇸 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_us}")
+
+    #explicit songs
+    st.plotly_chart(italy_pie)
+    st.plotly_chart(us_pie)
+
+    #speechiness songs
+    st.plotly_chart(speechiness_bar)
+
+    #danceability songs
+    st.plotly_chart(danceability)
+
+    # acousticness songs
+    st.plotly_chart(acousticness_chart)
+
+    # liveness songs
+    st.plotly_chart(liveness_chart)
+
+elif selection == "Italy":
+    #welcome
+    rain_emojis("🇮🇹")
+
+    #artist stats
+    st.write("#1 Trending Artist 🎤 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_italy}")
+
+    #song stats
+    st.write("#1 Trending Song 🎵 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_italy}")
+
+    #explicit chart
+    st.plotly_chart(italy_pie)
+
+elif selection == "US":
+    #welcome
+    rain_emojis("🇺🇸")
+
+    #artist stats
+    st.write("#1 Trending Artist 🎤 Today")
+    container = st.container(border=True)
+    container.write(f"{top_artist_us}")
+
+    #song stats
+    st.write("#1 Trending Song 🎵 Today")
+    container = st.container(border=True)
+    container.write(f"{top_song_us}")
+
+    #explicit chart
+    st.plotly_chart(us_pie)
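Review note on the feature charts in the file above: the four blocks for speechiness, danceability, acousticness, and liveness repeat the same pattern, and they compare per-country sums, which also grow with the number of rows each country has. The sketch below shows, in plain Python, one hypothetical helper that computes both the total and the mean per country so the two can be compared. `summarize_features` and the sample rows are assumptions for illustration only.

```python
from statistics import mean

FEATURES = ("speechiness", "danceability", "acousticness", "liveness")


def summarize_features(rows, countries=("IT", "US"), features=FEATURES):
    """Return {feature: {country: {"total": ..., "mean": ...}}}."""
    summary = {}
    for feature in features:
        summary[feature] = {}
        for country in countries:
            values = [row[feature] for row in rows if row["country"] == country]
            summary[feature][country] = {
                "total": sum(values),
                # mean is undefined for a country with no rows
                "mean": mean(values) if values else None,
            }
    return summary


# Made-up rows: the US has more rows, each with a lower speechiness value.
sample_rows = [
    {"country": "IT", "speechiness": 0.5},
    {"country": "US", "speechiness": 0.25},
    {"country": "US", "speechiness": 0.25},
    {"country": "US", "speechiness": 0.25},
]

summary = summarize_features(sample_rows, features=("speechiness",))
```

In this made-up sample the US has the larger total but Italy has the larger mean, so a chart titled "Who prefers speechy songs?" could point in opposite directions depending on which statistic is plotted.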
@@ -0,0 +1,6 @@
+Question: What type of data loading will you use? Why? Explain as Markdown in your repository.
+
+Answer: We use a truncate load: every time the app loads data, it retrieves the entire dataset from BigQuery (BQ) rather than only the new rows.
+Because the current version of our project does not compare music trends over time, we do not need to keep older versions of the data.
+We also want to show what is popular right now in each country, so only the most recent data matters.
+A truncate load fits both needs, since it replaces our dataset with the newest data each time the app loads.
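Review note on the answer above: the difference between a truncate load and an incremental load can be sketched with in-memory lists. This is an illustration under stated assumptions, not the project's BigQuery code; the function names, the `snapshot_date` key, and the sample rows are hypothetical.

```python
def truncate_load(destination, fresh_rows):
    """Discard whatever is stored and keep only the newest snapshot."""
    destination.clear()
    destination.extend(fresh_rows)
    return destination


def incremental_load(destination, fresh_rows, key="snapshot_date"):
    """Append only rows newer than what is already stored (shown for contrast)."""
    latest = max((row[key] for row in destination), default="")
    destination.extend(row for row in fresh_rows if row[key] > latest)
    return destination


# Made-up rows; ISO date strings compare correctly as plain strings.
stored = [{"snapshot_date": "2025-03-01", "name": "Song A"}]
fresh = [{"snapshot_date": "2025-03-02", "name": "Song B"}]

truncated = truncate_load(list(stored), fresh)
appended = incremental_load(list(stored), fresh)
```

After the truncate load only the newest snapshot remains, which matches the stated goal of showing what is popular right now; the incremental variant would be the one to reach for if the project later compares trends over time.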