Commit ffdfc0f

Refactor code for consistency and readability

- Updated string formatting to use consistent double quotes in version handling.
- Reformatted code for better readability, including line breaks and indentation.
- Removed unnecessary blank lines and comments to clean up the code.
- Improved the organization of imports and removed unused ones.
- Enhanced error messages for clarity in version retrieval functions.
- Ensured consistent use of whitespace around operators and after commas.
1 parent 5c98d48 commit ffdfc0f

File tree

13 files changed (+567 -381 lines)


.flake8

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+[flake8]
+max-line-length = 120
+ignore = E402,E302,E305,E266,E203,W503,W504,E722,E712,E721,E713,E714,E731
```

.github/workflows/autoformat.yml

Lines changed: 83 additions & 0 deletions
```diff
@@ -0,0 +1,83 @@
+name: Autoformat Code on Push
+
+on:
+  push:
+    branches:
+      - main # Adjust the branch accordingly
+  pull_request:
+    branches:
+      - main # Adjust the branch accordingly
+
+permissions:
+  checks: write
+  actions: read
+  contents: write
+
+jobs:
+  format:
+    runs-on: ubuntu-latest
+
+    env:
+      commit_message: "No formatting changes applied"
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }} # Use GitHub token to push changes
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12' # Adjust to your Python version
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install black black[jupyter] flake8 isort nbstripout pytest pytest-timeout versioneer
+
+      - name: Check import sorting with isort
+        id: isort-check
+        run: |
+          isort --check-only .
+        continue-on-error: true
+
+      - name: Format imports with isort
+        if: steps.isort-check.outcome == 'failure'
+        run: |
+          isort .
+
+      - name: Check code formatting with Black
+        id: black-check
+        run: |
+          black --line-length=120 --preview --enable-unstable-feature=string_processing --check .
+        continue-on-error: true
+
+      - name: Format code with Black
+        if: steps.black-check.outcome == 'failure'
+        run: |
+          black --line-length=120 --preview --enable-unstable-feature=string_processing .
+
+      - name: Set commit message
+        id: set-message
+        run: |
+          if [[ "${{ steps.isort-check.outcome }}" == "failure" && "${{ steps.black-check.outcome }}" == "failure" ]]; then
+            echo "commit_message=Sorted imports with isort & Autoformat code with Black" >> $GITHUB_ENV
+          elif [[ "${{ steps.isort-check.outcome }}" == "failure" ]]; then
+            echo "commit_message=Sorted imports with isort" >> $GITHUB_ENV
+          elif [[ "${{ steps.black-check.outcome }}" == "failure" ]]; then
+            echo "commit_message=Autoformat code with Black" >> $GITHUB_ENV
+          fi
+
+      - name: Commit and push changes if formatting is applied
+        if: steps.isort-check.outcome == 'failure' || steps.black-check.outcome == 'failure'
+        run: |
+          git config --local user.name "github-actions[bot]"
+          git config --local user.email "github-actions[bot]@users.noreply.github.com"
+          if [ -n "$(git status --porcelain)" ]; then
+            git add .
+            git commit -m "${{ env.commit_message }}"
+            git push origin ${{ github.ref }}
+          else
+            echo "No changes to commit"
+          fi
```
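
The decision the workflow makes (run each formatter in check mode, then reformat only if the check fails) can be reproduced locally. A minimal sketch, assuming isort and Black are installed in the active environment; the flags mirror the workflow above, and the `needs_formatting` helper name is ours, not part of the repo:

```python
# Local sketch of the workflow's check logic: a non-zero exit code from
# `isort --check-only` or `black --check` means the tool would change files.
import subprocess

def needs_formatting() -> dict:
    """Report which formatters would modify the working tree (hypothetical helper)."""
    checks = {
        "isort": ["isort", "--check-only", "."],
        "black": [
            "black", "--line-length=120", "--preview",
            "--enable-unstable-feature=string_processing", "--check", ".",
        ],
    }
    # returncode != 0 <=> the formatter found files it would rewrite
    return {name: subprocess.run(cmd, capture_output=True).returncode != 0 for name, cmd in checks.items()}

if __name__ == "__main__":
    print(needs_formatting())  # e.g. {'isort': False, 'black': True}
```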

README.md

Lines changed: 14 additions & 1 deletion
```diff
@@ -1,11 +1,16 @@
 # Steam Sales Analysis
+[![Package Publish Status](https://img.shields.io/github/actions/workflow/status/DataForgeOpenAIHub/Steam-Sales-Analysis/python-publish.yml?branch=main)](https://github.com/DataForgeOpenAIHub/Steam-Sales-Analysis/actions)
+[![PyPI Downloads](https://img.shields.io/pypi/dm/steamstore_etl)](https://pypi.org/project/steamstore_etl/)
+[![PyPI Python Version](https://img.shields.io/pypi/pyversions/steamstore_etl)](https://pypi.org/project/steamstore_etl/)
+[![PyPI version](https://img.shields.io/pypi/v/steamstore_etl.svg)](https://pypi.org/project/steamstore_etl/)
+![GitHub release (latest by date)](https://img.shields.io/github/v/release/DataForgeOpenAIHub/Steam-Sales-Analysis)
 
 ![banner](assets/imgs/steam_logo_banner.jpg)
 
 ## Overview
 Welcome to **Steam Sales Analysis** – an innovative project designed to harness the power of data for insights into the gaming world. We have meticulously crafted an ETL (Extract, Transform, Load) pipeline that covers every essential step: data retrieval, processing, validation, and ingestion. By leveraging the robust Steamspy and Steam APIs, we collect comprehensive game-related metadata, details, and sales figures.
 
-But we don’t stop there. The culmination of this data journey sees the information elegantly loaded into a MySQL database hosted on Aiven Cloud. From this solid foundation, we take it a step further: the data is analyzed and visualized through dynamic and interactive Tableau dashboards. This transforms raw numbers into actionable insights, offering a clear window into gaming trends and sales performance. Join us as we dive deep into the data and bring the world of gaming to life!
+But we don’t stop there. The culmination of this data journey is the elegant loading of information into a MySQL database hosted on Aiven Cloud. From this solid foundation, we take it a step further: the data is analyzed and visualized through dynamic and interactive Tableau dashboards. This transforms raw numbers into actionable insights, offering a clear window into gaming trends and sales performance. Join us as we dive deep into the data and bring the world of gaming to life!
 
 # `steamstore` CLI
 ![Steamstore ETL Pipeline](assets/imgs/steamstore-etl.drawio.png)
@@ -291,6 +296,14 @@ To execute the ETL pipeline, use the following commands:
 
 This will start the process of retrieving data from the Steamspy and Steam APIs, processing and validating it, and then loading it into the MySQL database.
 
+# Dashboard
+- Explore the interactive [**Tableau dashboard**](https://sudarshanasrao.github.io/portfolio/portfolio-0/).
+
+## Authors
+1. [Kayvan Shah](https://github.com/KayvanShah1) | `MS in Applied Data Science` | `USC`
+2. [Sudarshana S Rao](https://github.com/SudarshanaSRao) | `MS in Electrical Engineering (Machine Learning & Data Science)` | `USC`
+3. [Rohit Veeradhi](https://github.com/Rohit04121998) | `MS in Electrical Engineering (Machine Learning & Data Science)` | `USC`
+
 ## References:
 
 ### API Used:
```

dag/flows/healthcheck.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -1,8 +1,9 @@
 import platform
-import prefect
-from prefect import task, flow, get_run_logger
 import sys
 
+import prefect
+from prefect import flow, get_run_logger, task
+
 
 @task
 def log_platform_info():
```

get_version.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,4 +1,5 @@
 import warnings
+
 import versioneer
 
 if __name__ == "__main__":
```

notebooks/data_exploration.ipynb

Lines changed: 45 additions & 40 deletions
```diff
@@ -339,9 +339,9 @@
 }
 ],
 "source": [
-"with open(os.path.join(Path.sql_queries, 'get_all_game_data.sql'), \"r\") as f:\n",
+"with open(os.path.join(Path.sql_queries, \"get_all_game_data.sql\"), \"r\") as f:\n",
 " query = text(f.read())\n",
-" \n",
+"\n",
 "\n",
 "with get_db() as db:\n",
 " result = db.execute(query)\n",
@@ -371,7 +371,7 @@
 }
 ],
 "source": [
-"game_data['description'].iloc[10000-4-1]"
+"game_data[\"description\"].iloc[10000 - 4 - 1]"
 ]
 },
 {
```
```diff
@@ -396,6 +396,7 @@
 "source": [
 "from fuzzywuzzy import process\n",
 "\n",
+"\n",
 "def get_unique(series):\n",
 " \"\"\"\n",
 " Returns a set of unique values from a series of strings.\n",
@@ -407,7 +408,7 @@
 " set: A set of unique values extracted from the series.\n",
 "\n",
 " \"\"\"\n",
-" return set(list(itertools.chain(*series.apply(lambda x: [c for c in x.split(';')]))))"
+" return set(list(itertools.chain(*series.apply(lambda x: [c for c in x.split(\";\")]))))"
 ]
 },
 {
```
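
These two hunks are quote-style changes to the notebook's `get_unique` helper. For reference, a runnable sketch of the same helper, lightly simplified (the `list()` wrapper and the inner comprehension are redundant); the sample series is our own:

```python
# Runnable sketch of get_unique, assuming each row is a ';'-delimited string
import itertools

import pandas as pd

def get_unique(series: pd.Series) -> set:
    """Return the set of unique tokens across a series of ';'-delimited strings."""
    return set(itertools.chain(*series.apply(lambda x: x.split(";"))))

print(get_unique(pd.Series(["Action;Indie", "Indie;RPG"])))  # {'Action', 'Indie', 'RPG'}
```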
```diff
@@ -461,7 +462,7 @@
 }
 ],
 "source": [
-"geners = get_unique(game_data['genres'])\n",
+"geners = get_unique(game_data[\"genres\"])\n",
 "geners"
 ]
 },
@@ -494,28 +495,30 @@
 "def standardize_genre(value, genre_list):\n",
 " # Convert to lowercase for consistent comparison\n",
 " value_lower = value.lower()\n",
-" \n",
+"\n",
 " # Define common patterns\n",
-" if 'rpg' in value_lower or 'role playing' in value_lower or 'role' in value_lower:\n",
-" return 'RPG'\n",
-" if 'simulation' in value_lower or 'simulators' in value_lower:\n",
-" return 'Simulation'\n",
-" if 'adventure' in value_lower:\n",
-" return 'Adventure'\n",
+" if \"rpg\" in value_lower or \"role playing\" in value_lower or \"role\" in value_lower:\n",
+" return \"RPG\"\n",
+" if \"simulation\" in value_lower or \"simulators\" in value_lower:\n",
+" return \"Simulation\"\n",
+" if \"adventure\" in value_lower:\n",
+" return \"Adventure\"\n",
+"\n",
 "\n",
 "# Function to standardize multiple genres\n",
 "def standardize_multiple_genres(genres_str, genre_list):\n",
-" genres = genres_str.split(';')\n",
+" genres = genres_str.split(\";\")\n",
 " standardized_genres = [standardize_genre(genre.strip(), genre_list) for genre in genres]\n",
-" return ';'.join(sorted(set(standardized_genres))) # Use sorted(set()) to remove duplicates and sort\n",
-" \n",
+" return \";\".join(sorted(set(standardized_genres))) # Use sorted(set()) to remove duplicates and sort\n",
+"\n",
 " # Find the best match from the list of unique genres\n",
 " match, score = process.extractOne(value, genre_list)\n",
 " return match\n",
 "\n",
+"\n",
 "# Apply the standardization function to the Genres column\n",
-"game_data['genres'] = game_data['genres'].apply(lambda x: standardize_multiple_genres(x, geners))\n",
-"geners = get_unique(game_data['genres'])\n",
+"game_data[\"genres\"] = game_data[\"genres\"].apply(lambda x: standardize_multiple_genres(x, geners))\n",
+"geners = get_unique(game_data[\"genres\"])\n",
 "geners"
 ]
 },
```
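
Note that in the notebook cell the `process.extractOne` fuzzy-match fallback sits below a `return` and is unreachable; the commit reformats it but does not move it. The sketch below keeps only the reachable logic and adds a pass-through default for unmatched genres, which is our assumption rather than the notebook's behavior (the original falls through to `None`):

```python
# Minimal runnable sketch of the genre cleanup; the unreachable fuzzy-match
# fallback from the notebook is omitted.
def standardize_genre(value: str) -> str:
    value_lower = value.lower()  # lowercase for consistent comparison
    if "rpg" in value_lower or "role playing" in value_lower or "role" in value_lower:
        return "RPG"
    if "simulation" in value_lower or "simulators" in value_lower:
        return "Simulation"
    if "adventure" in value_lower:
        return "Adventure"
    return value  # assumption: pass unmatched genres through unchanged

def standardize_multiple_genres(genres_str: str) -> str:
    genres = [standardize_genre(g.strip()) for g in genres_str.split(";")]
    return ";".join(sorted(set(genres)))  # sorted(set()) removes duplicates

print(standardize_multiple_genres("Role Playing;Adventure;adventure"))  # Adventure;RPG
```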
```diff
@@ -615,7 +618,7 @@
 }
 ],
 "source": [
-"categories = get_unique(game_data['categories'])\n",
+"categories = get_unique(game_data[\"categories\"])\n",
 "categories"
 ]
 },
@@ -643,21 +646,22 @@
 " - score: The calculated rating score as a percentage.\n",
 "\n",
 " \"\"\"\n",
-" pos = row['positive_ratings']\n",
-" neg = row['negative_ratings']\n",
+" pos = row[\"positive_ratings\"]\n",
+" neg = row[\"negative_ratings\"]\n",
 "\n",
 " total_reviews = pos + neg\n",
-" \n",
+"\n",
 " if total_reviews > 0:\n",
 " average = pos / total_reviews\n",
-" score = average - (average * 0.5) * 2**(-math.log10(total_reviews + 1))\n",
+" score = average - (average * 0.5) * 2 ** (-math.log10(total_reviews + 1))\n",
 " return score * 100\n",
 " else:\n",
 " return 0.0\n",
 "\n",
-"game_data['total_ratings'] = game_data['positive_ratings'] + game_data['negative_ratings']\n",
-"game_data['review_score'] = game_data['positive_ratings'] / game_data['total_ratings']\n",
-"game_data['rating'] = game_data.apply(calc_rating, axis=1)"
+"\n",
+"game_data[\"total_ratings\"] = game_data[\"positive_ratings\"] + game_data[\"negative_ratings\"]\n",
+"game_data[\"review_score\"] = game_data[\"positive_ratings\"] / game_data[\"total_ratings\"]\n",
+"game_data[\"rating\"] = game_data.apply(calc_rating, axis=1)"
 ]
 },
 {
```
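
The rating formula in this cell is worth a standalone look: the raw positive share is reduced by a confidence penalty that decays as the review count grows (each tenfold increase in reviews halves the penalty), so sparsely reviewed games score lower than heavily reviewed ones with the same ratio. A self-contained sketch with made-up numbers:

```python
# calc_rating, extracted from the notebook: average minus a confidence
# penalty of (average * 0.5) * 2 ** (-log10(total_reviews + 1))
import math

import pandas as pd

def calc_rating(row: pd.Series) -> float:
    pos, neg = row["positive_ratings"], row["negative_ratings"]
    total_reviews = pos + neg
    if total_reviews > 0:
        average = pos / total_reviews
        # Fewer reviews -> larger penalty; 10x more reviews halves the penalty
        score = average - (average * 0.5) * 2 ** (-math.log10(total_reviews + 1))
        return score * 100
    return 0.0

df = pd.DataFrame({"positive_ratings": [90, 9000], "negative_ratings": [10, 1000]})
print(df.apply(calc_rating, axis=1))  # ~78.8 and ~87.2: same 90% ratio, more trust in the larger sample
```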
```diff
@@ -996,24 +1000,25 @@
 "source": [
 "def categorize_year(year):\n",
 " if year < 2020:\n",
-" return 'Before 2020'\n",
+" return \"Before 2020\"\n",
 " elif 2020 <= year <= 2022:\n",
-" return '2020-2022'\n",
+" return \"2020-2022\"\n",
 " else:\n",
-" return 'After 2022'\n",
+" return \"After 2022\"\n",
+"\n",
 "\n",
-"game_data['year'] = game_data['year'].fillna(0).astype(int) \n",
-"game_data['Region'] = game_data['year'].apply(categorize_year)\n",
+"game_data[\"year\"] = game_data[\"year\"].fillna(0).astype(int)\n",
+"game_data[\"Region\"] = game_data[\"year\"].apply(categorize_year)\n",
 "\n",
 "# Calculate the frequency of each year\n",
-"yearly_counts = game_data.groupby(['Region', 'year']).size().reset_index(name='Frequency')\n",
+"yearly_counts = game_data.groupby([\"Region\", \"year\"]).size().reset_index(name=\"Frequency\")\n",
 "\n",
 "# Plotting using Seaborn\n",
 "plt.figure(figsize=(12, 6))\n",
-"sns.barplot(data=yearly_counts, x='year', y='Frequency', hue='Region')\n",
-"plt.title('Game Release by Year')\n",
-"plt.xlabel('Year')\n",
-"plt.ylabel('Frequency')\n",
+"sns.barplot(data=yearly_counts, x=\"year\", y=\"Frequency\", hue=\"Region\")\n",
+"plt.title(\"Game Release by Year\")\n",
+"plt.xlabel(\"Year\")\n",
+"plt.ylabel(\"Frequency\")\n",
 "plt.xticks(rotation=45)\n",
 "plt.show()"
 ]
```
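
A runnable sketch of the bucketing and frequency count above, with the plotting call omitted; the tiny DataFrame is ours, and the notebook's own (somewhat oddly named) `Region` column for the year buckets is kept as-is:

```python
# Bucket release years and count games per (bucket, year) pair
import pandas as pd

def categorize_year(year: int) -> str:
    if year < 2020:
        return "Before 2020"
    elif 2020 <= year <= 2022:
        return "2020-2022"
    return "After 2022"

game_data = pd.DataFrame({"year": [2018, 2021, None, 2023]})
game_data["year"] = game_data["year"].fillna(0).astype(int)  # missing years land in "Before 2020"
game_data["Region"] = game_data["year"].apply(categorize_year)
yearly_counts = game_data.groupby(["Region", "year"]).size().reset_index(name="Frequency")
print(yearly_counts)
```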
```diff
@@ -1031,12 +1036,12 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"tags = col_row_df['tags']\n",
+"tags = col_row_df[\"tags\"]\n",
 "parsed_tags = tags.apply(lambda x: literal_eval(x) if x else {})\n",
 "\n",
 "unique_tags = set(itertools.chain(*parsed_tags))\n",
 "\n",
-"print('Number of unique tags:', len(unique_tags))\n",
+"print(\"Number of unique tags:\", len(unique_tags))\n",
 "\n",
 "# Create a DataFrame with 15 columns and 30 rows\n",
 "num_columns = 15\n",
@@ -1045,7 +1050,7 @@
 "unique_tags = sorted(list(unique_tags))\n",
 "\n",
 "# Reshape the list into the desired DataFrame shape\n",
-"ut = [unique_tags[i * num_columns:(i + 1) * num_columns] for i in range(num_rows)]\n",
+"ut = [unique_tags[i * num_columns : (i + 1) * num_columns] for i in range(num_rows)]\n",
 "\n",
 "# Create the DataFrame\n",
 "utdf = pd.DataFrame(ut)\n",
```
```diff
@@ -1079,8 +1084,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"langs = col_row_df['languages']\n",
-"langs = langs.apply(lambda x: x.split(', ') if x else [])\n",
+"langs = col_row_df[\"languages\"]\n",
+"langs = langs.apply(lambda x: x.split(\", \") if x else [])\n",
 "\n",
 "langc = Counter()\n",
 "\n",
```
