ladypluvia/2025-steam-game-data


Steam Game Sales Analysis

Project Overview

This project explores the relationship between game price, user ratings, and sales performance using Steam data.

The primary objective is to understand which factors significantly influence game ownership levels.


Dataset

The dataset includes:

  • Price
  • User rating (percentage format, cleaned using regex)
  • Owners range (converted to midpoint values)
  • Release month

Sales data were log-transformed because the distribution is strongly right-skewed.


Data Cleaning

  • Extracted percentage values from rating strings (e.g., "N/A (N/A/94%)")
  • Converted ownership ranges into numeric midpoint values
  • Applied log transformation to sales data to reduce skewness
  • Removed missing and invalid observations
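
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the `"20,000 .. 50,000"` owners format, the choice to keep the last percentage in a rating string, and the use of the natural log are all assumptions, and the sample rows are made up.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical raw rows; column names and string formats are assumptions.
raw = pd.DataFrame(
    {
        "price": [19.99, 59.99, 0.0, 9.99],
        "rating": ["N/A (N/A/94%)", "88% (90%/85%)", "N/A", "N/A (N/A/71%)"],
        "owners": ["20,000 .. 50,000", "1,000,000 .. 2,000,000", "0 .. 20,000", None],
    }
)


def parse_rating(text):
    """Return the last percentage found in a rating string, or NaN if none."""
    if not isinstance(text, str):
        return np.nan
    matches = re.findall(r"(\d+(?:\.\d+)?)%", text)
    return float(matches[-1]) if matches else np.nan


def owners_midpoint(text):
    """Return the midpoint of a 'low .. high' owners range, or NaN if unparseable."""
    if not isinstance(text, str):
        return np.nan
    bounds = [int(n) for n in re.findall(r"\d+", text.replace(",", ""))]
    return (bounds[0] + bounds[1]) / 2 if len(bounds) == 2 else np.nan


# Parse both text columns, then drop rows with missing values.
clean = raw.assign(
    rating=raw["rating"].map(parse_rating),
    sales=raw["owners"].map(owners_midpoint),
).dropna(subset=["price", "rating", "sales"])

# Keep strictly positive sales so the natural log is defined, then transform.
clean = clean[clean["sales"] > 0].assign(log_sales=lambda d: np.log(d["sales"]))

print(clean[["price", "rating", "sales", "log_sales"]])
```

Rows whose rating or owners field cannot be parsed end up as NaN and are dropped, which is one plausible reading of "missing and invalid observations".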

Exploratory Data Analysis

Scatter plot analysis revealed:

  • Highly right-skewed sales distribution
  • Weak but positive association between price and sales
  • Presence of extreme outliers (blockbuster titles)

After log-transforming sales, the relationship between price and sales became easier to interpret.
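
To show why the log transformation helps, the sketch below compares sample skewness before and after the transform. It uses synthetic log-normal values as a stand-in for the sales column; the distribution parameters are arbitrary assumptions and the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic, log-normally distributed "sales"; illustrative only, not project data.
rng = np.random.default_rng(0)
sales = pd.Series(rng.lognormal(mean=10.0, sigma=2.0, size=5000), name="sales")

# Sample skewness: strongly positive for the raw values, near zero after the log.
raw_skew = sales.skew()
log_skew = np.log(sales).skew()

print(f"skewness before log transform: {raw_skew:.2f}")
print(f"skewness after log transform:  {log_skew:.2f}")
```

The same two `skew()` calls could be run on the cleaned sales column to check how much of the right tail the transform removes.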


Regression Analysis

An OLS regression model was estimated:

log_sales ~ price + rating
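
A specification like the one above can be fitted with the Statsmodels formula API, roughly as sketched here. The data frame is synthetic and the coefficients used to generate it are arbitrary assumptions, so the output will not reproduce the results reported in this README.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the cleaned dataset; values are illustrative only.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame(
    {
        "price": rng.uniform(0, 60, n),
        "rating": rng.uniform(40, 100, n),
    }
)
# Arbitrary generating coefficients plus noise; not the project's estimates.
df["log_sales"] = (
    8.0 + 0.02 * df["price"] + 0.03 * df["rating"] + rng.normal(0, 1.5, n)
)

# Fit the OLS model using the same formula as in the text.
model = smf.ols("log_sales ~ price + rating", data=df).fit()

print(model.params)                                # intercept and slopes
print(model.pvalues)                               # per-coefficient p-values
print(f"R-squared: {model.rsquared:.3f}")          # share of variance explained
print(f"F-test p-value: {model.f_pvalue:.3g}")     # overall model significance
```

Calling `model.summary()` prints the full regression table, including the overall F-test and R² values of the kind quoted under Key Results.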

Key Results

  • The model is statistically significant overall (p < 0.001)
  • R² = 0.12, so the model explains about 12% of the variance in log sales
  • Price is positively and statistically significantly associated with log sales
  • Rating is positively and statistically significantly associated with log sales

Interpretation

The positive price coefficient suggests that price may act as a proxy for brand value, production scale, or perceived quality, rather than indicating that higher prices themselves drive demand.


Tools Used

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Statsmodels

Conclusion

Game sales on Steam are positively associated with both price and user ratings. However, with R² = 0.12, most of the variation in sales remains unexplained, indicating that additional market factors play an important role.

About

Web scraping, EDA, and statistical modeling (ANOVA & OLS Regression) of 2025 Steam game sales data using Python.
