This project explores how game price and user ratings relate to sales performance on Steam, using estimated owner counts as the measure of sales.
The primary objective is to identify which of these factors are significantly associated with game ownership levels.
The dataset includes:
- Price
- User rating (percentage format, cleaned using regex)
- Owners range (converted to midpoint values)
- Release month
Because the sales data is strongly right-skewed, it was log-transformed before modeling.
Data preparation steps:
- Extracted percentage values from rating strings (e.g., "N/A (N/A/94%)")
- Converted ownership ranges into numeric midpoint values
- Applied log transformation to sales data to reduce skewness
- Removed missing and invalid observations
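The preparation steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the field names (`price`, `rating`, `owners`), the owners format `"20,000 .. 50,000"`, the choice of natural log, and the helper names are all assumptions made for the example.

```python
import math
import re


def parse_rating(raw):
    """Return the last percentage found in a rating string, or None."""
    matches = re.findall(r"(\d+(?:\.\d+)?)%", str(raw))
    return float(matches[-1]) if matches else None


def owners_midpoint(raw):
    """Return the midpoint of an owners range such as '20,000 .. 50,000'."""
    numbers = [int(n) for n in re.findall(r"\d+", str(raw).replace(",", ""))]
    if len(numbers) != 2:
        return None
    low, high = numbers
    return (low + high) / 2


def clean_row(row):
    """Return a cleaned copy of one record, or None if it is unusable."""
    rating = parse_rating(row.get("rating"))
    midpoint = owners_midpoint(row.get("owners"))
    price = row.get("price")
    # Drop rows with missing values or a non-positive midpoint (log undefined).
    if rating is None or midpoint is None or price is None or midpoint <= 0:
        return None
    return {
        "price": float(price),
        "rating": rating,
        "owners_mid": midpoint,
        "log_sales": math.log(midpoint),  # natural log is an assumption
    }


# Hypothetical sample records, for illustration only.
rows = [
    {"price": 19.99, "rating": "N/A (N/A/94%)", "owners": "20,000 .. 50,000"},
    {"price": 9.99, "rating": "N/A", "owners": "0 .. 20,000"},
    {"price": 59.99, "rating": "N/A (N/A/81%)", "owners": "1,000,000 .. 2,000,000"},
]
cleaned = [c for c in (clean_row(r) for r in rows) if c is not None]
print(cleaned)
```

In a pandas workflow, the same helpers could be applied column-wise with `Series.map`, followed by `DataFrame.dropna` to remove the rows that failed to parse.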
Scatter plot analysis revealed:
- Highly right-skewed sales distribution
- Weak but positive association between price and sales
- Presence of extreme outliers (blockbuster titles)
After log transformation, the relationship became more interpretable.
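To illustrate why the log transformation helps, the sketch below compares the skewness of a small set of owner midpoints before and after taking logs. The values are hypothetical and chosen to mimic a distribution with one blockbuster title; they are not taken from the project's dataset, and the moment-based skewness formula is one of several common definitions.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical owner midpoints with a single extreme outlier.
owners = [10_000, 10_000, 35_000, 35_000, 75_000,
          150_000, 350_000, 750_000, 1_500_000, 15_000_000]
log_owners = [math.log(v) for v in owners]

raw_skew = skewness(owners)
log_skew = skewness(log_owners)
print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

On data shaped like this, the raw values are dominated by the outlier, while the logged values are far closer to symmetric, which is what makes a linear model on the log scale easier to interpret.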
An OLS regression model was estimated:
log_sales ~ price + rating
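- The model is statistically significant overall (p < 0.001)
- R² = 0.12, meaning the model explains about 12% of the variance in log sales
- Price has a positive and statistically significant association with log sales
- Rating has a positive and statistically significant association with log sales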
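A model of this form can be fitted with the Statsmodels formula API, as in the sketch below. The data here is synthetic and generated only so the example runs on its own; the coefficients, sample size, and noise level are assumptions, so the printed numbers will not reproduce the results reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the cleaned dataset (assumed column names).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "price": rng.uniform(0, 60, n),
    "rating": rng.uniform(40, 100, n),
})
# Assumed data-generating process, for illustration only.
df["log_sales"] = (
    9.0 + 0.02 * df["price"] + 0.03 * df["rating"] + rng.normal(0, 1.0, n)
)

# Fit the same specification as in the text: log_sales ~ price + rating.
model = smf.ols("log_sales ~ price + rating", data=df).fit()

print(model.params)      # intercept and slope estimates
print(model.pvalues)     # per-coefficient p-values
print(model.rsquared)    # share of variance explained
print(model.f_pvalue)    # overall model significance

# With a logged outcome, a one-unit price increase corresponds to roughly
# (exp(b) - 1) * 100 percent change in sales, holding rating fixed.
price_pct = (np.exp(model.params["price"]) - 1) * 100
print(f"approx. % change in sales per unit of price: {price_pct:.2f}")
```

Calling `model.summary()` prints the full regression table, including standard errors and confidence intervals.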
The positive price coefficient runs against a simple downward-sloping demand relationship. This suggests that higher prices may act as a proxy for stronger brand value, production scale, or perceived quality rather than reflecting price-driven demand dynamics.
Tools used:
- Python
- Pandas
- NumPy
- Matplotlib
- Statsmodels
Game sales on Steam are influenced by price and user ratings; however, a large portion of variation remains unexplained, indicating that additional market factors play an important role.