This project analyzes a dataset of game releases through data exploration, time series analysis, rating analysis, review analysis, and visualization. It uses pandas and other Python libraries for data manipulation and analysis.
The project can be divided into the following steps:
- Data Exploration: Read the CSV file into a pandas DataFrame, handle data type conversions, and calculate basic statistics for numerical columns. Explore variable distributions using histograms, box plots, or density plots.
- Time Series Analysis: Convert the "release date" column to a datetime data type. Analyze the number of game releases over time using line plots or bar plots. Investigate the relationship between release date and other variables through scatter plots or line plots.
- Rating Analysis: Analyze the distribution of ratings using histograms or box plots. Calculate average ratings for different categories of games or time periods. Identify games with the highest and lowest ratings based on user reviews. Explore the correlation between ratings and other variables using scatter plots or correlation analysis.
- Reviews Analysis: Calculate the percentage of positive and negative reviews based on the "positive reviews" and "negative reviews" columns. Investigate the relationship between reviews and other variables using scatter plots or correlation analysis. Identify games with the most positive or negative reviews. Perform sentiment analysis on reviews using natural language processing techniques.
- Visualizations: Create bar plots or pie charts to visualize the distribution of game genres or categories. Generate line plots or area plots to show trends in peak players or total reviews over time. Use scatter plots to visualize relationships between variables such as ratings and peak players.
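
A few of the steps above (loading and type conversion, releases over time, rating extremes, and review percentages) can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code. The column names "release date", "positive reviews", and "negative reviews" come from the description above, while "name" and "rating" are assumed names, and the three-row sample is made up so the snippet runs without the real CSV file.

```python
import io

import pandas as pd

# Made-up sample standing in for the real CSV file (values are illustrative only).
# The "name" and "rating" column names are assumptions.
SAMPLE_CSV = """name,release date,rating,positive reviews,negative reviews
Game A,2020-01-15,8.5,900,100
Game B,2020-06-30,6.0,300,200
Game C,2021-03-10,7.2,450,50
"""


def load_games(source):
    """Read the CSV and convert the release date column to datetime."""
    df = pd.read_csv(source)
    # errors="coerce" turns unparseable dates into NaT instead of raising
    df["release date"] = pd.to_datetime(df["release date"], errors="coerce")
    return df


def releases_per_year(df):
    """Count game releases per calendar year, ordered by year."""
    return df["release date"].dt.year.value_counts().sort_index()


def add_review_percentages(df):
    """Return a copy with the share of positive and negative reviews in percent."""
    out = df.copy()
    total = out["positive reviews"] + out["negative reviews"]
    # Games with zero reviews in total yield NaN here
    out["positive pct"] = 100 * out["positive reviews"] / total
    out["negative pct"] = 100 * out["negative reviews"] / total
    return out


games = load_games(io.StringIO(SAMPLE_CSV))

# Basic statistics for the numerical columns only
summary = games.select_dtypes("number").describe()

per_year = releases_per_year(games)
games = add_review_percentages(games)

# Games with the highest and lowest rating
best = games.loc[games["rating"].idxmax(), "name"]
worst = games.loc[games["rating"].idxmin(), "name"]

# Pearson correlation between rating and share of positive reviews
rating_vs_positive = games["rating"].corr(games["positive pct"])
```

To run this against the real data, pass the CSV path to `load_games` instead of the in-memory sample. With matplotlib installed, `per_year.plot(kind="bar")` would draw the releases-per-year bar plot mentioned in the time series step.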