Data analysis and visualization in R using real-world datasets (Video Game Sales, Seoul Bike Sharing, Call Quality).

Mahak0747/Big-Data-Analysis

📊 Big Data Analysis

This repository contains multiple data analysis projects implemented in R. The projects focus on extracting insights from real-world datasets using data visualization and statistical techniques.


📁 Repository Structure

  • Big Data Analysis Using R.R → Main R script containing analysis for multiple datasets.
  • Big Data Analysis Using R.docx → Documentation of the analysis and results.
  • README.md → Project overview (this file).

🔑 Key Analyses

1. 🎮 Video Game Sales Analysis

  • Dataset: vgsales.csv
  • Objective: Identify how global sales are distributed across genres and find the highest-selling genre.
  • Techniques Used:
    • Pie chart visualization of global sales by genre
    • Percentage contribution analysis
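The percentage contribution analysis can be sketched in a few lines of base R. This is a minimal illustration, not the repository's actual script: the toy data frame stands in for vgsales.csv, its values are made up, and the column names `Genre` and `Global_Sales` are assumed to match the dataset. The pie chart is built only if ggplot2 is installed.

```r
# Toy stand-in for vgsales.csv; values are made up for illustration only.
# Column names Genre and Global_Sales are assumptions about the real file.
vgsales <- data.frame(
  Genre = c("Action", "Sports", "Action", "Puzzle", "Sports"),
  Global_Sales = c(4, 3, 2, 1, 0)
)

# Total global sales per genre.
genre_sales <- aggregate(Global_Sales ~ Genre, data = vgsales, FUN = sum)

# Each genre's share of the grand total, in percent.
genre_sales$Percent <- round(
  100 * genre_sales$Global_Sales / sum(genre_sales$Global_Sales), 1
)

# Genre with the largest total sales.
top_genre <- as.character(genre_sales$Genre[which.max(genre_sales$Global_Sales)])

# Optional pie chart: a stacked bar drawn in polar coordinates.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  genre_pie <- ggplot2::ggplot(
    genre_sales,
    ggplot2::aes(x = "", y = Global_Sales, fill = Genre)
  ) +
    ggplot2::geom_col(width = 1) +
    ggplot2::coord_polar(theta = "y") +
    ggplot2::labs(title = "Global sales by genre", x = NULL, y = NULL)
}
```

With the real dataset, the toy data frame would be replaced by something like `read.csv("vgsales.csv")`, provided the column names match.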

2. 🚴‍♂️ Seoul Bike Sharing Demand

  • Dataset: SeoulBikeData.csv
  • Objectives:
    • Analyze seasonal and monthly demand for bike rentals
    • Study relationships with weather conditions
  • Techniques Used:
    • Line charts for bike count and temperature trends
    • Pie chart of bike rentals by season
    • Scatter plots for bike count vs temperature and bike count vs rainfall
    • Bar charts for holiday vs non-holiday usage
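The seasonal, monthly, and weather comparisons could look roughly like the sketch below. The data frame is a made-up stand-in for SeoulBikeData.csv, and the simplified column names (`Count`, `Temperature`, `Seasons`, `Holiday`) and the day/month/year date format are assumptions that may need adjusting to the real file. Base R date parsing is used here; `lubridate::dmy()` would be an alternative.

```r
# Toy stand-in for SeoulBikeData.csv; values and column names are
# illustrative assumptions, not taken from the real dataset.
bikes <- data.frame(
  Date = c("01/12/2017", "15/12/2017", "01/06/2018", "15/06/2018"),
  Count = c(100, 200, 900, 1100),
  Temperature = c(-2, 0, 24, 28),
  Seasons = c("Winter", "Winter", "Summer", "Summer"),
  Holiday = c("No Holiday", "Holiday", "No Holiday", "No Holiday")
)

# Parse dates (assumed day/month/year) and derive a year-month key.
bikes$Date <- as.Date(bikes$Date, format = "%d/%m/%Y")
bikes$Month <- format(bikes$Date, "%Y-%m")

# Total rentals per month and per season.
monthly <- aggregate(Count ~ Month, data = bikes, FUN = sum)
seasonal <- aggregate(Count ~ Seasons, data = bikes, FUN = sum)

# Average rentals on holidays vs non-holidays.
holiday_mean <- aggregate(Count ~ Holiday, data = bikes, FUN = mean)

# Linear association between rentals and temperature.
temp_cor <- cor(bikes$Count, bikes$Temperature)

# Optional scatter plot of rentals against temperature.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  temp_scatter <- ggplot2::ggplot(
    bikes,
    ggplot2::aes(x = Temperature, y = Count)
  ) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Temperature", y = "Rented bikes")
}
```

The aggregated tables (`monthly`, `seasonal`, `holiday_mean`) are the natural inputs for the line, pie, and bar charts listed above.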

3. 📱 Call Voice Quality Analysis

  • Datasets:
    • CallVoiceQualityExperience-2018-April.csv
    • CallVoiceQuality_Data_2018_May.csv
  • Objectives:
    • Evaluate call quality across states, operators, network types, and conditions (indoor/outdoor/travelling).
  • Techniques Used:
    • Bar charts for operator quality ratings, state-wise performance, and network type analysis
    • Heatmap of state vs network type ratings
    • Horizontal bar charts for call drop categories
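A heatmap of state vs network type needs one average rating per (state, network type) pair. The sketch below shows one way to compute that, assuming hypothetical column names (`State`, `Network`, `Operator`, `Rating`) and made-up values; the real CSV headers may differ. It uses base R for the aggregation, with `tapply` producing the wide matrix that `reshape2` could also build.

```r
# Toy stand-in for the call quality CSVs; names and values are
# illustrative assumptions only.
calls <- data.frame(
  State = c("Delhi", "Delhi", "Kerala", "Kerala", "Kerala"),
  Network = c("4G", "3G", "4G", "4G", "3G"),
  Operator = c("A", "B", "A", "B", "A"),
  Rating = c(4, 2, 5, 3, 1)
)

# Mean rating per operator (input for a bar chart).
operator_mean <- aggregate(Rating ~ Operator, data = calls, FUN = mean)

# Mean rating per state and network type, long format (input for geom_tile).
state_network <- aggregate(Rating ~ State + Network, data = calls, FUN = mean)

# The same averages as a State x Network matrix.
rating_matrix <- tapply(calls$Rating, list(calls$State, calls$Network), mean)

# Optional heatmap: one tile per state/network pair, coloured by mean rating.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  rating_heatmap <- ggplot2::ggplot(
    state_network,
    ggplot2::aes(x = Network, y = State, fill = Rating)
  ) +
    ggplot2::geom_tile() +
    ggplot2::labs(x = "Network type", y = "State", fill = "Mean rating")
}
```

The April and May files would first need to be combined, for example with `rbind()`, if they share the same columns.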

🛠️ Technologies Used

  • R Language
  • Libraries:
    • ggplot2 → Data visualization
    • dplyr → Data manipulation
    • lubridate → Date handling
    • reshape2 → Data reshaping

🚀 How to Run

  1. Clone the repository:
    git clone https://github.com/Mahak0747/Big-Data-Analysis.git
    cd Big-Data-Analysis
  2. Open Big Data Analysis Using R.R in RStudio, or run it from the R console.
  3. Make sure the datasets (vgsales.csv, SeoulBikeData.csv, CallVoiceQualityExperience-2018-April.csv, CallVoiceQuality_Data_2018_May.csv) are available in your working directory. Update the file paths in the script if needed.
  4. Install required libraries if not already installed:
    install.packages(c("ggplot2", "dplyr", "lubridate", "reshape2"))
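Before running the script, it can help to check that the packages and datasets from the steps above are in place. The helper names `missing_packages` and `missing_files` below are hypothetical and not part of the repository. The sketch only reports what is missing and does not install anything.

```r
# Packages and datasets named in this README.
required_packages <- c("ggplot2", "dplyr", "lubridate", "reshape2")
required_files <- c(
  "vgsales.csv",
  "SeoulBikeData.csv",
  "CallVoiceQualityExperience-2018-April.csv",
  "CallVoiceQuality_Data_2018_May.csv"
)

# Hypothetical helper: return the packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: return the files not found in the given directory.
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

to_install <- missing_packages(required_packages)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}

not_found <- missing_files(required_files)
if (length(not_found) > 0) {
  message("Datasets not found in ", getwd(), ": ",
          paste(not_found, collapse = ", "))
}
```

Anything listed in `to_install` can then be passed to `install.packages()`.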
    

📈 Sample Visualizations

  • Video game sales by genre (image)
  • Bike rentals vs temperature (image)
  • Call quality ratings across states (two images)
  • Heatmap of call quality (State × Network Type) (image)


📌 Author

👩‍💻 Mahak Goswam

CSE (AI) Student

Interests: Data Analysis, Machine Learning, Game Development
