Turning messy data into meaningful insights — my journey through cleaning, visualization, modeling, and storytelling.

xhrivas/data-analytics-portfolio

📊 Data Analytics Portfolio

Welcome to my Data Analytics Portfolio!
This repository showcases my end-to-end data analytics skills across four major areas:

  1. Data Cleaning
  2. Data Visualization
  3. Data Modeling
  4. Data Storytelling

Each folder contains real-world datasets, Jupyter Notebooks, and documentation (READMEs) demonstrating specific analytics techniques.


🔹 Projects

🧹 Data Cleaning

Projects focused on transforming messy, raw data into structured, analysis-ready datasets.

  • Coffee Survey Project

    • Cleaned a survey of 1,000+ coffee drinkers.
    • Subsetted relevant columns, standardized names, and prepared data for further analysis.
    • Skills: Pandas (column selection, renaming, handling missing values).
  • Flight Delays Project

    • Cleaned US flight data and passenger logs.
    • Combined multiple data sources, handled inconsistent formats, and prepared the merged data for visualization.
    • Skills: Data merging, date-time cleaning, missing value handling.
  • Top Songs Project

    • Cleaned Spotify/Top Song duration data.
    • Standardized features and formatted the dataset for time-series analysis and EDA.
    • Skills: Pandas transformations, CSV handling.
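
The cleaning steps listed above (column selection, renaming, missing-value handling, date parsing, merging) can be sketched in a few lines of Pandas. This is a minimal illustration, not the code from the notebooks: every column name, value, and helper function below is a made-up placeholder, and median imputation is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical raw survey rows; the real survey's columns are not shown in this README.
raw_survey = pd.DataFrame({
    "Submission ID": [1, 2, 3, 4],
    "What is your age?": ["25-34", "35-44", None, "25-34"],
    "Cups per day": [2.0, np.nan, 1.0, 3.0],
    "Free-text comments": ["", "n/a", "love it", ""],
})


def clean_survey(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the relevant columns, give them short names, handle missing values."""
    renames = {
        "Submission ID": "id",
        "What is your age?": "age_group",
        "Cups per day": "cups_per_day",
    }
    out = df[list(renames)].rename(columns=renames)
    # Rows without an age group are dropped; missing cup counts get the median.
    out = out.dropna(subset=["age_group"])
    out = out.assign(cups_per_day=out["cups_per_day"].fillna(out["cups_per_day"].median()))
    return out.reset_index(drop=True)


# Hypothetical flight and passenger tables sharing a flight_id key.
flights = pd.DataFrame({
    "flight_id": ["A1", "B2", "C3"],
    "dep_date": ["2024-01-05", "2024-01-06", "not a date"],
})
passengers = pd.DataFrame({"flight_id": ["A1", "B2"], "passengers": [120, 95]})


def combine_flights(flights_df: pd.DataFrame, passengers_df: pd.DataFrame) -> pd.DataFrame:
    """Parse departure dates (unparseable values become NaT), then left-join passengers."""
    parsed = pd.to_datetime(flights_df["dep_date"], format="%Y-%m-%d", errors="coerce")
    return flights_df.assign(dep_date=parsed).merge(passengers_df, on="flight_id", how="left")


cleaned = clean_survey(raw_survey)
combined = combine_flights(flights, passengers)
print(cleaned)
print(combined)
```

A left join keeps every flight even when no passenger log matches, so gaps stay visible as missing values instead of silently disappearing.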

📈 Data Visualization

Projects focused on turning data into clear, impactful visuals.

  • How Couples Met Project

    • Explored survey data from the How Couples Meet and Stay Together 2017 study.
    • Visualized how dating trends have shifted over time (e.g., rise of online dating).
    • Skills: Trend visualization, categorical comparisons, storytelling with social data.
  • Granola & Healthy Food Perceptions Project

    • Based on a New York Times survey comparing what Americans vs. nutritionists consider “healthy.”
    • Visualized perception gaps across multiple foods (e.g., sushi, granola, chocolate).
    • Skills: Comparative visualization, survey data analysis, group comparison plots.
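
A group comparison plot of the kind described above could look roughly like the sketch below. The percentages are invented placeholders rather than the survey's real figures, and the column names and the `add_gap` and `plot_groups` helpers are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder percentages, NOT the survey's actual results.
ratings = pd.DataFrame({
    "food": ["granola", "sushi", "chocolate"],
    "public_pct": [70, 50, 30],
    "nutritionist_pct": [30, 75, 35],
})


def add_gap(df: pd.DataFrame) -> pd.DataFrame:
    """Add the public-minus-nutritionist gap and sort by it, largest first."""
    gap = df["public_pct"] - df["nutritionist_pct"]
    return df.assign(gap=gap).sort_values("gap", ascending=False).reset_index(drop=True)


def plot_groups(df: pd.DataFrame):
    """Draw side-by-side bars for the two groups, one pair per food."""
    x = np.arange(len(df))
    width = 0.4
    fig, ax = plt.subplots(figsize=(6, 3.5))
    ax.bar(x - width / 2, df["public_pct"], width, label="Public")
    ax.bar(x + width / 2, df["nutritionist_pct"], width, label="Nutritionists")
    ax.set_xticks(x)
    ax.set_xticklabels(df["food"])
    ax.set_ylabel("% rating the food as healthy")
    ax.legend()
    return fig, ax


gaps = add_gap(ratings)
fig, ax = plot_groups(gaps)
```

Sorting by the gap before plotting puts the foods with the largest disagreement first, which tends to make the perception gap easier to read at a glance. Calling `fig.savefig(...)` with a path of your choice writes the chart to disk.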

🤖 Data Modeling

Projects focused on building predictive/statistical models and evaluating their performance.

  • Tusked Elephant Project

    • Dataset: Illegal tusk harvest and the decline of tusk size in the African elephant (Chiyo et al., 2015).
    • Compared elephant tusk lengths before (1966–68) and after (2005–13) widespread poaching.
    • Built linear regression models for each period to quantify changes in the relationship between shoulder height and tusk length.
    • Found a significant decline in tusk size relative to body size post-poaching, suggesting evolutionary pressures.
    • Skills: Pandas, Matplotlib, Linear Regression (Scikit-learn), model evaluation (R²), comparative modeling, data storytelling through science.
  • Lion Attacks Project

    • Dataset: Lion attacks & belly sizes vs evening moonlight.
    • Built custom Python linear models to explore how moonlight affects lion behavior.
    • Modeled both number of attacks and belly size, visualized relationships with scatter plots, and evaluated model performance using R².
    • Skills: Pandas, Matplotlib, Linear Regression, Custom Python Classes.
  • E-Bike Stopping Distances Project

    • Dataset: ebike-stopping-distances.csv, with supplemental low- and high-speed data.
    • Modeled the relationship between e-bike speeds and stopping distances.
    • Started with a linear regression model, then fit a quadratic model to capture the curved relationship suggested by physics.
    • Visualized predictions and validated the quadratic model against high-speed data for accuracy.
    • Skills: Pandas, Matplotlib, Linear Regression, Polynomial Regression, Model Evaluation (R²), Data Visualization.
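
The linear-versus-quadratic comparison from the e-bike project can be illustrated with a small custom model class and an R² helper. This is a sketch under stated assumptions: it uses NumPy's `polyfit` instead of Scikit-learn, the `PolyModel` class is hypothetical, and the data are synthetic, generated from the textbook relation distance = reaction_time · v + v² / (2 · deceleration) with made-up constants.

```python
import numpy as np


def r_squared(y, y_hat) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


class PolyModel:
    """Least-squares polynomial fit; degree 1 is a straight line."""

    def __init__(self, degree: int):
        self.degree = degree
        self.coef_ = None

    def fit(self, x, y):
        self.coef_ = np.polyfit(x, y, self.degree)
        return self

    def predict(self, x):
        return np.polyval(self.coef_, x)

    def score(self, x, y) -> float:
        return r_squared(y, self.predict(x))


# Synthetic training data (assumed 1 s reaction time, 5 m/s^2 deceleration).
rng = np.random.default_rng(0)
speed = np.linspace(2, 10, 25)  # m/s
distance = 1.0 * speed + speed**2 / (2 * 5.0) + rng.normal(0, 0.3, speed.size)

linear = PolyModel(1).fit(speed, distance)
quadratic = PolyModel(2).fit(speed, distance)

# Validate against noise-free "high-speed" points outside the training range.
high_speed = np.array([12.0, 14.0])
true_high = 1.0 * high_speed + high_speed**2 / (2 * 5.0)
linear_err = float(np.mean(np.abs(linear.predict(high_speed) - true_high)))
quadratic_err = float(np.mean(np.abs(quadratic.predict(high_speed) - true_high)))

print(f"R^2 linear:    {linear.score(speed, distance):.4f}")
print(f"R^2 quadratic: {quadratic.score(speed, distance):.4f}")
print(f"High-speed mean abs error, linear vs quadratic: {linear_err:.2f} vs {quadratic_err:.2f}")
```

On the training range both models can score a high R², so the held-out high-speed points are what separate them: a straight line underestimates stopping distance once the v² term dominates.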

📝 Data Storytelling

Projects focused on turning analysis into insights and narratives.

  • Plotting Animal Words Project

    • Dataset: animal-word-trends.csv – tracks historical usage frequency of animal words in printed materials from 1700 to 2019.
    • Built reusable plotting functions to visualize word usage trends over time for different animals.
    • Explored historical events (e.g., invention of the automobile, completion of the transcontinental railway) to form hypotheses about changes in word usage.
    • Investigated explanatory mechanisms for observed patterns, such as increased popularity of lobster following railway expansion.
    • Skills: Pandas, Matplotlib, Data Storytelling, Exploratory Analysis, Hypothesis Formation, Historical Data Interpretation.
  • Any Animal Except… Project

    • Dataset: animal-word-trends-menageries.csv – tracks historical usage frequency of exotic animal words in British menageries from 1750 to 1835.
    • Investigated how public exhibitions in menageries influenced word usage trends for exotic animals.
    • Compared trends for animals that were successfully exhibited (e.g., tiger, hyena, tapir) versus difficult-to-keep animals (e.g., hummingbird, chimpanzee, meerkat, penguin).
    • Tested hypotheses on cultural exposure and accessibility, and extended analysis to Australian animals like kangaroo, wombat, and koala.
    • Skills: Pandas, Matplotlib, Data Storytelling, Hypothesis Testing, Historical Data Interpretation, Exploratory Analysis.
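
A reusable plotting function for word-usage trends might look like the sketch below. The layout of the CSV files is not shown in this README, so the long-format columns (`year`, `word`, `frequency`), the sample values, and the `plot_word_trend` helper are all assumptions. The 1869 marker is the usual date given for the first US transcontinental railroad and is included only to show how an event can be overlaid.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder frequencies in an assumed long format (one row per year and word).
trends = pd.DataFrame({
    "year": [1800, 1850, 1900, 1800, 1850, 1900],
    "word": ["horse"] * 3 + ["lobster"] * 3,
    "frequency": [5.0, 6.0, 4.0, 0.2, 0.3, 0.6],
})


def plot_word_trend(df: pd.DataFrame, words, events=None, ax=None):
    """Plot usage over time for each word; mark optional {year: label} events."""
    if ax is None:
        _, ax = plt.subplots(figsize=(7, 3.5))
    for word in words:
        subset = df[df["word"] == word].sort_values("year")
        ax.plot(subset["year"], subset["frequency"], label=word)
    for year, label in (events or {}).items():
        ax.axvline(year, linestyle="--", color="gray", label=label)
    ax.set_xlabel("Year")
    ax.set_ylabel("Usage frequency")
    ax.legend()
    return ax


ax = plot_word_trend(
    trends,
    ["horse", "lobster"],
    events={1869: "transcontinental railroad"},
)
```

Passing `ax` in lets several calls share one figure, which is handy when comparing groups of animals side by side.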

⚙️ Tools & Technologies

  • Languages: Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn)
  • Environments: Jupyter Notebook
  • Version Control: Git & GitHub
  • Other Skills: Data cleaning, exploratory data analysis (EDA), visualization, reproducibility, storytelling

🚀 About This Portfolio

This repository is structured to reflect the full data analytics workflow:

  • Start → clean raw data
  • Explore → visualize trends
  • Build → model predictions
  • Deliver → tell the story behind the insights

Each project demonstrates one stage of the pipeline, showing both technical ability and professional presentation.


📌 How to Use This Repo

  1. Clone/download the repo.
  2. Navigate to each project folder.
  3. Open .ipynb notebooks to view the workflow.
  4. Check README files for project descriptions.

👩‍💻 About Me

I’m Bhumika Shrivas, a data enthusiast passionate about using analytics to solve real-world problems.

  • 🎯 Seeking opportunities in Data Analytics / Business Analytics / Data Science.
  • 💡 I enjoy cleaning messy data, uncovering insights, and telling stories with numbers.
