Welcome to my Data Analytics Portfolio!
This repository showcases my end-to-end data analytics skills across four major areas:
- Data Cleaning
- Data Visualization
- Data Modeling
- Data Storytelling
Each folder contains real-world datasets, Jupyter Notebooks, and documentation (READMEs) demonstrating specific analytics techniques.
Projects focused on transforming messy, raw data into structured, analysis-ready datasets.
- Coffee Survey Project
  - Cleaned a survey of 1,000+ coffee drinkers.
  - Subsetted relevant columns, standardized names, and prepared data for further analysis.
  - Skills: Pandas (column selection, renaming, handling missing values).
- Flight Delays Project
  - Cleaned US flight data and passenger logs.
  - Combined multiple data sources, handled inconsistent formats, and prepared for visualization.
  - Skills: Data merging, date-time cleaning, missing value handling.
- Top Songs Project
  - Cleaned Spotify/Top Song duration data.
  - Standardized features and formatted dataset for time-series/EDA.
  - Skills: Pandas transformations, CSV handling.
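
The cleaning steps named in the projects above (column selection, renaming, missing-value handling, date-time cleaning, merging) can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebooks' actual code: every column name and value below is invented and does not come from the project datasets.

```python
import pandas as pd

# Hypothetical raw survey rows; the real coffee survey columns differ.
raw = pd.DataFrame({
    "Submission ID": [1, 2, 3, 4],
    "What is your age?": ["25-34", None, "35-44", "25-34"],
    "Cups Per Day ": ["2", "3", None, "1"],
    "Unused Notes": ["", "", "", ""],
})

# 1. Subset the relevant columns.
subset = raw[["Submission ID", "What is your age?", "Cups Per Day "]]

# 2. Standardize the column names.
clean = subset.rename(columns={
    "Submission ID": "submission_id",
    "What is your age?": "age_group",
    "Cups Per Day ": "cups_per_day",
})

# 3. Handle missing values: drop rows without an age group, then
#    convert cups to numbers and fill the remaining gaps with the median.
clean = clean.dropna(subset=["age_group"]).copy()
clean["cups_per_day"] = pd.to_numeric(clean["cups_per_day"], errors="coerce")
clean["cups_per_day"] = clean["cups_per_day"].fillna(clean["cups_per_day"].median())

# Hypothetical flight and passenger tables with inconsistently formatted keys.
flights = pd.DataFrame({
    "flight_id": ["AA10", "ua22 "],
    "departure": ["2023-01-05 08:30", "not recorded"],
})
passengers = pd.DataFrame({
    "flight_id": ["AA10", "UA22"],
    "passengers": [180, 142],
})

# 4. Normalize the join key and parse dates; unparseable dates become NaT.
flights["flight_id"] = flights["flight_id"].str.strip().str.upper()
flights["departure"] = pd.to_datetime(flights["departure"], errors="coerce")

# 5. Combine the two sources, keeping every flight.
merged = flights.merge(passengers, on="flight_id", how="left")
```

Using `errors="coerce"` turns unparseable entries into missing values instead of raising, which keeps the cleaning step explicit about what was lost.
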
Projects focused on turning data into clear, impactful visuals.
- How Couples Met Project
  - Explored survey data from the How Couples Meet and Stay Together 2017 study.
  - Visualized how dating trends have shifted over time (e.g., rise of online dating).
  - Skills: Trend visualization, categorical comparisons, storytelling with social data.
- Granola & Healthy Food Perceptions Project
  - Based on a New York Times survey comparing what Americans vs. nutritionists consider “healthy.”
  - Visualized perception gaps across multiple foods (e.g., sushi, granola, chocolate).
  - Skills: Comparative visualization, survey data analysis, group comparison plots.
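
A group comparison plot like the one described for the perception-gap project might look like the sketch below. The food labels and percentages are placeholders, not the survey's actual results, and `plot_perception_gap` is a hypothetical helper name rather than a function from the notebooks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder percentages, NOT the survey's actual results.
perceptions = pd.DataFrame({
    "food": ["Food A", "Food B", "Food C"],
    "public_pct": [70, 40, 55],
    "nutritionist_pct": [30, 75, 50],
})

# The gap is the difference between the two groups; sort so the
# largest disagreement appears first.
perceptions["gap"] = perceptions["public_pct"] - perceptions["nutritionist_pct"]
perceptions = perceptions.sort_values("gap", ascending=False).reset_index(drop=True)


def plot_perception_gap(df):
    """Draw side-by-side bars for the two groups, one pair per food."""
    fig, ax = plt.subplots(figsize=(6, 4))
    positions = list(range(len(df)))
    width = 0.4
    ax.bar([p - width / 2 for p in positions], df["public_pct"],
           width, label="General public")
    ax.bar([p + width / 2 for p in positions], df["nutritionist_pct"],
           width, label="Nutritionists")
    ax.set_xticks(positions)
    ax.set_xticklabels(df["food"])
    ax.set_ylabel('Rated "healthy" (%)')
    ax.legend()
    return ax


ax = plot_perception_gap(perceptions)
```

Sorting by the gap before plotting puts the foods with the strongest disagreement next to each other, which tends to make the comparison easier to read.
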
Projects focused on building predictive/statistical models and evaluating their performance.
- Tusked Elephant Project
  - Dataset: Illegal tusk harvest and the decline of tusk size in the African elephant (Chiyo et al., 2015).
  - Compared elephant tusk lengths before (1966–68) and after (2005–13) widespread poaching.
  - Built linear regression models for each period to quantify changes in the relationship between shoulder height and tusk length.
  - Found a significant decline in tusk size relative to body size post-poaching, suggesting evolutionary pressures.
  - Skills: Pandas, Matplotlib, Linear Regression (Scikit-learn), model evaluation (R²), comparative modeling, data storytelling through science.
- Lion Attacks Project
  - Dataset: Lion attacks & belly sizes vs evening moonlight.
  - Built custom Python linear models to explore how moonlight affects lion behavior.
  - Modeled both number of attacks and belly size, visualized relationships with scatter plots, and evaluated model performance using R².
  - Skills: Pandas, Matplotlib, Linear Regression, Custom Python Classes.
- E-Bike Stopping Distances Project
  - Dataset: `ebike-stopping-distances.csv`, with supplemental low- and high-speed data.
  - Modeled the relationship between e-bike speeds and stopping distances.
  - Started with a linear regression model, then fit a quadratic model to capture the curved relationship suggested by physics.
  - Visualized predictions and validated the quadratic model against high-speed data for accuracy.
  - Skills: Pandas, Matplotlib, Linear Regression, Polynomial Regression, Model Evaluation (R²), Data Visualization.
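
The linear-then-quadratic workflow from the e-bike project can be illustrated with a short sketch. It uses NumPy's `polyfit` rather than the scikit-learn models named in the skills lists, and the data is synthetic, generated from an assumed quadratic law with noise; none of the numbers come from `ebike-stopping-distances.csv`.

```python
import numpy as np

# Synthetic data from an assumed quadratic law plus noise;
# these are not the measurements in the project's CSV file.
rng = np.random.default_rng(0)
speed = np.linspace(5, 30, 20)
distance = 0.3 * speed + 0.05 * speed**2 + rng.normal(0, 0.5, speed.size)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


# Fit a straight line and a quadratic to the same training data.
linear_coeffs = np.polyfit(speed, distance, deg=1)
quadratic_coeffs = np.polyfit(speed, distance, deg=2)

r2_linear = r_squared(distance, np.polyval(linear_coeffs, speed))
r2_quadratic = r_squared(distance, np.polyval(quadratic_coeffs, speed))

# Validate on speeds outside the training range (noise-free values
# from the same assumed law stand in for high-speed measurements).
high_speed = np.array([35.0, 40.0])
high_true = 0.3 * high_speed + 0.05 * high_speed**2
linear_error = np.abs(np.polyval(linear_coeffs, high_speed) - high_true).max()
quadratic_error = np.abs(np.polyval(quadratic_coeffs, high_speed) - high_true).max()

print(f"R² linear:    {r2_linear:.4f}")
print(f"R² quadratic: {r2_quadratic:.4f}")
print(f"Max high-speed error, linear:    {linear_error:.2f}")
print(f"Max high-speed error, quadratic: {quadratic_error:.2f}")
```

In-sample R² can only stay the same or improve when a quadratic term is added, so the check against held-out high-speed values is what actually supports choosing the quadratic model.
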
Projects highlighting business insights and narratives.
- Plotting Animal Words Project
  - Dataset: `animal-word-trends.csv`, which tracks historical usage frequency of animal words in printed materials from 1700 to 2019.
  - Built reusable plotting functions to visualize word usage trends over time for different animals.
  - Explored historical events (e.g., invention of the automobile, completion of the transcontinental railway) to form hypotheses about changes in word usage.
  - Investigated explanatory mechanisms for observed patterns, such as increased popularity of lobster following railway expansion.
  - Skills: Pandas, Matplotlib, Data Storytelling, Exploratory Analysis, Hypothesis Formation, Historical Data Interpretation.
- Any Animal Except… Project
  - Dataset: `animal-word-trends-menageries.csv`, which tracks historical usage frequency of exotic animal words in British menageries from 1750 to 1835.
  - Investigated how public exhibitions in menageries influenced word usage trends for exotic animals.
  - Compared trends for animals that were successfully exhibited (e.g., tiger, hyena, tapir) versus difficult-to-keep animals (e.g., hummingbird, chimpanzee, meerkat, penguin).
  - Tested hypotheses on cultural exposure and accessibility, and extended analysis to Australian animals like kangaroo, wombat, and koala.
  - Skills: Pandas, Matplotlib, Data Storytelling, Hypothesis Testing, Historical Data Interpretation, Exploratory Analysis.
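
A reusable plotting function of the kind mentioned in the word-trend projects could be sketched as follows. The long-format layout (`year`, `word`, `frequency`), the placeholder words, the made-up frequencies, and the helper name `plot_word_trend` are all assumptions for illustration; the real CSV files may be organized differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frequencies in an assumed long format; not real trend data.
trends = pd.DataFrame({
    "year": [1800, 1850, 1900, 1800, 1850, 1900],
    "word": ["word_a", "word_a", "word_a", "word_b", "word_b", "word_b"],
    "frequency": [5.0, 6.0, 4.0, 0.5, 0.8, 1.5],
})


def plot_word_trend(df, words, events=None):
    """Plot usage frequency over time for each word.

    `events` is an optional {year: label} mapping; each entry is drawn
    as a dashed vertical line so a trend can be read against it.
    """
    fig, ax = plt.subplots(figsize=(7, 4))
    for word in words:
        series = df[df["word"] == word].sort_values("year")
        ax.plot(series["year"], series["frequency"], label=word)
    for year, label in (events or {}).items():
        ax.axvline(year, linestyle="--", color="gray", label=label)
    ax.set_xlabel("Year")
    ax.set_ylabel("Usage frequency")
    ax.legend()
    return ax


ax = plot_word_trend(trends, ["word_a", "word_b"],
                     events={1850: "Hypothetical event"})
```

Passing the events as a parameter keeps the function reusable: the same call works for any set of words and any historical markers a hypothesis needs.
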
- Languages: Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn)
- Environments: Jupyter Notebook
- Version Control: Git & GitHub
- Other Skills: Data cleaning, exploratory data analysis (EDA), visualization, reproducibility, storytelling
This repository is structured to reflect the full data analytics workflow:
- Start → clean raw data
- Explore → visualize trends
- Build → model predictions
- Deliver → tell the story behind the insights
Each project demonstrates one stage of the pipeline, showing both technical ability and professional presentation.
- Clone/download the repo.
- Navigate to each project folder.
- Open the `.ipynb` notebooks to view the workflow.
- Check README files for project descriptions.
I’m Bhumika Shrivas, a data enthusiast passionate about using analytics to solve real-world problems.
- 🎯 Seeking opportunities in Data Analytics / Business Analytics / Data Science.
- 💡 I enjoy cleaning messy data, uncovering insights, and telling stories with numbers.