This repository contains an exploratory data analysis (EDA) of a medical insurance dataset. The goal is to investigate how demographic and health-related factors, such as age, BMI, smoking habits, gender, and diabetes status, influence insurance claim amounts.
The project includes both Python- and R-based analyses, so the results can be explored and reproduced on either platform using Jupyter Notebooks and R Markdown.
- `medical_expenses.csv`: Cleaned dataset used for analysis
- `1_data_preprocessing.ipynb`: Handling missing values and outliers (Python)
- `2_eda.ipynb`: Detailed visual EDA with insights (Python)
- `Insurance EDA.py`: Python script version of the analysis
- `insurance_data.docx`: Summary, recommendations, and discussion questions
- `medica_expense.Rmd` & `medica_expense.html`: R Markdown notebook and rendered HTML output for EDA using ggplot2 and dplyr
The dataset includes the following columns:
- `age`: Age of the policyholder
- `gender`: Male/Female
- `bmi`: Body Mass Index
- `bloodpressure`: Blood pressure level
- `diabetic`: Yes/No
- `children`: Number of children covered by insurance
- `smoker`: Smoking status
- `region`: Residential region
- `claim`: Insurance claim amount
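
The preprocessing notebook handles missing values and outliers, but this README does not say which methods it uses. The snippet below is only a minimal sketch of one common approach on these columns. It assumes median imputation for `age`, mode imputation for `region`, and IQR-based clipping for `bmi`. The `preprocess` helper and the sample values are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Tiny made-up sample using a few of the documented columns.
# These values are illustrative only and do not come from medical_expenses.csv.
sample = pd.DataFrame({
    "age": [25, None, 47, 52, 33, 61],
    "bmi": [22.5, 27.1, 31.4, 24.8, 95.0, 29.3],
    "region": ["southeast", None, "northwest",
               "southeast", "northeast", "southwest"],
    "claim": [1200.0, 3400.0, 8900.0, 4100.0, 15000.0, 9800.0],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing age/region and clip BMI outliers."""
    out = df.copy()

    # Assumption: numeric gaps are filled with the median,
    # categorical gaps with the most frequent value.
    out["age"] = out["age"].fillna(out["age"].median())
    out["region"] = out["region"].fillna(out["region"].mode().iloc[0])

    # Assumption: BMI outliers are clipped to the 1.5 * IQR fences
    # rather than dropped.
    q1, q3 = out["bmi"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["bmi"] = out["bmi"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out


clean = preprocess(sample)
print(clean)
```

To try this on the real data, replace `sample` with `pd.read_csv("medical_expenses.csv")`, assuming the file sits in the working directory.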
- Age & Claim: Older individuals tend to have higher claims.
- BMI & Smoking: Smokers and people with higher BMI incur higher medical expenses.
- Regional Trends: Southeast region shows higher average claim amounts.
- Diabetes & Risk: Diabetic individuals have distinct trends in BMI and claim distribution.
- Outlier Treatment: Handled outliers in BMI and filled missing values in age and region.
- Tailored Insurance Plans: Customize offerings based on demographic segments.
- Health Incentives: Encourage wellness programs in high-risk areas.
- Risk Assessment Models: Use data insights to improve pricing and risk models.
- Policyholder Education: Help customers understand how lifestyle affects insurance cost.
See `insurance_data.docx` for detailed conclusions and discussion questions.
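
The age, smoking, and regional comparisons listed above can be checked with a few pandas summaries. This is a rough sketch, not the notebooks' actual code. The `summarize` helper is hypothetical, the `Yes`/`No` smoker labels and the region names are assumptions, and the sample rows are invented, so the numbers say nothing about the real dataset.

```python
import pandas as pd

# Invented rows for illustration; swap in the real CSV to get real results.
df = pd.DataFrame({
    "age": [20, 30, 40, 50, 60, 45],
    "smoker": ["No", "No", "Yes", "No", "Yes", "No"],  # assumed labels
    "region": ["southeast", "northwest", "southeast",
               "northeast", "southeast", "northwest"],  # assumed names
    "claim": [2000.0, 3000.0, 15000.0, 5000.0, 21000.0, 4000.0],
})


def summarize(data: pd.DataFrame) -> dict:
    """Hypothetical helper collecting simple summaries of claim amounts."""
    return {
        # Pearson correlation between age and claim amount.
        "age_claim_corr": data["age"].corr(data["claim"]),
        # Average claim for smokers vs. non-smokers.
        "mean_claim_by_smoker": data.groupby("smoker")["claim"].mean(),
        # Average claim per region, highest first.
        "mean_claim_by_region": (
            data.groupby("region")["claim"].mean().sort_values(ascending=False)
        ),
    }


summary = summarize(df)
for name, value in summary.items():
    print(name, value, sep="\n", end="\n\n")
```

Group means and a single correlation are only a first look. The notebooks' visual EDA is the place to check whether the patterns hold across the full distribution.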
- Clone the repo:

      git clone https://github.com/yourusername/medical-expense-analysis.git
      cd medical-expense-analysis

- Install the required Python libraries:

      pip install pandas numpy matplotlib seaborn

- Run the Jupyter notebooks or the Python script.
- To run the R analysis, open `medica_expense.Rmd` in RStudio and knit to HTML.
Saurabh Singh Bhandari
Data Science Enthusiast | EDA & Automation Specialist
This project is for educational and portfolio purposes only. No commercial use allowed without permission.