A machine learning project using Linear Regression to predict health insurance expenses based on personal and lifestyle data. Built using TensorFlow 2.x and trained on real-world data from insurance.csv.
The dataset contains the following features:
- `age`: Age of primary beneficiary
- `sex`: Gender (`male`, `female`)
- `bmi`: Body mass index
- `children`: Number of dependents
- `smoker`: Whether the person smokes (`yes`, `no`)
- `region`: Residential area in the US (`northeast`, `northwest`, etc.)
- `expenses`: Medical costs billed by health insurance
- One-hot encoding applied to `sex`, `smoker`, and `region` (with `drop_first=True` to avoid the dummy variable trap)
- `expenses` column popped as the target variable
- Train-test split: 80% training / 20% testing
- `StandardScaler` used to normalize feature columns
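The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the function name `preprocess`, the `seed` parameter, and the choice to fit the scaler on the training rows only are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, seed: int = 42):
    """One-hot encode, split 80/20, and scale the feature columns."""
    # One-hot encode the categorical columns; drop_first=True drops one
    # level per column to avoid the dummy variable trap.
    df = pd.get_dummies(df, columns=["sex", "smoker", "region"], drop_first=True)

    # Pop the target column so only features remain in df.
    labels = df.pop("expenses")

    # 80% training / 20% testing.
    X_train, X_test, y_train, y_test = train_test_split(
        df, labels, test_size=0.2, random_state=seed
    )

    # Assumption: the scaler is fitted on the training features only and
    # then reused on the test features, so no test statistics leak in.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train.astype("float32"))
    X_test = scaler.transform(X_test.astype("float32"))
    return X_train, X_test, y_train.to_numpy(), y_test.to_numpy(), scaler
```

Assuming the CSV sits next to the notebook, `preprocess(pd.read_csv("insurance.csv"))` would return the scaled train and test arrays along with the fitted scaler.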
Built using TensorFlow Keras Sequential API:
- `Dense(256)` with ReLU
- `Dropout(0.1)`
- `Dense(128)` with ReLU
- `Dropout(0.1)`
- `Dense(64)` with ReLU
- `Dense(1)`: output layer (regression)
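For reference, the layer stack listed above might be written like this with the Sequential API. The helper name `build_model` and its `num_features` argument are illustrative assumptions, not names taken from the notebook.

```python
import tensorflow as tf


def build_model(num_features: int) -> tf.keras.Model:
    """Build the regression network described in this README."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation="relu"),
        # Single linear unit: the predicted expense value.
        tf.keras.layers.Dense(1),
    ])
```

The final layer has no activation, so the network can output any real value, which is the usual choice for regression.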
Compiled with:
- Loss: Mean Squared Error (MSE)
- Optimizer: Adam
- Metrics: Mean Absolute Error (MAE)
EarlyStopping used to prevent overfitting.
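A sketch of how compilation and early stopping could fit together is shown below. The README does not state the monitored quantity, patience, validation split, or epoch limit, so the values here (`val_loss`, `patience=20`, `validation_split=0.2`, `max_epochs=500`) and the helper name `compile_and_fit` are placeholders chosen for illustration.

```python
import tensorflow as tf


def compile_and_fit(model, X_train, y_train, max_epochs=500, patience=20):
    """Compile with MSE loss, Adam, and MAE metric, then train with early stopping."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),  # default learning rate (assumed)
        loss="mse",
        metrics=["mae"],
    )

    # Stop once the validation loss has not improved for `patience` epochs
    # and roll back to the best weights seen so far (assumed settings).
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=patience,
        restore_best_weights=True,
    )

    # Hold out part of the training data for validation (assumed 20%).
    return model.fit(
        X_train,
        y_train,
        validation_split=0.2,
        epochs=max_epochs,
        callbacks=[early_stop],
        verbose=0,
    )
```

The returned `History` object holds the per-epoch `loss`, `mae`, `val_loss`, and `val_mae` values, which is handy for checking where training stopped.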
- Evaluated on unseen test set
- Achieved MAE < 3500, passing the freeCodeCamp challenge
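The pass criterion is easy to check by hand. The snippet below is a small sketch of that check using NumPy only; the function names and the `MAE_THRESHOLD` constant are illustrative, with the 3500 value taken from this README.

```python
import numpy as np

MAE_THRESHOLD = 3500.0  # threshold stated in this README


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between true and predicted expenses."""
    # ravel() flattens (n, 1) prediction arrays so shapes line up.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean(np.abs(y_true - y_pred)))


def passes_challenge(y_true, y_pred, threshold: float = MAE_THRESHOLD) -> bool:
    """True when the MAE is strictly below the threshold."""
    return mean_absolute_error(y_true, y_pred) < threshold
```

With a trained Keras model, `passes_challenge(y_test, model.predict(X_test))` would apply the same check to the held-out test set.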
To run the project:
- Load the notebook in Google Colab
- Run all cells (training will auto-start)
- Final cell evaluates the model and displays predictions vs true values on a scatter plot
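A predictions-versus-true-values scatter plot like the one in the final cell could be produced along these lines. This is a sketch with Matplotlib, not the notebook's plotting code; the function name `plot_predictions` and the axis labels are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(y_true, y_pred):
    """Scatter predicted expenses against true expenses."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_true, y_pred, alpha=0.6)

    # Diagonal reference line: points on it are perfect predictions.
    upper = max(y_true.max(), y_pred.max())
    ax.plot([0, upper], [0, upper], linestyle="--", color="gray")

    ax.set_xlabel("True values (expenses)")
    ax.set_ylabel("Predictions (expenses)")
    ax.set_aspect("equal")
    return fig
```

Points close to the dashed diagonal are well predicted; the vertical distance from the diagonal is the error for that sample.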
Goal: train a regression model that predicts healthcare costs with a mean absolute error under $3500 on new, unseen data. Mission accomplished.
