This project productionizes a machine learning model that predicts the likelihood of insurance claims. It supports:
- Daily batch predictions for new applications (~1200 per day)
- Monthly retraining on fresh data from the data warehouse
- A FastAPI interface to manually trigger and view predictions
- Full CI/CD pipeline, Dockerization, and Kubernetes CronJobs
- Deployment on Google Cloud Platform (GCP)
- Python
- FastAPI for REST API
- Pytest for unit testing
- Docker for containerization
- GitHub Actions for CI/CD
- Kubernetes CronJobs for job scheduling and deployment
- Google Cloud Platform (GCP) for cloud infrastructure
- Automatically triggered each day by a Kubernetes CronJob
- Loads daily data using the provided data-access functions
- Applies preprocessing and runs the trained model
- Stores predictions and confidence scores in a structured format such as CSV (a database target is planned for an upcoming version); see the sketch below
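A minimal sketch of what `jobs/daily_run.py` could look like. The data-access and pipeline helpers (`load_daily_data`, `preprocess`), the model path, and the output columns are assumed names for illustration, not the project's actual API:

```python
# jobs/daily_run.py -- illustrative sketch; helper and path names are assumptions.
from datetime import date
from pathlib import Path

import joblib
import pandas as pd

from data_access import load_daily_data  # assumed provided data-access function
from pipeline import preprocess          # assumed: same pipeline as training

MODEL_PATH = Path("models/model.joblib") # assumed artifact location
OUTPUT_DIR = Path("predictions")


def run_daily_predictions() -> Path:
    raw = load_daily_data(date.today())           # new applications for the day
    features = preprocess(raw)                    # consistent preprocessing
    model = joblib.load(MODEL_PATH)
    scores = model.predict_proba(features)[:, 1]  # claim-likelihood scores

    out = pd.DataFrame(
        {
            "application_id": raw["application_id"],
            "claim_likelihood": scores,
            "batch_date": date.today().isoformat(),
        }
    )
    OUTPUT_DIR.mkdir(exist_ok=True)
    dest = OUTPUT_DIR / f"predictions_{date.today():%Y%m%d}.csv"
    out.to_csv(dest, index=False)                 # structured output for downstream use
    return dest


if __name__ == "__main__":
    run_daily_predictions()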
- Automatically triggered monthly
- Loads updated data using the provided data-access functions
- Retrains the model with the same pipeline logic
- Saves the updated model to local storage (MLflow model registration is planned for an upcoming version)
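A corresponding sketch for `jobs/monthly_run.py`; `load_training_data` and `build_pipeline` are assumed names:

```python
# jobs/monthly_run.py -- illustrative sketch; helper names are assumptions.
from pathlib import Path

import joblib

from data_access import load_training_data  # assumed provided data-access function
from pipeline import build_pipeline         # assumed: same preprocessing + estimator

MODEL_PATH = Path("models/model.joblib")


def retrain() -> None:
    X, y = load_training_data()     # fresh data from the warehouse
    model = build_pipeline()        # training logic unchanged at this stage
    model.fit(X, y)
    MODEL_PATH.parent.mkdir(exist_ok=True)
    joblib.dump(model, MODEL_PATH)  # saved locally; MLflow registration planned


if __name__ == "__main__":
    retrain()
```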
- Allows single-instance prediction testing through the UI
- Supports visualization of the daily prediction output
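A minimal sketch of the FastAPI surface in `main.py`; the request schema and the `score_application` / `load_latest_batch` helpers are assumptions:

```python
# main.py -- illustrative sketch; schema and helper names are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

from app.scoring import score_application, load_latest_batch  # assumed helpers

app = FastAPI(title="Claim Likelihood API")


class Application(BaseModel):
    # hypothetical feature set for a single application
    age: int
    premium: float
    region: str


@app.post("/predict")
def predict_single(application: Application) -> dict:
    """Score one application for manual testing via the interactive UI (/docs)."""
    return {"claim_likelihood": score_application(application.model_dump())}


@app.get("/predictions/daily")
def daily_predictions() -> dict:
    """Return the latest daily batch output for visualization."""
    return load_latest_batch()
```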
- Daily application data is available in the warehouse by 1 AM the following day
- Data includes a `timestamp` column to distinguish batches
- Model training logic does not need enhancement at this stage
- The same preprocessing pipeline is used for both training and inference (assuming the data remains consistent over time)
- Output format is structured for downstream use
- Supports underwriters in risk assessment by providing early claim-likelihood estimates
- Enables proactive customer handling during the cooling-off period
- Ensures retraining frequency balances performance and stability
- Predictions and retraining need to be automated, stable, and interpretable
- Data governance and quality are assumed to be handled upstream
- Triggers on code pushes and pull requests to main
- Runs unit tests and linting
- Builds and optionally pushes Docker image to registry
- Deploys to GKE using GitHub secrets and service-account credentials
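A trimmed sketch of the GitHub Actions workflow; the file name, image path, and deploy steps are assumptions:

```yaml
# .github/workflows/ci.yml -- trimmed sketch; names and registry path are assumptions
name: ci
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: flake8 .          # linting
      - run: pytest tests/     # unit tests

  build-deploy:
    needs: test
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t "$IMAGE" .
        env:
          IMAGE: europe-docker.pkg.dev/PROJECT_ID/claims/claims:${{ github.sha }}
      # Push to the registry and roll out to GKE using the service-account
      # credentials stored in GitHub secrets (steps omitted).
```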
- Daily prediction job scheduled at 1 AM UTC, once the previous day's data is available in the warehouse
- Monthly retraining job scheduled on the first day of each month
- Secrets and configs managed using Kubernetes Secrets or environment variables
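An illustrative CronJob manifest for the daily job; the image and secret names are assumptions (the monthly retraining job would use a schedule like `0 2 1 * *`):

```yaml
# k8s/daily-prediction-cronjob.yaml -- illustrative sketch; image and secret
# names are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-prediction
spec:
  schedule: "0 1 * * *"            # 1 AM UTC every day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: daily-prediction
              image: europe-docker.pkg.dev/PROJECT_ID/claims/claims:latest
              command: ["python", "jobs/daily_run.py"]
              envFrom:
                - secretRef:
                    name: pipeline-secrets  # Kubernetes Secret with credentials
```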
- Dockerfile creates a lightweight container for the full pipeline
- Containers deployed on Google Kubernetes Engine (GKE)
- Jobs triggered using K8s CronJobs
- Optionally integrates with Artifact Registry to store the Docker image
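A lightweight Dockerfile along these lines (base image and default command are assumptions):

```dockerfile
# Dockerfile -- illustrative sketch; base image and default command are assumptions.
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Default to serving the API; the CronJobs override the command to run
# jobs/daily_run.py or jobs/monthly_run.py.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```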
Local setup and API server:
```bash
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload
```
Daily prediction:
```bash
python jobs/daily_run.py
```
Monthly retraining:
```bash
python jobs/monthly_run.py
```
Unit tests:
```bash
pytest tests/
```
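An example of the kind of unit test that could live in `tests/` (the `score_application` helper is an assumed name, matching the API sketch above):

```python
# tests/test_scoring.py -- illustrative unit test; helper name is an assumption.
from app.scoring import score_application  # assumed helper wrapping the model


def test_score_is_a_probability():
    sample = {"age": 42, "premium": 850.0, "region": "north"}  # hypothetical features
    score = score_application(sample)
    assert 0.0 <= score <= 1.0
```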
- Replace Kubernetes CronJobs with Apache Airflow DAGs or Cloud Functions for more robust, scalable, and observable orchestration.
- Use MLflow to track and register models instead of saving to local storage for better model versioning and transparency.
- Add model monitoring and data drift detection to ensure model reliability over time and trigger alerts or retraining when necessary.
- Save prediction results directly into a cloud database or table (e.g., BigQuery, CloudSQL) instead of flat CSV files for better integration with downstream processes.