noahspahn/Real-Time-Feature-Store-Drift-Monitor

Real-Time Feature Store + Drift Monitor

A production-style feature store that ingests events, computes features, serves online values, stores offline history, and monitors drift between baseline and current feature distributions.

What this tool is for

Use this project to:

  • Register and govern ML features (ownership, TTL, data type).
  • Ingest feature values in real time and serve them with low latency.
  • Store offline feature history for training and audits.
  • Detect drift using PSI and surface anomalies in a dashboard.
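As a rough illustration of the governance side, the sketch below models a feature definition with the same fields as the create-feature request body used in the API examples (featureName, entityType, dtype, ttl, owner, description, tags) and checks a few rules. The rules themselves (snake_case names, the ALLOWED_DTYPES set, a positive TTL, a non-empty owner) are hypothetical and not taken from this project's API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical governance rules; the real API may enforce different ones.
ALLOWED_DTYPES = {"numeric", "categorical", "boolean", "string"}
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass
class FeatureDefinition:
    """Mirrors the JSON body of POST /api/features (snake_case in Python)."""
    feature_name: str
    entity_type: str
    dtype: str
    ttl: int  # seconds an online value stays valid
    owner: str
    description: str = ""
    tags: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, body: dict) -> "FeatureDefinition":
        """Map the camelCase request fields onto the dataclass."""
        return cls(
            feature_name=body["featureName"],
            entity_type=body["entityType"],
            dtype=body["dtype"],
            ttl=int(body["ttl"]),
            owner=body["owner"],
            description=body.get("description", ""),
            tags=list(body.get("tags", [])),
        )

    def validate(self) -> list[str]:
        """Return governance violations; an empty list means the definition passes."""
        errors = []
        if not NAME_PATTERN.match(self.feature_name):
            errors.append("featureName must be lowercase snake_case")
        if self.dtype not in ALLOWED_DTYPES:
            errors.append(f"unsupported dtype: {self.dtype}")
        if self.ttl <= 0:
            errors.append("ttl must be a positive number of seconds")
        if not self.owner.strip():
            errors.append("owner is required")
        return errors
```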

How it works (data flow)

Events -> Kinesis -> Lambda -> DynamoDB (online)
                            -> S3 (offline)
                            -> Drift metrics (DynamoDB)

API Gateway -> Lambda -> DynamoDB
Frontend -> API Gateway -> Lambda -> DynamoDB
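A minimal sketch of the stream-processing Lambda's first steps, assuming each Kinesis record carries a JSON object with featureName, entityId, value, ttl and an optional epoch-seconds timestamp. That payload schema, the expiresAt attribute and the function names are hypothetical. Kinesis delivers record payloads to Lambda base64-encoded under Records[].kinesis.data, so they are decoded before being split into an online item and an offline history row.

```python
import base64
import json
import time
from typing import Optional


def decode_kinesis_event(event: dict) -> list[dict]:
    """Decode the base64-encoded JSON payloads in a Kinesis-triggered Lambda event."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event.get("Records", [])
    ]


def route(payloads: list[dict], now: Optional[int] = None) -> tuple[list[dict], list[dict]]:
    """Split payloads into online items (latest value + expiry) and offline history rows."""
    now = int(time.time()) if now is None else now
    online, offline = [], []
    for p in payloads:
        ts = int(p.get("timestamp", now))
        online.append({
            "entityId": p["entityId"],
            "featureName": p["featureName"],
            "value": p["value"],
            # Hypothetical expiry attribute in epoch seconds (the format DynamoDB TTL expects).
            "expiresAt": ts + int(p["ttl"]),
        })
        # Offline rows keep the full event plus a resolved timestamp for training/audits.
        offline.append({**p, "timestamp": ts})
    return online, offline
```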

Core components

  • API (Node/Express or Lambda-backed): feature CRUD, ingestion, entity lookups, drift metrics.
  • Frontend (React/Vite): registry UI, drift dashboard, entity features.
  • Compute (Python): feature aggregation + PSI-based drift detection.
  • Model service (FastAPI, optional): fetches features and returns mock predictions.
  • Infrastructure (AWS CDK): DynamoDB tables, Kinesis, S3, API Gateway, Lambdas.
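The model service's job can be pictured as turning an entity's online feature values into an ordered vector and scoring it. The sketch below does that without FastAPI; FEATURE_ORDER, DEFAULTS and the linear mock_predict scorer are hypothetical stand-ins, not the project's model.

```python
from typing import Mapping, Sequence

# Hypothetical feature order and fallback values for a demo model.
FEATURE_ORDER = ("user_age", "sessions_7d", "purchases_30d")
DEFAULTS = {"user_age": 30.0, "sessions_7d": 0.0, "purchases_30d": 0.0}


def to_vector(online: Mapping[str, float], order: Sequence[str] = FEATURE_ORDER) -> list[float]:
    """Order online feature values for the model, filling missing ones from DEFAULTS."""
    return [float(online.get(name, DEFAULTS[name])) for name in order]


def mock_predict(vector: Sequence[float]) -> float:
    """Stand-in scorer: a fixed linear combination clipped to [0, 1]."""
    weights = (0.01, 0.05, 0.1)
    score = sum(w * x for w, x in zip(weights, vector))
    return max(0.0, min(1.0, score))
```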

Quick start (local)

  1. Install dependencies
cd api
npm install

cd ../frontend-react
npm install
  2. Run services
cd api
npm run dev

cd ../frontend-react
npm run dev
  3. Open the UI
http://localhost:3001

The local API base URL is http://localhost:3000/api. By default, the frontend forwards /api requests to it through the Vite dev-server proxy.

API usage (examples)

Create a feature:

curl -X POST http://localhost:3000/api/features \
  -H "Content-Type: application/json" \
  -d '{
    "featureName": "user_age",
    "entityType": "user",
    "dtype": "numeric",
    "ttl": 86400,
    "owner": "data-team",
    "description": "User age",
    "tags": ["demo"]
  }'
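The same request can be made from Python with only the standard library. This is a sketch: create_feature_request and send are illustrative names, and send assumes the API answers with a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000/api"


def create_feature_request(feature: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /features call as the curl example."""
    return urllib.request.Request(
        f"{base_url}/features",
        data=json.dumps(feature).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the (assumed JSON) response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running locally, `send(create_feature_request({...}))` using the JSON body from the curl example should register user_age.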

Ingest feature values:

curl -X POST http://localhost:3000/api/features/user_age/values \
  -H "Content-Type: application/json" \
  -d '{
    "values": [
      { "entityId": "user_123", "value": 25, "ttl": 86400 }
    ]
  }'
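When backfilling many entities, it can help to split values across several requests. A sketch assuming a batch size of 25; that number is a hypothetical choice that happens to match DynamoDB's BatchWriteItem limit of 25 items per call, and this README does not state any per-request limit for the API.

```python
from typing import Iterable, Iterator

BATCH_SIZE = 25  # assumption: mirrors DynamoDB's BatchWriteItem limit of 25 items per call


def ingestion_bodies(
    values: Iterable[tuple[str, float]], ttl: int, batch_size: int = BATCH_SIZE
) -> Iterator[dict]:
    """Yield request bodies for POST /features/{name}/values, batch_size values at a time."""
    batch = []
    for entity_id, value in values:
        batch.append({"entityId": entity_id, "value": value, "ttl": ttl})
        if len(batch) == batch_size:
            yield {"values": batch}
            batch = []
    if batch:  # flush the final partial batch
        yield {"values": batch}
```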

Query online features for an entity:

curl http://localhost:3000/api/entities/user_123/features
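If online items store their expiry as an epoch-seconds attribute (called expiresAt here, a hypothetical name), a read path usually filters on it: DynamoDB's TTL deletion runs in the background, so expired items can still be returned by reads for a while. A minimal sketch of collapsing one entity's items into a feature map:

```python
import time
from typing import Optional


def live_features(items: list[dict], now: Optional[int] = None) -> dict:
    """Collapse one entity's online items into {featureName: value}, skipping expired ones.

    Items without the (hypothetical) expiresAt attribute are treated as never expiring.
    """
    now = int(time.time()) if now is None else now
    return {
        item["featureName"]: item["value"]
        for item in items
        if item.get("expiresAt", now + 1) > now
    }
```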

List drift metrics:

curl http://localhost:3000/api/drift

Drift monitoring

Drift is computed with PSI (Population Stability Index). Metrics are stored in DynamoDB, surfaced in the dashboard, and exposed through these endpoints (paths relative to the API base URL):

  • GET /drift lists metrics and anomalies.
  • GET /drift/anomalies/current returns active anomalies.
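One plausible way to derive "current" anomalies from stored metrics is to keep only the newest metric per feature and flag those above a PSI threshold. The record fields (featureName, psi, computedAt) and the 0.25 threshold, a common rule of thumb for a significant shift, are assumptions rather than this project's schema.

```python
from typing import Iterable

PSI_ALERT = 0.25  # assumed threshold; a common rule of thumb for a significant shift


def current_anomalies(metrics: Iterable[dict], threshold: float = PSI_ALERT) -> list[dict]:
    """Keep the newest metric per feature and return those whose PSI crosses the threshold."""
    latest: dict[str, dict] = {}
    for m in metrics:
        prev = latest.get(m["featureName"])
        if prev is None or m["computedAt"] > prev["computedAt"]:
            latest[m["featureName"]] = m
    # Most severe drift first.
    return sorted(
        (m for m in latest.values() if m["psi"] >= threshold),
        key=lambda m: m["psi"],
        reverse=True,
    )
```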

The Python compute engine (feature-compute/) contains the PSI logic and a drift pipeline that can be wired to scheduled jobs or event streams.
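For reference, PSI compares the share of observations per bin in a baseline (expected, e_i) and a current (actual, a_i) sample: PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i). The stdlib-only sketch below uses baseline quantiles as bin edges and clamps empty bins to a small epsilon to avoid log(0); the bin count, epsilon and function names are illustrative choices, not necessarily what feature-compute/ does.

```python
import bisect
import math
from typing import Sequence


def quantile_edges(baseline: Sequence[float], bins: int = 10) -> list[float]:
    """Interior bin edges at baseline quantiles (duplicates removed for tied data)."""
    data = sorted(baseline)
    n = len(data)
    return sorted({data[(i * n) // bins] for i in range(1, bins)})


def proportions(values: Sequence[float], edges: list[float]) -> list[float]:
    """Share of values falling into each of the len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of the current sample against the baseline."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    edges = quantile_edges(baseline, bins)
    total = 0.0
    for e, a in zip(proportions(baseline, edges), proportions(current, edges)):
        e, a = max(e, eps), max(a, eps)  # clamp empty bins to avoid log(0) and division by zero
        total += (a - e) * math.log(a / e)
    return total
```

A result of 0 means identical binned distributions; common rules of thumb treat values below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as significant.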

Frontend configuration

The frontend reads the API base URL from VITE_API_URL at build time.

Example:

VITE_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev

If it is not set, the base URL defaults to /api, which the Vite proxy forwards during local development.

AWS deployment overview

  1. Deploy infrastructure
cd infrastructure
npm install
npm run build
npx aws-cdk@latest bootstrap
npx aws-cdk@latest deploy --all
  2. Build and publish the frontend to S3
cd ../frontend-react
VITE_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev npm run build
aws s3 sync dist/ s3://your-frontend-bucket --delete
  3. The API Gateway URL to pass as VITE_API_URL in step 2 comes from the CloudFormation stack outputs of step 1.
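The API Gateway URL can also be pulled out programmatically. The sketch below scans a CloudFormation describe_stacks response for an output whose value looks like an execute-api URL; matching on the value rather than an output key is a heuristic, because the actual output names depend on the CDK stacks. fetch_api_url needs boto3 and AWS credentials, and the stack name is deployment-specific.

```python
from typing import Optional


def find_api_url(describe_stacks_response: dict) -> Optional[str]:
    """Return the first stack output that looks like an API Gateway invoke URL."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            value = output.get("OutputValue", "")
            if ".execute-api." in value:
                return value.rstrip("/")
    return None


def fetch_api_url(stack_name: str) -> Optional[str]:
    """Look the URL up live via boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so find_api_url works without boto3 installed

    client = boto3.client("cloudformation")
    return find_api_url(client.describe_stacks(StackName=stack_name))
```

The trailing slash is stripped because API Gateway endpoint outputs often end with one, while the VITE_API_URL example above has none.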

Configuration

API (api/.env):

PORT=3000
AWS_REGION=us-east-1
FEATURE_REGISTRY_TABLE=FeatureRegistry
ONLINE_FEATURE_TABLE=OnlineFeatureValues
DRIFT_METRICS_TABLE=DriftMetrics
KINESIS_STREAM=feature-events
OFFLINE_FEATURE_BUCKET=feature-store-offline-store

Model service (model-service/.env):

AWS_REGION=us-east-1
ONLINE_FEATURE_TABLE=OnlineFeatureValues
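A sketch of how a Python service might load these two settings, with a tiny .env parser so the example stays stdlib-only (in practice a library such as python-dotenv is common). Settings, parse_dotenv and the choice to let the process environment override file values are assumptions, not the model service's actual code; the defaults are the values documented above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks and # comments; strips simple quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    aws_region: str
    online_feature_table: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read settings from a mapping, defaulting to the documented values."""
        return cls(
            aws_region=env.get("AWS_REGION", "us-east-1"),
            online_feature_table=env.get("ONLINE_FEATURE_TABLE", "OnlineFeatureValues"),
        )
```

A typical call would be `Settings.from_env({**parse_dotenv(open(".env").read()), **os.environ})`, so real environment variables win over the file.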

Notes and current status

  • DynamoDB, Kinesis, S3, and API Gateway are provisioned via CDK.
  • Lambda handlers implement feature CRUD, ingestion, and drift queries.
  • The drift pipeline is implemented in Python and can be scheduled or wired to a stream.

For deeper reference, see docs/API.md, docs/DEPLOYMENT.md, and ARCHITECTURE.md.

About

A feature store that ingests events, computes features, serves online values, stores offline history as Parquet, and monitors training/serving skew.
