A production-style feature store that ingests events, computes features, serves online values, stores offline history, and monitors drift between baseline and current feature distributions.
Use this project to:
- Register and govern ML features (ownership, TTL, data type).
- Ingest feature values in real time and serve them with low latency.
- Store offline feature history for training and audits.
- Detect drift using PSI and surface anomalies in a dashboard.
Events -> Kinesis -> Lambda -> DynamoDB (online)
                            -> S3 (offline)
                            -> Drift metrics (DynamoDB)
API Gateway -> Lambda -> DynamoDB
Frontend -> API Gateway -> Lambda -> DynamoDB
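The streaming path starts with a Lambda consumer of the Kinesis stream. The sketch below assumes each event is a JSON object with the hypothetical fields entityId, featureName, value and eventDate. Decoding follows the standard Kinesis-to-Lambda event shape (a base64 payload under Records[].kinesis.data). The routing function and the S3 key layout are illustrative only, and the actual DynamoDB and S3 writes are left out.

```python
import base64
import json
from typing import Any


def decode_kinesis_records(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Decode the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def route_event(payload: dict[str, Any]) -> tuple[dict[str, Any], str]:
    """Map one event to an online item and an offline S3 key (hypothetical layout)."""
    online_item = {
        "entityId": payload["entityId"],
        "featureName": payload["featureName"],
        "value": payload["value"],
    }
    # Hypothetical date-partitioned key so training jobs can scan one day at a time.
    offline_key = (
        f"offline/{payload['featureName']}/{payload['eventDate']}/"
        f"{payload['entityId']}.json"
    )
    return online_item, offline_key


def handler(event: dict[str, Any], context: Any = None) -> dict[str, int]:
    writes = [route_event(p) for p in decode_kinesis_records(event)]
    # A real handler would persist `writes` with boto3 (DynamoDB put_item, S3 put_object).
    return {"processed": len(writes)}
```

Keeping decoding separate from routing lets the routing logic be tested without building a full Lambda event.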
- API (Node/Express or Lambda-backed): feature CRUD, ingestion, entity lookups, drift metrics.
- Frontend (React/Vite): registry UI, drift dashboard, entity features.
- Compute (Python): feature aggregation + PSI-based drift detection.
- Model service (FastAPI, optional): fetches features and returns mock predictions.
- Infrastructure (AWS CDK): DynamoDB tables, Kinesis, S3, API Gateway, Lambdas.
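The compute layer's feature aggregation is not spelled out here. As one assumed example, a time-windowed per-entity aggregate (count, sum, mean over the last N seconds) could be computed as follows. The Event type and window_aggregates function are hypothetical names, not the project's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    entity_id: str
    timestamp: float  # epoch seconds
    value: float


def window_aggregates(events, now, window_seconds):
    """Per-entity count, sum and mean over events with now - window < timestamp <= now."""
    totals = defaultdict(lambda: [0, 0.0])  # entity_id -> [count, sum]
    for e in events:
        if now - window_seconds < e.timestamp <= now:
            acc = totals[e.entity_id]
            acc[0] += 1
            acc[1] += e.value
    # Every entity in `totals` has count >= 1, so the mean is always defined.
    return {
        entity: {"count": count, "sum": total, "mean": total / count}
        for entity, (count, total) in totals.items()
    }
```

Entities with no events inside the window are simply absent from the result rather than reported with zero counts.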
- Install dependencies
cd api
npm install
cd ../frontend-react
npm install
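- Run services (npm run dev keeps running, so start the API and the frontend in separate terminals from the repository root)
cd api
npm run dev
cd frontend-react
npm run dev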
- Open the UI
http://localhost:3001
The local API base URL is http://localhost:3000/api. By default, the frontend routes /api requests through the Vite dev-server proxy.
Create a feature:
curl -X POST http://localhost:3000/api/features \
-H "Content-Type: application/json" \
-d '{
"featureName": "user_age",
"entityType": "user",
"dtype": "numeric",
"ttl": 86400,
"owner": "data-team",
"description": "User age",
"tags": ["demo"]
}'
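The same feature registration can be issued from Python with only the standard library. This is a minimal sketch that builds the request without sending it. The REQUIRED_FIELDS set is an assumption, because the API's actual validation rules are not documented here.

```python
import json
import urllib.request

API_BASE = "http://localhost:3000/api"
# Assumption: which fields the API actually requires is not documented here.
REQUIRED_FIELDS = {"featureName", "entityType", "dtype", "ttl", "owner"}


def build_create_feature_request(
    feature: dict, base_url: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) a POST /features request equivalent to the curl call."""
    missing = REQUIRED_FIELDS.difference(feature)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return urllib.request.Request(
        f"{base_url}/features",
        data=json.dumps(feature).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the local API running, pass the result to urllib.request.urlopen(req) to send it. Checking fields on the client side only catches obvious mistakes. The server remains the authority on validation.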
Ingest feature values:
curl -X POST http://localhost:3000/api/features/user_age/values \
-H "Content-Type: application/json" \
-d '{
"values": [
{ "entityId": "user_123", "value": 25, "ttl": 86400 }
]
}'
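Each ingested value carries a ttl in seconds. DynamoDB's TTL feature expects a Number attribute containing an absolute epoch timestamp in seconds, so an ingestion handler would have to convert the relative ttl. The sketch below shows that conversion. The attribute names (updatedAt, expiresAt) and the fallback to a default TTL are hypothetical.

```python
import time

DEFAULT_TTL_SECONDS = 86400  # hypothetical fallback, mirrors the registered feature's ttl


def to_online_item(feature_name, entry, now=None, default_ttl=DEFAULT_TTL_SECONDS):
    """Turn one ingested value into a hypothetical OnlineFeatureValues item."""
    now = int(time.time() if now is None else now)
    ttl = int(entry.get("ttl", default_ttl))
    return {
        "entityId": entry["entityId"],
        "featureName": feature_name,
        "value": entry["value"],
        "updatedAt": now,
        # DynamoDB TTL expects an epoch-seconds Number attribute.
        "expiresAt": now + ttl,
    }
```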
Query online features for an entity:
curl http://localhost:3000/api/entities/user_123/features
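DynamoDB deletes expired items asynchronously, so a read can still return an item whose TTL has already passed. An entity lookup should therefore filter on the expiry attribute itself. The sketch below assumes the same hypothetical expiresAt attribute used for ingestion.

```python
import time


def fresh_features(items, now=None):
    """Collapse online items into {featureName: value}, skipping expired ones."""
    now = time.time() if now is None else now
    return {
        item["featureName"]: item["value"]
        for item in items
        # Items without an expiresAt attribute are treated as non-expiring.
        if item.get("expiresAt", float("inf")) > now
    }
```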
List drift metrics:
curl http://localhost:3000/api/drift
Drift is computed with PSI (Population Stability Index). Metrics are stored in DynamoDB and surfaced in the dashboard:
- GET /drift lists metrics and anomalies.
- GET /drift/anomalies/current returns active anomalies.
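PSI compares the share of values that fall into each bin under the baseline (e_i) and under the current data (a_i): PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i). Each term is non-negative, and identical distributions give 0. The sketch below is one standard way to compute it, not necessarily the binning used in feature-compute/. It assumes equal-width bins over the baseline range and clamps empty bins to a small epsilon so the logarithm stays finite.

```python
import math
from bisect import bisect_right


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `baseline`.

    Assumes equal-width bins over the baseline range; values outside that
    range fall into the first or last bin.
    """
    if not baseline or not current:
        raise ValueError("baseline and current must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    # Interior bin edges; a constant baseline collapses to a single bin.
    edges = [lo + width * i for i in range(1, bins)] if width > 0 else []

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Clamp empty bins to eps so log(a / e) stays finite.
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A common rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as moderate drift, and above 0.25 as significant drift. Quantile-based bins are a frequent alternative to equal-width bins when the baseline is skewed.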
The Python compute engine (feature-compute/) contains the PSI logic and a drift pipeline that can be wired to scheduled jobs or event streams.
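A scheduled drift run ultimately has to turn each PSI value into a stored metric and an anomaly flag. The sketch below assumes a hypothetical DriftMetrics item shape and the common 0.1 and 0.25 rule-of-thumb thresholds. The rule that only significant drift counts as an anomaly is also an assumption. Values are stored as Decimal because boto3's DynamoDB resource layer rejects Python floats.

```python
from datetime import datetime, timezone
from decimal import Decimal


def classify_psi(value, moderate=0.1, significant=0.25):
    """Map a PSI value to a status using common rule-of-thumb thresholds."""
    if value >= significant:
        return "significant"
    if value >= moderate:
        return "moderate"
    return "stable"


def build_drift_metric(feature_name, psi_value, computed_at=None):
    """Shape one hypothetical DriftMetrics item for a scheduled drift run."""
    computed_at = computed_at or datetime.now(timezone.utc)
    status = classify_psi(psi_value)
    return {
        "featureName": feature_name,
        "computedAt": computed_at.isoformat(),
        # boto3's DynamoDB resource layer rejects Python floats; use Decimal.
        "psi": Decimal(str(round(psi_value, 6))),
        "status": status,
        # Hypothetical rule: only significant drift is flagged as an anomaly.
        "isAnomaly": status == "significant",
    }
```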
The frontend reads the API base URL from VITE_API_URL at build time.
Example:
VITE_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev
If VITE_API_URL is not set, the frontend defaults to /api, which the Vite proxy forwards during local development.
- Deploy infrastructure
cd infrastructure
npm install
npm run build
npx aws-cdk@latest bootstrap
npx aws-cdk@latest deploy --all
- Build and publish frontend to S3
cd ../frontend-react
VITE_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev npm run build
aws s3 sync dist/ s3://your-frontend-bucket --delete
- Use the API Gateway URL from the CloudFormation stack outputs (printed at the end of cdk deploy) as VITE_API_URL when building the frontend.
API (api/.env):
PORT=3000
AWS_REGION=us-east-1
FEATURE_REGISTRY_TABLE=FeatureRegistry
ONLINE_FEATURE_TABLE=OnlineFeatureValues
DRIFT_METRICS_TABLE=DriftMetrics
KINESIS_STREAM=feature-events
OFFLINE_FEATURE_BUCKET=feature-store-offline-store
Model service (model-service/.env):
AWS_REGION=us-east-1
ONLINE_FEATURE_TABLE=OnlineFeatureValues
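The model service only needs the two variables above. The sketch below shows a settings loader for it, assuming the .env values reach the process environment (for example through the shell or a dotenv loader). The Settings class and load_settings function are hypothetical names, and their defaults copy the example values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    aws_region: str
    online_feature_table: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read model-service settings, falling back to the example values above."""
    env = os.environ if env is None else env
    return Settings(
        aws_region=env.get("AWS_REGION", "us-east-1"),
        online_feature_table=env.get("ONLINE_FEATURE_TABLE", "OnlineFeatureValues"),
    )
```

Accepting an explicit mapping keeps the loader testable without touching the real environment.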
- DynamoDB, Kinesis, S3, and API Gateway are provisioned via CDK.
- Lambda handlers implement feature CRUD, ingestion, and drift queries.
- The drift pipeline is implemented in Python and can be scheduled or wired to a stream.
For deeper reference, see docs/API.md, docs/DEPLOYMENT.md, and ARCHITECTURE.md.