This guide covers how to set up and test JFrogML's Feature Store components. You'll register data sources, create feature sets, and validate the feature pipeline before building ML models.
- Install the CLI:

```shell
pip install frogml-cli
```

- Log in and configure credentials (interactive):

```shell
frogml config add --interactive
```

Refer to the JFrog ML install and setup instructions: Install JFrog ML.
Understanding the Feature Store components:
```
.
├── feature_store/
│   ├── data_source.py   # Data connector definition
│   └── feature_set.py   # Feature transformations and scheduling
└── main/
    └── utils.py         # Data preprocessing utilities
```

- feature_store/data_source.py: Defines the connector to raw data (a CSV in S3)
- feature_store/feature_set.py: Transforms raw data into features with Spark SQL, including scheduling and storage
- main/utils.py: Data cleaning and preprocessing utilities
Before registration, test your data source locally:
```python
# In a Python cell or script
from feature_store.data_source import csv_source

# Test data source connectivity and sample data
sample_data = csv_source.get_sample()
print(sample_data.head())
```

This validates:
- S3 connectivity and access
- Data format and structure
- Column names and data types
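Beyond eyeballing `head()`, you can assert the schema programmatically. Below is a minimal sketch of such a check; the expected column list comes from the Spark SQL transformation shown later in this guide, and the toy DataFrame stands in for the real `csv_source.get_sample()` result:

```python
import pandas as pd

# Columns the downstream feature set expects (taken from the Spark SQL transformation)
EXPECTED_COLUMNS = {
    "user_id", "age", "job", "credit_amount", "duration",
    "housing", "saving_account", "checking_account",
    "purpose", "sex", "date_created",
}

def validate_sample(sample: pd.DataFrame) -> None:
    """Fail fast if the sampled data is missing expected columns or is empty."""
    missing = EXPECTED_COLUMNS - set(sample.columns)
    assert not missing, f"Data source sample is missing columns: {missing}"
    assert len(sample) > 0, "Data source returned an empty sample"

# Stand-in for csv_source.get_sample() so the sketch is self-contained
toy = pd.DataFrame([{c: 0 for c in EXPECTED_COLUMNS}])
validate_sample(toy)
print("sample schema OK")
```

Running this on the real sample catches missing or renamed columns before registration, when fixes are cheapest.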
Once validated, register the connector to your raw data:
```shell
# Register data source (run from the feature_set_quickstart_guide/ directory)
frogml features register -p feature_store/data_source.py
```

What this does:
- Creates connection to S3 CSV data
- Defines data access configuration
- Makes raw data available to Feature Store
Data Source Configuration:
```python
csv_source = CsvSource(
    name='credit_risk_data',
    path='s3://qwak-public/example_data/data_credit_risk.csv',
    date_created_column='date_created',
    filesystem_configuration=AnonymousS3Configuration(),
)
```

Before registration, test your feature transformation logic locally:
```python
# In a Python cell or script
from feature_store.feature_set import user_features

# Test feature transformation logic
transformed_sample = user_features.get_sample()
print(transformed_sample.head())
```

This validates:
- SQL transformation logic
- Feature engineering correctness
- Output schema and data types
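As with the data source, the transformed sample can be checked with assertions rather than by inspection. A minimal sketch, assuming the key column is aliased to `user` as in the SQL below (the toy DataFrame stands in for `user_features.get_sample()`):

```python
import pandas as pd

def validate_features(df: pd.DataFrame, key: str = "user") -> None:
    """Sanity-check a transformed feature sample before registering the set."""
    assert key in df.columns, f"key column '{key}' missing from output"
    assert df[key].notna().all(), "key column contains nulls"
    assert not df.empty, "transformation produced no rows"

# Stand-in for user_features.get_sample() so the sketch is self-contained
toy = pd.DataFrame({"user": [1, 2], "age": [30, 41], "credit_amount": [1000, 2500]})
validate_features(toy)
print("feature schema OK")
```

A null or missing key column would silently break feature lookups later, so it is worth failing here.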
Once validated, transform raw data into features and set up offline/online storage:
```shell
# Register feature set (data transformation + storage)
frogml features register -p feature_store/feature_set.py
```

What this does:
- Applies Spark SQL transformations to raw data
- Creates Offline Store (historical features for training)
- Creates Online Store (real-time features for inference)
- Sets up daily scheduling at midnight
- Backfills historical data from 2015
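The scheduling and backfill behavior can be sanity-checked with plain datetime arithmetic. This sketch is illustrative only, not part of the frogml SDK; it models what "0 0 * * *" with a 2015-01-01 backfill start implies:

```python
from datetime import datetime, timedelta

# Backfill start from the feature set configuration in this guide
BACKFILL_START = datetime(2015, 1, 1)

def next_midnight(now: datetime) -> datetime:
    """Next scheduled run for a daily-at-midnight cron ("0 0 * * *")."""
    return datetime(now.year, now.month, now.day) + timedelta(days=1)

def backfill_batches(now: datetime) -> int:
    """Number of daily batches covered from the backfill start to today."""
    return (now - BACKFILL_START).days

print(next_midnight(datetime(2024, 6, 15, 13, 30)))  # 2024-06-16 00:00:00
```

Backfilling from 2015 means thousands of daily batches run on first registration, so expect the initial execution to take noticeably longer than subsequent daily updates.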
Feature Set Configuration:
```python
@batch.feature_set(name="user-credit-risk-features", key="user_id")
@batch.scheduling(cron_expression="0 0 * * *")  # Daily updates
@batch.backfill(start_date=datetime(2015, 1, 1))
def user_features():
    return SparkSqlTransformation("""
        SELECT user_id as user,
               age, job, credit_amount, duration,
               housing, saving_account, checking_account,
               purpose, sex, date_created
        FROM credit_risk_data
    """)
```

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Data Source   │     │   Feature Set    │     │  Feature Store  │
│  (CSV from S3)  │───▶│   (Transform)    │───▶│ Serving Runtime │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │
                                 ▼
                   ┌──────────────────┬──────────────────┐
                   │  Offline Store   │   Online Store   │
                   │   (Historical)   │   (Real-time)    │
                   └──────────────────┴──────────────────┘
```
Configuration Layer:
- Entity: user, the unique identifier for feature vectors
- Data Source: Connector definition to raw data (S3 CSV)
- Feature Set: Transformation logic + scheduling configuration
- Scheduling: Automatic feature updates (daily at midnight)
- Backfill: Historical data processing (2015 to present)
Storage Layer (Actual Manifestations):
- Offline Store: Physical storage of historical features for model training
- Online Store: Physical storage of real-time features for model inference
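The difference between the two stores comes down to access pattern: the offline store keeps full history per entity, the online store keeps only the latest values. A toy, in-memory model of that distinction (not the frogml API, purely illustrative):

```python
from collections import defaultdict

# Toy model of the two stores: same features, different access patterns.
offline_store = defaultdict(list)   # entity -> full history (model training)
online_store = {}                   # entity -> latest values only (inference)

def write_features(user_id, features, timestamp):
    """A batch execution appends to the offline store and upserts the online store."""
    offline_store[user_id].append({"ts": timestamp, **features})
    online_store[user_id] = features

write_features("u1", {"credit_amount": 1000}, "2024-01-01")
write_features("u1", {"credit_amount": 2500}, "2024-01-02")

print(len(offline_store["u1"]))   # 2 historical rows, usable for training
print(online_store["u1"])         # latest row only, for real-time inference
```

This is why training reads from the offline store (it needs history) while serving reads from the online store (it needs one low-latency lookup per entity).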
Feature Set Issues: Navigate to JFrogML UI → AI/ML → Feature Sets → user-credit-risk-features → Executions → Logs
Local Validation Issues: Re-run the validation steps within each registration phase to identify data source connectivity or transformation problems.
Proceed to Model Integration: 🚀 Model Training & Deployment Guide
Your Feature Store is now ready to serve features to ML models. The next guide shows how to build and deploy models that consume features from both the offline store (for training) and online store (for real-time inference).