diff --git a/docs/user_guides/fs/feature_group/index.md b/docs/user_guides/fs/feature_group/index.md
index 9f87c47cf..4161ff878 100644
--- a/docs/user_guides/fs/feature_group/index.md
+++ b/docs/user_guides/fs/feature_group/index.md
@@ -9,3 +9,4 @@ This section serves to provide guides and examples for the common usage of abstr
 - [Statistics](statistics.md)
 - [Data Validation](data_validation.md)
 - [Feature Monitoring](feature_monitoring.md)
+- [Time-To-Live (TTL)](ttl.md)
diff --git a/docs/user_guides/fs/feature_group/ttl.md b/docs/user_guides/fs/feature_group/ttl.md
new file mode 100644
index 000000000..f673605e3
--- /dev/null
+++ b/docs/user_guides/fs/feature_group/ttl.md
@@ -0,0 +1,143 @@
+## Feature Group TTL Usage Guide
+
+Time To Live (TTL) is a feature that automatically expires data in feature groups after a specified time period.
+This guide explains when and how to use TTL in your feature groups.
+
+### Use Case: When to Use TTL
+
+TTL is particularly useful for feature groups that contain time-sensitive data that becomes stale or irrelevant after a certain period.
+Common use cases include:
+
+- **Regulatory compliance**: Data that must be automatically purged after a retention period for privacy or compliance reasons (e.g., GDPR, HIPAA)
+- **Cost optimization**: Reducing storage costs by automatically removing outdated data that is no longer needed for model inference
+- **Data freshness**: Ensuring that only recent, relevant data is available for online serving, preventing models from using stale features
+
+For example, if you're building a recommendation system, you might want user interaction features (like "items viewed in the last hour") to automatically expire after 1 hour, ensuring your model only uses current, relevant data.
+
+---
+
+## Getting Started
+
+### Creating a Feature Group with TTL
+
+When creating a new feature group, you can enable TTL by specifying the `ttl` parameter.
+The TTL value determines how long data will remain in the feature group before being automatically expired.
+The TTL is calculated based on the `event_time` column.
+Data rows where `event_time` is older than the TTL period will be automatically removed.
+
+```python
+from datetime import datetime, timezone
+import pandas as pd
+
+# Assume you already have a feature store handle
+# fs = ...
+
+now = datetime.now(timezone.utc)
+df = pd.DataFrame(
+    {
+        "id": [0, 1, 2],
+        "timestamp": [now, now, now],
+        "feature1": [10, 20, 30],
+        "feature2": ["a", "b", "c"],
+    }
+)
+
+# Create a feature group with TTL enabled (60 seconds)
+fg = fs.create_feature_group(
+    name="fg_ttl_example",
+    version=1,
+    primary_key=["id"],
+    event_time="timestamp",
+    online_enabled=True,
+    ttl=60,  # TTL in seconds - data will expire after 60 seconds
+)
+
+fg.insert(
+    df,
+    write_options={
+        "start_offline_materialization": False,
+        "wait_for_online_ingestion": True,
+    },
+)
+
+# After 60 seconds, reading online will return empty data
+fg.read(online=True)  # Returns empty DataFrame after TTL expires
+```
+
+For detailed API reference on all possible types of TTL values, see the [FeatureStore.create_feature_group API documentation][hsfs.feature_store.FeatureStore.create_feature_group].
+
+---
+
+## Managing TTL on Existing Feature Groups
+
+### Updating the TTL Value
+
+You can change the TTL value for an existing feature group at any time.
+This is useful when you need to adjust the retention period based on changing requirements.
+
+```python
+# Get your existing feature group
+fg = fs.get_feature_group(
+    name="fg_ttl_example",
+    version=1,
+)
+
+# Update TTL to a new value (120 seconds = 2 minutes)
+fg.enable_ttl(ttl=120)
+```
+
+After updating the TTL, the new retention period will apply to all future data insertions and will affect when existing data expires.
+
+---
+
+### Disabling and Re-enabling TTL
+
+You can temporarily disable TTL on a feature group if you need to retain data indefinitely, and then re-enable it later.
+
+#### Disabling TTL
+
+```python
+# Disable TTL - data will no longer expire automatically
+fg.disable_ttl()
+```
+
+#### Re-enabling TTL
+
+When re-enabling TTL, you have two options:
+
+1. **Re-enable with the previous TTL value**: If you don't specify a TTL value, the feature group will use the last TTL value that was set.
+
+    ```python
+    # Re-enable TTL using the previous TTL value
+    fg.enable_ttl()
+    ```
+
+2. **Re-enable with a new TTL value**: Specify a new TTL value when re-enabling.
+
+    ```python
+    # Re-enable TTL with a new value (90 seconds)
+    fg.enable_ttl(ttl=90)
+    ```
+
+**Important**: If TTL was never set on the feature group before, you must provide a TTL value when enabling it.
+Otherwise, TTL cannot be enabled.
+
+---
+
+### Enabling TTL on an Existing Feature Group
+
+If you created a feature group without TTL initially, you can enable it later:
+
+```python
+# Get an existing feature group that was created without TTL
+fg = fs.get_feature_group(
+    name="fg_existing_no_ttl",
+    version=1,
+)
+
+# Enable TTL for the first time (60 seconds)
+fg.enable_ttl(ttl=60)
+```
+
+Once enabled, TTL will apply to all data in the feature group based on the `event_time` column.
+For detailed API reference on all possible types of TTL values and additional options, see the [FeatureGroup.enable_ttl API documentation][hsfs.feature_group.FeatureGroup.enable_ttl].
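A minimal sketch of the expiry rule that `ttl.md` describes (rows whose `event_time` is older than the TTL period are removed), and of how raising the TTL from 60 to 120 seconds changes which existing rows count as expired. It is not part of the patch and not the Hopsworks implementation: `split_by_ttl` is a hypothetical helper, the rows are made-up plain dictionaries, and the exact boundary handling and the moment at which the online store actually drops rows are assumptions the guide does not specify.

```python
from datetime import datetime, timedelta, timezone


def split_by_ttl(rows, event_time_key, ttl_seconds, now):
    """Split rows into (live, expired) lists.

    Assumed rule (not confirmed by the guide): a row is expired when its
    event time is strictly older than `now - ttl_seconds`.
    """
    cutoff = now - timedelta(seconds=ttl_seconds)
    live = [row for row in rows if row[event_time_key] >= cutoff]
    expired = [row for row in rows if row[event_time_key] < cutoff]
    return live, expired


# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Hypothetical rows that are 30, 90 and 150 seconds old.
rows = [
    {"id": 0, "timestamp": now - timedelta(seconds=30), "feature1": 10},
    {"id": 1, "timestamp": now - timedelta(seconds=90), "feature1": 20},
    {"id": 2, "timestamp": now - timedelta(seconds=150), "feature1": 30},
]

# With a 60-second TTL only the 30-second-old row is still live.
live_60, expired_60 = split_by_ttl(rows, "timestamp", 60, now)

# With a 120-second TTL the 90-second-old row is live as well.
live_120, expired_120 = split_by_ttl(rows, "timestamp", 120, now)

print([row["id"] for row in live_60], [row["id"] for row in expired_60])
print([row["id"] for row in live_120], [row["id"] for row in expired_120])
```

Because the cutoff is derived from `event_time` rather than from insertion time, a row inserted with an old `event_time` can be expired as soon as it is written. The guide's own example relies on the same rule when it sets every `timestamp` to the current time.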
diff --git a/mkdocs.yml b/mkdocs.yml
index 98588e794..474307ba1 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -90,6 +90,7 @@ nav:
           - Notification: user_guides/fs/feature_group/notification.md
           - On-Demand Transformations: user_guides/fs/feature_group/on_demand_transformations.md
           - Online Ingestion Observability: user_guides/fs/feature_group/online_ingestion_observability.md
+          - Time-To-Live (TTL): user_guides/fs/feature_group/ttl.md
       - Feature View:
           - user_guides/fs/feature_view/index.md
           - Overview: user_guides/fs/feature_view/overview.md