Commit d60e414

docs: Update documentation for transform_on_write functionality (feast-dev#5286)
Update documentation for transform_on_write functionality

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
1 parent 2abd30f commit d60e414

3 files changed, +39 -1 lines changed

docs/getting-started/architecture/write-patterns.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ There are two ways the client can write *feature values* to the online store:
 Precomputed transformations can happen outside of Feast (e.g., via some batch job or streaming application) or inside of the Feast feature server when writing to the online store via the `push` or `write-to-online-store` api.
 
 ### 2. Computing Transformations On Demand
-On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store.
+On Demand transformations can only happen inside of Feast at either (1) the time of the client's request or (2) when the data producer writes to the online store. With the `transform_on_write` parameter, you can control whether transformations are applied during write operations, allowing you to skip transformations for pre-processed data while still enabling transformations during API calls.
 
 ### 3. Hybrid (Precomputed + On Demand)
 The hybrid approach allows for precomputed transformations to happen inside or outside of Feast and have the On Demand transformations happen at client request time. This is particularly convenient for "Time Since Last" types of features (e.g., time since purchase).
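
The write-time switch described in the added sentence can be illustrated with a small, self-contained sketch in plain Python. This is not Feast's implementation and does not import Feast: `write_rows`, `adjust_conv_rate`, and the in-memory `online_store` dict are hypothetical names, the doubling transformation is invented, and the default of `True` is an assumption. Only the flag name `transform_on_write` and its described effect come from the change above.

```python
from typing import Callable, Dict

Row = Dict[str, float]

# Hypothetical in-memory stand-in for an online store, keyed by entity id.
online_store: Dict[int, Row] = {}


def adjust_conv_rate(row: Row) -> Row:
    """Invented transformation: derive conv_rate_adjusted from conv_rate."""
    return {**row, "conv_rate_adjusted": row["conv_rate"] * 2}


def write_rows(
    rows: Dict[int, Row],
    transform: Callable[[Row], Row],
    transform_on_write: bool = True,  # assumed default, for illustration only
) -> None:
    """Store rows, applying `transform` first unless the caller opts out."""
    for entity_id, row in rows.items():
        online_store[entity_id] = transform(row) if transform_on_write else dict(row)


# Raw data: the transformation runs as part of the write.
write_rows({1001: {"conv_rate": 0.5}}, adjust_conv_rate)

# Pre-processed data: values are stored exactly as supplied.
write_rows(
    {1002: {"conv_rate": 0.5, "conv_rate_adjusted": 0.55}},
    adjust_conv_rate,
    transform_on_write=False,
)
```

In this toy model, entity 1001 ends up with a derived `conv_rate_adjusted`, while entity 1002 keeps the value its producer supplied.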

docs/getting-started/concepts/data-ingestion.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@ Ingesting from batch sources is only necessary to power real-time models. This i
 
 A key command to use in Feast is the `materialize_incremental` command, which fetches the _latest_ values for all entities in the batch source and ingests these values into the online store.
 
+When working with On Demand Feature Views with `write_to_online_store=True`, you can also control whether transformations are applied during ingestion by using the `transform_on_write` parameter. Setting `transform_on_write=False` allows you to materialize pre-transformed features without reapplying transformations, which is particularly useful for large batch datasets that have already been processed.
+
 Materialization can be called programmatically or through the CLI:
 
 <details>

docs/reference/beta-on-demand-feature-view.md

Lines changed: 36 additions & 0 deletions
@@ -236,6 +236,42 @@ online_response = store.get_online_features(
 ).to_dict()
 ```
 
+### Materializing Pre-transformed Data
+
+In some scenarios, you may have already transformed your data in batch (e.g., using Spark or another batch processing framework) and want to directly materialize the pre-transformed features without applying transformations during ingestion. Feast supports this through the `transform_on_write` parameter.
+
+When using `write_to_online_store=True` with On Demand Feature Views, you can set `transform_on_write=False` to skip transformations during the write operation. This is particularly useful for optimizing performance when working with large pre-transformed datasets.
+
+```python
+from feast import FeatureStore
+import pandas as pd
+
+store = FeatureStore(repo_path=".")
+
+# Pre-transformed data (transformations already applied)
+pre_transformed_data = pd.DataFrame({
+    "driver_id": [1001],
+    "event_timestamp": [pd.Timestamp.now()],
+    "conv_rate": [0.5],
+    # Pre-calculated values for the transformed features
+    "conv_rate_adjusted": [0.55],  # Already contains the adjusted value
+})
+
+# Write to online store, skipping transformations
+store.write_to_online_store(
+    feature_view_name="transformed_conv_rate",
+    df=pre_transformed_data,
+    transform_on_write=False  # Skip transformation during write
+)
+```
+
+This approach allows for a hybrid workflow where you can:
+1. Transform data in batch using powerful distributed processing tools
+2. Materialize the pre-transformed data without reapplying transformations
+3. Still use the Feature Server to execute transformations during API calls when needed
+
+Even when features are materialized with transformations skipped (`transform_on_write=False`), the feature server can still apply transformations during API calls for any missing values or for features that require real-time computation.
+
 ## CLI Commands
 There are new CLI commands to manage on demand feature views:
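
The last added paragraph in this hunk says transformations can still run at request time for values that were not materialized. One way to picture that read path is the plain-Python sketch below. It is an interpretation of the documented behavior, not Feast code: `read_features`, `double_conv_rate`, and the dict-based rows are hypothetical, and the doubling transformation is invented for the example.

```python
from typing import Callable, Dict

Row = Dict[str, float]


def read_features(
    stored: Row,
    derived_name: str,
    compute: Callable[[Row], float],
) -> Row:
    """Return stored values, computing the derived feature only if absent."""
    if derived_name in stored:
        return dict(stored)  # pre-transformed value is served as-is
    return {**stored, derived_name: compute(stored)}


def double_conv_rate(row: Row) -> float:
    # Invented transformation, used only to make the example concrete.
    return row["conv_rate"] * 2


# Row written with the transformation skipped: derived value already present.
pre_transformed = {"conv_rate": 0.5, "conv_rate_adjusted": 0.55}
# Row that only has the raw input: derived value is missing.
raw_only = {"conv_rate": 0.5}

served_as_is = read_features(pre_transformed, "conv_rate_adjusted", double_conv_rate)
computed_on_read = read_features(raw_only, "conv_rate_adjusted", double_conv_rate)
```

Under these assumptions the pre-transformed row is returned unchanged, and only the row lacking `conv_rate_adjusted` pays the cost of computing it at read time.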