Commit 1af794d

Freshness update for feature-set-specification-transformation-concepts.md . . .
1 parent 3b0d54f commit 1af794d

1 file changed, with 9 additions and 9 deletions

articles/machine-learning/feature-set-specification-transformation-concepts.md

Lines changed: 9 additions & 9 deletions
@@ -8,15 +8,15 @@ ms.topic: how-to
 ms.author: franksolomon
 author: fbsolo-ms1
 ms.reviewer: yogipandey
-ms.date: 12/06/2023
+ms.date: 01/23/2025
 ms.custom: template-concept
 ---
 
 # Feature transformation and best practices
 
-This article describes feature set specifications, the different kinds of transformations that can be used with it, and related best practices.
+This article describes feature set specifications, the different kinds of transformations that can be used with them, and related best practices.
 
-A feature set is a collection of features generated by source data transformations. A feature set specification is a self-contained definition for feature set development and local testing. After its development and local testing, you can register that feature set as a feature set asset with the feature store. You then have versioning and materialization available as managed capabilities.
+A feature set is a collection of features generated by source data transformations. A feature set specification is a self-contained definition for feature set development and local testing. After development and local testing of a feature set, you can register that feature set as a feature set asset with the feature store. You then have versioning and materialization available as managed capabilities.
 
 ## Define a feature set
 
@@ -85,7 +85,7 @@ The calculation happens in these steps:
 - Apply the feature transformer, defined by `feature_transformation.transformation_code`, on the data, and get the calculated features
 - Filter the feature values to return only those feature records within the feature window `[feature_window_start_ts, feature_window_end_ts)`
 
-In this code sample, the feature store API computes the features:
+In this code sample, the feature store API calculates the features:
 
 ```python
 # define the source data time window according to feature window
@@ -137,7 +137,7 @@ class UserTotalSpendProfileTransformer(Transformer):
 .withColumn("is_low_spend_user", col("total_spend") < 20.0)
 ```
 
-This feature set has three features, with data types as shown:
+The feature set has three features, with data types as shown:
 
 - `total_spend`: double
 - `is_high_spend_user`: bool
@@ -153,9 +153,9 @@ This shows the calculated feature values:
 
 ### Sliding window aggregation
 
-Sliding window aggregation can help handle feature values that present statistics (for example, sum, average, etc.) that accumulate over time. The SparkSQL `Window` function defines a sliding window around each row in the data, is useful in these cases.
+Sliding window aggregation can help handle feature values that present statistics (for example, sum, average, etc.) that accumulate over time. The SparkSQL `Window` function defines a sliding window around each row in the data, which is useful in these cases.
 
-For each row, the `Window` object can look into both future and past. In the context of machine learning features, you should define the `Window` object to look only the past, for each row. Visit the [Best Practice](#prevent-data-leakage-in-feature-transformation) section for more details.
+For each row, the `Window` object can look into both the future and the past. In the context of machine learning features, you should define the `Window` object to look only in the past, for each row. Visit the [Best Practice](#prevent-data-leakage-in-feature-transformation) section for more information.
 
 Start with this source data:
 
@@ -329,7 +329,7 @@ Data leakage in the feature transformation definition can lead to these problems
 
 ### Set proper `source_lookback`
 
-For time-series (sliding/tumbling/stagger window aggregation) data aggregations, properly set the `source_lookback` property. This diagram shows the relationship between the source data window and the feature window in the feature (set) calculation:
+For time-series (sliding/tumbling/stagger window aggregation) data aggregations, set the `source_lookback` property correctly. This diagram shows the relationship between the source data window and the feature window in the feature (set) calculation:
 
 :::image type="content" source="./media/feature-set-specification-transformation-concepts/illustration-source-lookback.png" lightbox="./media/feature-set-specification-transformation-concepts/illustration-source-lookback.png" alt-text="Illustration showing the concept of source_lookback.":::
 
@@ -338,7 +338,7 @@ Define `source_lookback` as a time delta value, which presents the range of sour
 | Transformation type | `source_lookback` |
 |---|---|
 | Row-level transformation | 0 (default) |
-| Sliding window | size of the largest window range in the transformer.<br> e.g.<br> `source_lookback` = 3 days when the feature set defines 3 day rolling features <br> `source_lookback` = 7 days when the feature set defines both 3 day and 7 day rolling features |
+| Sliding window | size of the largest window range in the transformer.<br> e.g.<br> `source_lookback` = 3 days when the feature set defines three day rolling features <br> `source_lookback` = 7 days when the feature set defines both three day and seven day rolling features |
 | Tumbling/stagger window | value of `windowDuration` in `window` definition. e.g. source_lookback = 1day when using `window("timestamp", windowDuration="1 day",slideDuration="6 hours)` |
 
 Incorrect `source_lookback` settings can lead to incorrect calculated/materialized feature values.
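
Note on the `source_lookback` hunk above: the relationship between the source data window, a past-only sliding window, and the feature window can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the feature store API. The helper names (`source_window`, `rolling_sum_past_only`, `filter_to_feature_window`), the sample data, and the `(ts - window, ts]` boundary convention are all hypothetical and chosen only to show why a three day rolling feature needs a three day lookback.

```python
from datetime import datetime, timedelta


def source_window(feature_start, feature_end, source_lookback):
    """Return the [start, end) range of source data read for a feature window.

    Assumption: the source window is the feature window extended
    backwards by source_lookback.
    """
    return feature_start - source_lookback, feature_end


def rolling_sum_past_only(rows, window):
    """For each (timestamp, value) row, sum the values in (ts - window, ts].

    Only the current row and earlier rows contribute, so no future
    data leaks into the feature value.
    """
    return [
        (ts, sum(v for t, v in rows if ts - window < t <= ts))
        for ts, _ in rows
    ]


def filter_to_feature_window(records, start, end):
    """Keep records whose timestamp falls in the half-open range [start, end)."""
    return [(ts, v) for ts, v in records if start <= ts < end]


def compute_features(rows, feature_start, feature_end, window, source_lookback):
    """Read the source window, apply the transformation, filter to the feature window."""
    src_start, src_end = source_window(feature_start, feature_end, source_lookback)
    source_rows = [(t, v) for t, v in rows if src_start <= t < src_end]
    transformed = rolling_sum_past_only(source_rows, window)
    return filter_to_feature_window(transformed, feature_start, feature_end)


# Hypothetical daily spend: Jan 1 = 10.0, Jan 2 = 20.0, ..., Jan 7 = 70.0
rows = [(datetime(2023, 1, d), 10.0 * d) for d in range(1, 8)]
feature_start, feature_end = datetime(2023, 1, 5), datetime(2023, 1, 7)
three_days = timedelta(days=3)

# Lookback matches the rolling window size: values are complete.
correct = compute_features(rows, feature_start, feature_end, three_days, three_days)

# Lookback left at 0: early rows in the feature window miss their history.
truncated = compute_features(rows, feature_start, feature_end, three_days, timedelta(0))
```

With these sample rows, `correct` holds 120.0 for January 5 and 150.0 for January 6, while `truncated` holds 50.0 and 110.0 for the same days, because the rows before the feature window were never read.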
