Commit 5443acf

Merge pull request #2496 from fbsolo-ms1/freshness-updates
Freshness update for offline-retrieval-point-in-time-join-concepts.md . . .
2 parents 1490c91 + 615dc87 commit 5443acf

1 file changed: +6 -6 lines changed

articles/machine-learning/offline-retrieval-point-in-time-join-concepts.md

Lines changed: 6 additions & 6 deletions
@@ -8,20 +8,20 @@ ms.topic: how-to
 ms.author: franksolomon
 author: fbsolo-ms1
 ms.reviewer: yogipandey
-ms.date: 12/06/2023
+ms.date: 01/24/2025
 ms.custom: template-concept
 ---
 
 # Offline feature retrieval using a point-in-time join
 
 ## Understanding the point-in-time join
 
-A *point-in-time*, or temporal, join helps address data leakage. In the model training process, [Data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)), or target leakage, involves the use of information that isn't expected to be available at prediction time. This would cause the predictive scores (metrics) to overestimate the utility of the model when the model runs in a production environment. [This article](https://www.kaggle.com/code/alexisbcook/data-leakage#Target-leakage) explains data leakage.
+A *point-in-time*, or temporal, join helps address data leakage. In the model training process, [Data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)), or target leakage, involves the use of information that isn't expected to be available at prediction time. Target leakage would cause the predictive scores (metrics) to overestimate the utility of the model when the model runs in a production environment. [This article](https://www.kaggle.com/code/alexisbcook/data-leakage#Target-leakage) explains data leakage.
 
 The next illustration explains how feature store point-in-time joins work:
 
-- The observation data has two labeled events, `L0` and `L1`. The two events occurred at times `t0` and `t1` respectively.
-- A training sample is created from this observation data with a point-in-time join. For each observation event, the feature value from its most recent previous event time (`t0` and `t1`) is joined with the event.
+- The observation data has two labeled events: `L0` and `L1`. The two events occurred at times `t0` and `t1` respectively.
+- A training sample is created from this observation data, with a point-in-time join. For each observation event, the feature value from its most recent previous event time (`t0` and `t1`) is joined with the event.
 
 :::image type="content" source="media/offline-retrieval-point-in-time-join/point-in-time-join.png" lightbox="media/offline-retrieval-point-in-time-join/point-in-time-join.png" alt-text="Illustration that shows a simple point-in-time join.":::
 
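The point-in-time join that the article text in this hunk describes can be sketched with the Python standard library: for each observation event, take the most recent feature value whose event time is at or before the observation time, and never a later one. This is a minimal illustration under stated assumptions, not the Azure Machine Learning feature store implementation. The helper name `point_in_time_join`, the tuple layouts, the sample timestamps, and the choice to treat a feature event stamped exactly at the observation time as available are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def point_in_time_join(observations, feature_events):
    """Hypothetical helper: join each observation with its latest earlier feature value.

    observations:   iterable of (entity_id, observation_time, label)
    feature_events: iterable of (entity_id, event_time, feature_value)
    Returns a list of (entity_id, observation_time, label, feature_value_or_None).
    """
    # Group feature events per entity and sort each group by event time.
    grouped = defaultdict(list)
    for entity, event_time, value in feature_events:
        grouped[entity].append((event_time, value))
    index = {}
    for entity, events in grouped.items():
        events.sort(key=lambda e: e[0])
        index[entity] = ([t for t, _ in events], [v for _, v in events])

    joined = []
    for entity, obs_time, label in observations:
        times, values = index.get(entity, ([], []))
        # bisect_right counts the events with event_time <= obs_time, so
        # values[i - 1] is the most recent one; later events are never used.
        i = bisect_right(times, obs_time)
        joined.append((entity, obs_time, label, values[i - 1] if i > 0 else None))
    return joined


# Hypothetical data: two labeled events, L0 at t0 and L1 at t1.
t0 = datetime(2024, 1, 1, 10)
t1 = datetime(2024, 1, 1, 12)
features = [
    ("acct-1", datetime(2024, 1, 1, 9), 5.0),
    ("acct-1", datetime(2024, 1, 1, 11), 7.0),
    ("acct-1", datetime(2024, 1, 1, 13), 9.0),  # after t1, so it must not leak in
]
rows = point_in_time_join([("acct-1", t0, "L0"), ("acct-1", t1, "L1")], features)
```

Here `L0` picks up the 09:00 value and `L1` the 11:00 value, while the 13:00 value is ignored because it postdates both observations. The binary search keeps each lookup logarithmic per entity; at scale, an as-of merge in a dataframe library (for example pandas `merge_asof`) is the more common tool.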
@@ -40,7 +40,7 @@ Both parameters represent a duration, or time delta. For an observation event th
 
 ### The `source_delay` property
 
-The `source_delay` source data property indicates the acquisition time delay at the moment that data is ready to consume. The time value at that moment is compared to the time value at the moment the data is generated. An event that happened at time `t` lands in the source data table at time `t + x`, due to the latency in the upstream data pipeline. The `x` value is the source delay.
+The `source_delay` source data property indicates the acquisition time delay at the moment that data is ready for consumption. The time value at that moment is compared to the time value at the moment of generation of that data. An event that happened at time `t` lands in the source data table at time `t + x`, due to the latency in the upstream data pipeline. The `x` value is the source delay.
 
 Source delay can lead to [Data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)):
 
@@ -55,7 +55,7 @@ This screenshot shows the output of the `get_offline_features` function that per
 
 :::image type="content" source="media/offline-retrieval-point-in-time-join/point-in-time-join-source-delay-output.png" lightbox="media/offline-retrieval-point-in-time-join/point-in-time-join-source-delay-output.png" alt-text="Illustration that shows output of a point-in-time join with source delay.":::
 
-If users don't set the `source_delay` value in the feature set specification, its default value is `0`. This means that no source delay is involved. The `source_delay` value is also considered in recurrent feature materialization. Visit [this](./feature-set-materialization-concepts.md) resource for more details about feature set materialization.
+If users don't set the `source_delay` value in the feature set specification, its default value is `0`. This means that no source delay is involved. The `source_delay` value is also considered in recurrent feature materialization. Visit [this](./feature-set-materialization-concepts.md) resource for more information about feature set materialization.
 
 ### The `temporal_join_lookback`
 
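The `source_delay` rule in the changed text can be expressed as a small arithmetic check: an event that happened at time `t` only lands in the source table at `t + x`, so it can be joined to an observation at time `T` only when `t + x <= T`. The sketch below applies that check inside a single-entity point-in-time join. It is an illustration under assumptions, not how `get_offline_features` is implemented; the helper name `join_with_source_delay`, the tuple layouts, and the sample timestamps are hypothetical, and the `timedelta(0)` default mirrors the article's statement that `source_delay` defaults to `0`.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def join_with_source_delay(observations, feature_events, source_delay=timedelta(0)):
    """Hypothetical single-entity point-in-time join that honors a source delay.

    observations:   iterable of (observation_time, label)
    feature_events: iterable of (event_time, feature_value)
    Returns a list of (observation_time, label, feature_value_or_None).
    """
    events = sorted(feature_events, key=lambda e: e[0])
    # An event generated at t only becomes visible in the source table at t + source_delay.
    # Adding a constant keeps this list sorted, so binary search still applies.
    landed_at = [event_time + source_delay for event_time, _ in events]
    joined = []
    for obs_time, label in observations:
        # Count the events that had already landed by obs_time.
        i = bisect_right(landed_at, obs_time)
        joined.append((obs_time, label, events[i - 1][1] if i > 0 else None))
    return joined


# Hypothetical data for one entity.
features = [
    (datetime(2024, 1, 1, 9), 5.0),
    (datetime(2024, 1, 1, 11), 7.0),
]
obs = [(datetime(2024, 1, 1, 12), "L1")]

no_delay = join_with_source_delay(obs, features)  # default: no source delay
two_hour_delay = join_with_source_delay(obs, features, source_delay=timedelta(hours=2))
```

With the default delay, the 11:00 value is joined to the 12:00 observation. With a two-hour delay, that value only lands at 13:00, after the observation, so the sketch falls back to the 09:00 value instead of leaking a value that would not have been available at prediction time.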