articles/machine-learning/offline-retrieval-point-in-time-join-concepts.md
Lines changed: 6 additions & 6 deletions
@@ -8,20 +8,20 @@ ms.topic: how-to
ms.author: franksolomon
author: fbsolo-ms1
ms.reviewer: yogipandey
-ms.date: 12/06/2023
+ms.date: 01/24/2025
ms.custom: template-concept
---
# Offline feature retrieval using a point-in-time join
## Understanding the point-in-time join
-A *point-in-time*, or temporal, join helps address data leakage. In the model training process, [Data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)), or target leakage, involves the use of information that isn't expected to be available at prediction time. This would cause the predictive scores (metrics) to overestimate the utility of the model when the model runs in a production environment. [This article](https://www.kaggle.com/code/alexisbcook/data-leakage#Target-leakage) explains data leakage.
+A *point-in-time*, or temporal, join helps address data leakage. In the model training process, [Data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)), or target leakage, involves the use of information that isn't expected to be available at prediction time. Target leakage would cause the predictive scores (metrics) to overestimate the utility of the model when the model runs in a production environment. [This article](https://www.kaggle.com/code/alexisbcook/data-leakage#Target-leakage) explains data leakage.
The next illustration explains how feature store point-in-time joins work:
-- The observation data has two labeled events,`L0` and `L1`. The two events occurred at times `t0` and `t1` respectively.
-- A training sample is created from this observation data with a point-in-time join. For each observation event, the feature value from its most recent previous event time (`t0` and `t1`) is joined with the event.
+- The observation data has two labeled events:`L0` and `L1`. The two events occurred at times `t0` and `t1` respectively.
+- A training sample is created from this observation data, with a point-in-time join. For each observation event, the feature value from its most recent previous event time (`t0` and `t1`) is joined with the event.
:::image type="content" source="media/offline-retrieval-point-in-time-join/point-in-time-join.png" lightbox="media/offline-retrieval-point-in-time-join/point-in-time-join.png" alt-text="Illustration that shows a simple point-in-time join.":::
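
The join described in the bullets above can be sketched outside the feature store with pandas. This is a minimal illustration, not the implementation behind `get_offline_features`: the table contents, the column names (`entity_id`, `timestamp`, `feature_timestamp`, `feature_value`), and the helper `point_in_time_join` are all hypothetical. It relies on `pandas.merge_asof`, which matches each observation row with the latest feature row whose timestamp isn't later than the observation timestamp.

```python
import pandas as pd

# Hypothetical observation data: two labeled events, L0 and L1, at times t0 and t1.
observations = pd.DataFrame({
    "entity_id": [1, 1],
    "timestamp": pd.to_datetime(["2023-01-01 10:00", "2023-01-01 12:00"]),
    "label": ["L0", "L1"],
})

# Hypothetical feature events for the same entity.
features = pd.DataFrame({
    "entity_id": [1, 1, 1],
    "timestamp": pd.to_datetime(
        ["2023-01-01 09:00", "2023-01-01 11:00", "2023-01-01 13:00"]
    ),
    "feature_value": [10.0, 20.0, 30.0],
})


def point_in_time_join(observations: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Attach to each observation the most recent feature value at or before it."""
    # merge_asof requires both inputs to be sorted by their join keys.
    obs = observations.sort_values("timestamp")
    feats = (
        features.rename(columns={"timestamp": "feature_timestamp"})
        .sort_values("feature_timestamp")
    )
    # direction="backward" never looks at feature rows later than the observation.
    # By pandas default, a feature row with an identical timestamp also matches.
    return pd.merge_asof(
        obs,
        feats,
        left_on="timestamp",
        right_on="feature_timestamp",
        by="entity_id",
        direction="backward",
    )


training_sample = point_in_time_join(observations, features)
print(training_sample)
```

In this sketch, the feature event at 13:00 is never joined, because it happens after both observation events. Using it would be an instance of the leakage that the point-in-time join is meant to prevent.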
@@ -40,7 +40,7 @@ Both parameters represent a duration, or time delta. For an observation event th
### The `source_delay` property
-The `source_delay` source data property indicates the acquisition time delay at the moment that data is ready to consume. The time value at that moment is compared to the time value at the moment the data is generated. An event that happened at time `t` lands in the source data table at time `t + x`, due to the latency in the upstream data pipeline. The `x` value is the source delay.
+The `source_delay` source data property indicates the acquisition time delay at the moment that data is ready for consumption. The time value at that moment is compared to the time value at the moment of generation of that data. An event that happened at time `t` lands in the source data table at time `t + x`, due to the latency in the upstream data pipeline. The `x` value is the source delay.
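
The effect of the `t + x` rule on a join can be sketched with pandas. This is an illustration of the concept only, under the assumption that a feature event becomes usable once its event time plus the source delay has passed. It isn't the feature store's implementation, and the data, the column names, and the helper `point_in_time_join_with_delay` are hypothetical.

```python
import pandas as pd

# Hypothetical observation events at 10:00 and 12:00.
observations = pd.DataFrame({
    "entity_id": [1, 1],
    "timestamp": pd.to_datetime(["2023-01-01 10:00", "2023-01-01 12:00"]),
})

# Hypothetical feature events; "timestamp" is the time t at which each event happened.
features = pd.DataFrame({
    "entity_id": [1, 1, 1],
    "timestamp": pd.to_datetime(
        ["2023-01-01 09:00", "2023-01-01 09:45", "2023-01-01 11:00"]
    ),
    "feature_value": [10.0, 15.0, 20.0],
})


def point_in_time_join_with_delay(
    observations: pd.DataFrame,
    features: pd.DataFrame,
    source_delay: pd.Timedelta,
) -> pd.DataFrame:
    """Join each observation with the latest feature row already landed in the source."""
    feats = features.rename(columns={"timestamp": "feature_timestamp"})
    # An event that happened at time t lands in the source table at t + x.
    feats["available_at"] = feats["feature_timestamp"] + source_delay
    feats = feats.sort_values("available_at")
    obs = observations.sort_values("timestamp")
    # Match on availability time rather than on event time.
    return pd.merge_asof(
        obs,
        feats,
        left_on="timestamp",
        right_on="available_at",
        by="entity_id",
        direction="backward",
    )


with_delay = point_in_time_join_with_delay(
    observations, features, pd.Timedelta(minutes=30)
)
no_delay = point_in_time_join_with_delay(observations, features, pd.Timedelta(0))
print(with_delay[["timestamp", "feature_timestamp", "feature_value"]])
print(no_delay[["timestamp", "feature_timestamp", "feature_value"]])
```

With a 30-minute delay, the 09:45 feature event only lands at 10:15, so the 10:00 observation falls back to the 09:00 value. With a zero delay, the same observation picks up the 09:45 value, which wouldn't have been available at prediction time if the pipeline really lags by 30 minutes.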
Source delay can lead to [Data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)):
@@ -55,7 +55,7 @@ This screenshot shows the output of the `get_offline_features` function that per
:::image type="content" source="media/offline-retrieval-point-in-time-join/point-in-time-join-source-delay-output.png" lightbox="media/offline-retrieval-point-in-time-join/point-in-time-join-source-delay-output.png" alt-text="Illustration that shows output of a point-in-time join with source delay.":::
-If users don't set the `source_delay` value in the feature set specification, its default value is `0`. This means that no source delay is involved. The `source_delay` value is also considered in recurrent feature materialization. Visit [this](./feature-set-materialization-concepts.md) resource for more details about feature set materialization.
+If users don't set the `source_delay` value in the feature set specification, its default value is `0`. This means that no source delay is involved. The `source_delay` value is also considered in recurrent feature materialization. Visit [this](./feature-set-materialization-concepts.md) resource for more information about feature set materialization.