docs/SCD2-sparkcompute.md (5 additions, 1 deletion)
@@ -9,19 +9,23 @@ Use Case
This plugin integrates incoming data into Slowly Changing Dimension Type 2 (SCD2) target tables and tracks the history of each record.

It supports the following patterns:
+
**New record integration**
If a new record is received from the input flow (with a new natural key), it is added as active, with end date equal to a dummy value (9999-12-31).
+
**Changed record integration**
If a record is received with at least one changed field, the previous version is closed (with end date = start date of the new record minus
one second) and the new record is added as active, with end date equal to the dummy value (9999-12-31).
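
The two integration patterns above can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the in-memory `history` list, the record layout (`key`, `start`, `end`, `attrs`) and the `integrate` helper are hypothetical names chosen for illustration, and the sketch assumes the incoming record is both changed and newer than the active version.

```python
from datetime import datetime, timedelta

# Dummy end date that marks the active version (value taken from the text above).
ACTIVE_END = datetime(9999, 12, 31)


def integrate(history, key, start, attrs):
    """Apply one incoming record to `history`, a list of dicts (hypothetical layout)."""
    # Look for the currently active version of this natural key, if any.
    active = next(
        (r for r in history if r["key"] == key and r["end"] == ACTIVE_END), None
    )
    if active is not None:
        # Changed record: close the previous version one second before the new start.
        active["end"] = start - timedelta(seconds=1)
    # New or changed record: the incoming version becomes the active one.
    history.append({"key": key, "start": start, "end": ACTIVE_END, "attrs": attrs})
    return history


history = []
integrate(history, "C1", datetime(2020, 1, 1), {"city": "Rome"})   # new natural key
integrate(history, "C1", datetime(2020, 6, 1), {"city": "Milan"})  # changed record
```

After the second call, the first version of `C1` ends at 2020-05-31 23:59:59 and the second version is the active one.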
+
**False delta record cut off**
This capability checks whether at least one field of the input record has changed from the previous version already present in
the SCD2 table. If it has, the record is integrated into the table; otherwise it is discarded. This makes the plugin agnostic to the input loading mode
(delta or full), since it integrates only the changed records and skips records without variations.
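
As a rough illustration of the check described above, the following Python sketch compares each incoming record with the active version of the same key and keeps it only when at least one field differs. The names `is_false_delta`, `keep_changed`, `tracked_fields` and the dictionary layout are assumptions made for this example, not the plugin's configuration or API.

```python
def is_false_delta(active_attrs, incoming_attrs, tracked_fields):
    """Return True when no tracked field differs from the active version."""
    return all(
        active_attrs.get(f) == incoming_attrs.get(f) for f in tracked_fields
    )


def keep_changed(active_by_key, incoming, tracked_fields):
    """Keep only records that are new or differ from their active version."""
    kept = []
    for rec in incoming:
        active = active_by_key.get(rec["key"])
        # Unknown key: a new record. Known key: keep only if something changed.
        if active is None or not is_false_delta(active, rec["attrs"], tracked_fields):
            kept.append(rec)
    return kept


# Active versions currently in the SCD2 table, indexed by natural key (hypothetical data).
active_by_key = {"C1": {"city": "Rome", "tier": "gold"}}
incoming = [
    {"key": "C1", "attrs": {"city": "Rome", "tier": "gold"}},   # unchanged: discarded
    {"key": "C2", "attrs": {"city": "Turin", "tier": "basic"}}, # new key: kept
]
changed = keep_changed(active_by_key, incoming, ["city", "tier"])
```

Because unchanged records are dropped before integration, the same logic works whether the input is a full extract or a delta.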
+
**Late arriving data**
The plugin checks the date of completion of each incoming record in order to place it in the right time interval (start/end date) and
to handle records whose date of completion is older than that of the current active version. For example: