|
2 | 2 | title: Lineage Diff |
3 | 3 | --- |
4 | 4 |
|
5 | | -The Lineage Diff is the main interface to Recce and allows you to quickly see the potential area of impact from your dbt data modeling changes. |
| 5 | +# Understanding Lineage Diff |
6 | 6 |
|
7 | | -## Lineage Diff |
| 7 | +The Lineage view is Recce's main interface for visualizing and analyzing how your dbt model changes impact your data pipeline. It shows you the potential area of impact from your modifications, helping you determine which models need further investigation and validation. |
8 | 8 |
|
9 | | -It's from the Lineage Diff that you will determine which models to investigate further; and also perform the various data validation checks that will serve as proof-of-correctness of your work. |
| 9 | +## What is Data Lineage? |
| 10 | + |
| 11 | +Data lineage tracks the flow and transformation of data through your dbt project. In Recce, the lineage graph shows: |
| 12 | + |
| 13 | +- **Dependencies**: Which models depend on others |
| 14 | +- **Change Impact**: How modifications ripple through your pipeline |
| 15 | +- **Data Flow**: The path data takes from sources to final outputs |
| 16 | + |
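As a rough illustration of how a change ripples through a pipeline, the sketch below walks a small dependency graph downstream from a set of modified models. The model names and the `downstream_impact` helper are hypothetical and are not part of Recce's API; they only show the idea behind the lineage graph.

```python
from collections import deque


def downstream_impact(parents, modified):
    """Return the modified models plus everything downstream of them.

    `parents` maps each model to the models it depends on (hypothetical data).
    """
    # Invert the dependency map so we can walk from parent to child.
    children = {}
    for model, deps in parents.items():
        for dep in deps:
            children.setdefault(dep, set()).add(model)

    # Breadth-first walk starting at the modified models.
    impacted = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


# Hypothetical project: model -> models it depends on
parents = {
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders": ["stg_orders"],
    "customers": ["stg_customers", "orders"],
}

print(sorted(downstream_impact(parents, ["stg_orders"])))
```

In this toy graph, modifying `stg_orders` also puts `orders` and `customers` in the area of impact, while `stg_customers` is untouched.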
| 17 | +## Viewing the Lineage Graph |
| 18 | + |
| 19 | +From the Lineage view, you can determine which models to investigate further and perform various data validation checks that serve as proof-of-correctness of your work. |
| 20 | + |
| 21 | +<figure markdown> |
| 22 | + {: .shadow} |
| 23 | + <figcaption>Interactive lineage graph showing modified models</figcaption> |
| 24 | +</figure> |
| 25 | + |
| 26 | +!!! tip "Getting Started" |
| 27 | + When you first open Recce, the lineage graph loads automatically and shows only the models affected by your changes. This focused view helps you quickly understand the impact of your work. |
| 28 | + |
| 29 | +## Understanding Model Nodes |
| 30 | + |
| 31 | +### Visual Status Indicators |
10 | 32 |
|
11 | 33 | <figure markdown> |
12 | | - {: .shadow} |
13 | | - <figcaption>Lineage Diff</figcaption> |
| 34 | + {: .shadow} |
| 35 | + <figcaption>Example model node with status indicators</figcaption> |
14 | 36 | </figure> |
15 | 37 |
|
16 | | -### Node Summary |
| 38 | +Models in the lineage graph are **color-coded** to indicate their status: |
17 | 39 |
|
18 | | -{: .shadow} |
| 40 | +- **Green**: Added models (new to your project) |
| 41 | +- **Red**: Removed models (deleted from your project) |
| 42 | +- **Orange**: Modified models (changed code or configuration) |
| 43 | +- **Gray**: Unchanged models (shown for context) |
19 | 44 |
|
20 | | -Models are color-coded to indicate their **status**: |
| 45 | +### Change Detection Icons |
21 | 46 |
|
22 | | -- `Added` models are green. |
23 | | -- `Removed` models are red. |
24 | | -- `Modified` models are orange. |
| 47 | +Each model node displays two icons in the bottom-right corner that indicate detected changes: |
25 | 48 |
|
26 | | -The two icons at the bottom right of each node indicate if a `row count` or `schema` change has been detected. Grayed out icons indicate no change. |
| 49 | +- **Row Count Icon**: Shows when row count differences are detected |
| 50 | +- **Schema Icon**: Shows when column or data type changes are detected |
| 51 | + |
| 52 | +Grayed-out icons indicate no changes were detected in that category. |
27 | 53 |
|
28 | 54 | <figure markdown> |
29 | | - {: .shadow} |
| 55 | + {: .shadow} |
30 | 56 | <figcaption>Model with Schema Change detected</figcaption> |
31 | 57 | </figure> |
32 | 58 |
|
33 | | -**Note**: A row count changed icon is only shown if there is row count diff executed on this node. |
| 59 | +!!! note "Row Count Detection" |
| 60 | + The row count icon only appears after you've run a row count diff on that specific model. This helps you track which models you've already validated. |
34 | 61 |
|
35 | 62 | <figure markdown> |
36 | | - {: .shadow} |
| 63 | + {: .shadow} |
37 | 64 | <figcaption>Open the node details panel</figcaption> |
38 | 65 | </figure> |
39 | 66 |
|
40 | | -Click a model to open the [node details](#node-detail) panel and perform other data validation checks. |
| 67 | +## Investigating Model Changes |
| 68 | + |
| 69 | +### Opening the Node Details Panel |
| 70 | + |
| 71 | +Click on any model in the lineage graph to open the node details panel. This is your starting point for deeper analysis. |
41 | 72 |
|
42 | 73 |
|
43 | 74 | ## Schema Diff |
44 | 75 |
|
45 | | -Schema Diff shows added, removed, and renamed columns. Click a model in the Lineage Diff to open the node details and view the Schema Diff. |
| 76 | +Schema diff helps you understand structural changes to your models. |
| 77 | + |
| 78 | +!!! warning "Requirements" |
| 79 | + Schema diff requires `catalog.json` files in both your base and current environments. Make sure to run `dbt docs generate` in both environments before starting your Recce session. |
46 | 80 |
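A quick pre-flight check along these lines can catch a missing catalog before you start a session. This is a minimal sketch: the `missing_catalogs` helper is hypothetical, and the directory names `target-base` and `target` are assumptions about where your base and current artifacts live, so adjust them to your setup.

```python
from pathlib import Path


def missing_catalogs(base_dir="target-base", current_dir="target"):
    """Return the catalog.json paths that do not exist yet.

    The default directory names are assumptions, not values taken from this page.
    """
    candidates = [
        Path(base_dir) / "catalog.json",
        Path(current_dir) / "catalog.json",
    ]
    return [p.as_posix() for p in candidates if not p.is_file()]


missing = missing_catalogs()
if missing:
    print("Run `dbt docs generate` for:", ", ".join(missing))
else:
    print("Both catalog.json files are present.")
```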
|
47 | | -!!! Note |
| 81 | +### Viewing Schema Changes |
48 | 82 |
|
49 | | - Schema Diff requires `catalog.json` in both environments. |
| 83 | +Click on a model to view its schema diff in the node details panel. |
50 | 84 |
|
51 | 85 | <figure markdown> |
52 | | - {: .shadow} |
53 | | - <figcaption>Schema Diff</figcaption> |
| 86 | + {: .shadow} |
| 87 | + <figcaption>Interactive schema diff showing column changes</figcaption> |
54 | 88 | </figure> |
55 | 89 |
|
| 90 | +### Types of Schema Changes |
| 91 | + |
| 92 | +Schema diff identifies: |
| 93 | + |
| 94 | +- **Added columns**: New fields in your model (shown in green) |
| 95 | +- **Removed columns**: Fields that no longer exist (shown in red) |
| 96 | +- **Renamed columns**: Fields that have changed names (shown with arrows) |
| 97 | +- **Data type changes**: Modifications to column types |
| 98 | + |
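To make these categories concrete, here is a minimal sketch that compares two column-to-type mappings. The `schema_diff` helper and the column names are hypothetical and do not reflect how Recce is implemented. The sketch also does not attempt rename detection, so a renamed column would show up here as one removed and one added column.

```python
def schema_diff(base, current):
    """Compare two {column_name: data_type} mappings (hypothetical helper)."""
    added = [col for col in current if col not in base]
    removed = [col for col in base if col not in current]
    # Columns present on both sides whose data type differs.
    type_changed = {
        col: (base[col], current[col])
        for col in base
        if col in current and base[col] != current[col]
    }
    return {"added": added, "removed": removed, "type_changed": type_changed}


# Hypothetical schemas for the same model in two environments
base = {
    "customer_id": "INTEGER",
    "first_order": "DATE",
    "lifetime_value": "INTEGER",
}
current = {
    "customer_id": "INTEGER",
    "lifetime_value": "DOUBLE",
    "net_lifetime_value": "DOUBLE",
}

print(schema_diff(base, current))
```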
56 | 99 | <figure markdown> |
57 | | -  |
58 | | - <figcaption>Schema Diff showing renamed column</figcaption> |
| 100 | + {: .shadow} |
| 101 | + <figcaption>Schema diff showing renamed column</figcaption> |
59 | 102 | </figure> |
60 | 103 |
|
61 | 104 |
|
62 | 105 | ## Code Diff |
63 | 106 |
|
64 | | -Examine the specific code changes to understand the nature of the modifications. |
| 107 | +Understanding the code changes helps you analyze the root cause of data differences. |
| 108 | + |
| 109 | +From any model's node details panel, you can view the exact code changes that were made. This helps you understand: |
65 | 110 |
|
66 | | -Learn more [here](code-diff.md) |
| 111 | +- What SQL logic was modified |
| 112 | +- How transformations changed |
| 113 | +- Why data differences might be occurring |
| 114 | + |
| 115 | +Learn more about viewing and analyzing code changes in the [Code Diff guide](code-diff.md). |
67 | 116 |
|
68 | 117 |
|
69 | 118 | ## Node Details |
70 | 119 |
|
71 | | -The node details panel shows information about a node, such as node type, schema and row count changes, and allows you to perform diffs on the node using the options accessed via the `Explore Change` button. |
| 120 | +### Node Details Overview |
| 121 | + |
| 122 | +The node details panel provides comprehensive information about the selected model: |
72 | 123 |
|
73 | 124 | <figure markdown> |
74 | 125 | {: .shadow} |
75 | | - <figcaption>Explore the model</figcaption> |
| 126 | + <figcaption>Node details panel with exploration options</figcaption> |
76 | 127 | </figure> |
77 | 128 |
|
78 | | -You can click "Query" to jump to Query of this model. |
79 | | -There are few pre-defied diff that saved your time on writing SQL snippets. |
| 129 | +From this panel, you can: |
| 130 | + |
| 131 | +- **View model information**: Node type, materialization, and basic metadata |
| 132 | +- **Examine changes**: See what specifically changed in the model |
| 133 | +- **Run validations**: Execute pre-built data diffs and custom queries |
| 134 | +- **Add to checklist**: Document important findings for review |
| 135 | + |
| 136 | +### Available Data Validation Checks |
| 137 | + |
| 138 | +Click the "Explore Change" button to access pre-built validation checks that save you from writing the SQL yourself: |
| 139 | + |
| 140 | +1. **[Row Count Diff](../5-data-diffing/row-count-diff.md)**: Compare the number of rows between environments |
| 141 | +2. **[Profile Diff](../5-data-diffing/profile-diff.md)**: Analyze column-level statistics and distributions |
| 142 | +3. **[Value Diff](../5-data-diffing/value-diff.md)**: Identify specific value changes between datasets |
| 143 | +4. **[Top-K Diff](../5-data-diffing/topK-diff.md)**: Compare the most common values in your data |
| 144 | +5. **[Histogram Diff](../5-data-diffing/histogram-diff.md)**: Visualize data distribution changes |
| 145 | + |
| 146 | +### Custom Query Analysis |
| 147 | + |
| 148 | +Click "Query" to open the query interface where you can: |
| 149 | + |
| 150 | +- Write custom SQL to investigate changes |
| 151 | +- Run ad-hoc comparisons between environments |
| 152 | +- Validate specific business logic or data quality rules |
| 153 | + |
| 154 | +## Building Your Validation Checklist |
| 155 | + |
| 156 | +As you investigate changes, you can add important findings to your checklist for documentation and collaboration purposes. |
| 157 | + |
| 158 | +!!! tip "Collaboration Best Practice" |
| 159 | + Use the checklist feature to document your validation process. This creates a clear record of what you've tested and verified, making it easier for teammates to review your changes. |
| 160 | + |
| 161 | +## Next Steps |
| 162 | + |
| 163 | +After reviewing the lineage changes: |
80 | 164 |
|
81 | | -1. Row Count Diff: shows the difference in row counts. [Learn more here](./5-data-diffing/row-count-diff.md) |
82 | | -2. Profile Diff |
83 | | -3. Value Diff |
84 | | -4. Top-K Diff |
85 | | -5. Histogram Diff |
| 165 | +1. **Validate**: Run data diffs on critical models to verify changes are correct |
| 166 | +2. **Document**: Add key findings to your checklist with clear descriptions |
| 167 | +3. **Collaborate**: Share your analysis with team members for review |
| 168 | +4. **Integrate**: Use Recce's workflow integration to automate validation in your CI/CD process |
86 | 169 |
|
87 | | -You can add the Lineage Diff of this model to the checklist. |
| 170 | +Ready to dive deeper into specific validation techniques? Explore the [Data Diffing](../5-data-diffing/row-count-diff.md) section to learn about different ways to validate your changes. |
88 | 171 |
|
89 | 172 |
|
90 | 173 |
|
|