- Place your dbt `manifest.json` and `catalog.json` files in the `inputs` directory.
- Customization:
  - Set your dialect (only tested with `snowflake` so far) in the `main_step_1_direct.py` script.
  - You can specify the scope of the models you want to extract column lineage for by adding them to the `li_selected_model` list, or leave it empty to process all models.
  - When specifying models, you can use dbt-style selectors like `+model_name` (ancestors), `model_name+` (descendants), `+model_name+` (entire lineage), `tag:my_tag` (tag filtering), etc.
  - Both models and sources are supported in selectors (e.g., `source.schema.table`).
  - Alternatively, you can create a JSON file with a list of models and use the `--model-list-json` parameter when running the CLI.
- Run the `main_step_1_direct.py` script to extract direct column lineage: `python main_step_1_direct.py`
- This will generate direct column lineage relationships for all models in the `outputs` directory:
  - `lineage_to_direct_parents.json`
  - `lineage_to_direct_children.json`
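Once the script has finished, the two output files can be inspected from Python. The sketch below assumes only that each file is valid JSON with an object at the top level; it makes no claim about the nested structure. The helper name `summarize_lineage_file` is made up for this illustration.

```python
import json
from pathlib import Path


def summarize_lineage_file(path):
    """Load a lineage JSON file and return (number of top-level keys, first few keys)."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    keys = sorted(data)
    return len(keys), keys[:5]


if __name__ == "__main__":
    for name in ("lineage_to_direct_parents.json", "lineage_to_direct_children.json"):
        path = Path("outputs") / name
        if path.exists():
            count, sample = summarize_lineage_file(path)
            print(f"{name}: {count} top-level entries, e.g. {sample}")
        else:
            # Nothing to inspect until the direct lineage step has been run.
            print(f"{name}: not found, run main_step_1_direct.py first")
```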
When specifying models using Python code, you can use dbt-style selectors just like in the CLI:

    # Import path assumed from the package name; adjust to your installation
    from dbt_column_lineage_extractor import DbtColumnLineageExtractor

    # Example model selectors
    li_selected_model = [
        # Include orders and all its ancestors
        "+orders",
        # Include all models with "finance" tag
        "tag:finance",
        # Include models that are both daily-tagged AND in the core package
        "tag:daily,package:core",
        # Include a specific source
        "source.raw.customers",
        # Include a source and all its downstream dependencies
        "source.raw.orders+",
        # Get the entire lineage (upstream and downstream) of a source
        "+source.raw.payments+",
    ]

    extractor = DbtColumnLineageExtractor(
        manifest_path="./inputs/manifest.json",
        catalog_path="./inputs/catalog.json",
        selected_models=li_selected_model,
        dialect="snowflake",
    )
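
To make the selector syntax concrete, here is a minimal sketch of how a single selector string could be broken into its parts. It is not the extractor's actual parsing code: `parse_selector` and the returned field names are hypothetical, and it covers only the forms shown in this section (a `+` prefix or suffix, the `tag:` and `package:` methods, and a comma for intersection).

```python
def parse_selector(selector):
    """Split one dbt-style selector into a list of criteria.

    Comma-separated parts are treated as an intersection, so a node
    must satisfy every returned criterion to be selected.
    """
    criteria = []
    for part in selector.split(","):
        part = part.strip()
        ancestors = part.startswith("+")    # leading "+": include upstream nodes
        descendants = part.endswith("+")    # trailing "+": include downstream nodes
        name = part.strip("+")
        # "tag:" and "package:" are selection methods; anything else is a node name.
        method, sep, value = name.partition(":")
        if not (sep and method in ("tag", "package")):
            method, value = "name", name
        criteria.append({
            "method": method,
            "value": value,
            "ancestors": ancestors,
            "descendants": descendants,
        })
    return criteria


for s in ["+orders", "tag:daily,package:core", "+source.raw.payments+"]:
    print(s, "->", parse_selector(s))
```

Resolving the criteria against the manifest (finding the matching nodes and walking their parents or children) is a separate step that this sketch leaves out.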
- With the output from the direct column lineage step, run the `main_step_2_recursive.py` script to analyze recursive column lineage: `python main_step_2_recursive.py`
  - Customization: Change the `model` and `column` variables in `main_step_2_recursive.py` to target different models or columns for recursive column lineage analysis. You don't need to run the direct lineage extraction again if there are no changes in the models.
- This will generate squashed/structured ancestors and descendants for the specified model and column.
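
The idea behind the recursive step can be sketched as a graph walk over the direct-parent relationships. The example below is an illustration only, not the code in `main_step_2_recursive.py`: it assumes a simplified in-memory mapping from a `(model, column)` pair to its direct parent pairs, which may differ from the layout of the real output files, and both the sample data and the `collect_ancestors` name are invented.

```python
def collect_ancestors(direct_parents, model, column):
    """Return every upstream (model, column) pair reachable from the start pair."""
    seen = set()
    stack = [(model, column)]
    while stack:
        node = stack.pop()
        for parent in direct_parents.get(node, []):
            if parent not in seen:  # skip pairs already reached via another path
                seen.add(parent)
                stack.append(parent)
    return seen


# Invented sample data: orders.amount <- stg_payments.amount <- raw payments source.
direct_parents = {
    ("model.demo.orders", "amount"): [("model.demo.stg_payments", "amount")],
    ("model.demo.stg_payments", "amount"): [("source.demo.raw.payments", "amount")],
}

for pair in sorted(collect_ancestors(direct_parents, "model.demo.orders", "amount")):
    print(pair)
```

Running the same walk over a direct-children mapping would give the descendants instead of the ancestors.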