How to effectively ingest many dbt files that are separated/run by model? #21989
Unanswered
nikita-sheremet-java-developer asked this question in Q&A
Replies: 2 comments
-
@OrionFanWeb1701 reach out to me by email, we have a lot of orders for you.
-
Hey @nikita-sheremet-java-developer, at night we move all files scheduled for ingestion into a new prefix, add the latest `manifest.json` and `catalog.json`, and ingest everything from that prefix.
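A minimal sketch of how such a nightly consolidation could look with boto3. The bucket name, staging prefix, and the idea of renaming each `run_results.json` per model are assumptions for illustration, not the commenter's exact setup or any DataHub/dbt-specific API:

```python
# Sketch: copy per-model run_results.json files into one nightly staging prefix,
# then add a single, current manifest.json and catalog.json next to them.
# Bucket name and prefixes are hypothetical.
import boto3
from datetime import date

BUCKET = "my_bucket"
SOURCE_PREFIX = "dbt/"                                      # per-model prefixes like dbt/mymodel/
STAGING_PREFIX = f"dbt_staging/{date.today():%Y-%m-%d}/"    # hypothetical nightly prefix

s3 = boto3.client("s3")

def consolidate_run_results():
    """Copy every per-model run_results.json into the staging prefix,
    keeping the model name in the file name so the copies do not collide."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=SOURCE_PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith("run_results.json"):
                continue
            model = key.split("/")[-2]  # e.g. "mymodel" from dbt/mymodel/run_results.json
            s3.copy_object(
                Bucket=BUCKET,
                CopySource={"Bucket": BUCKET, "Key": key},
                Key=f"{STAGING_PREFIX}{model}_run_results.json",
            )

def add_latest_manifest_and_catalog():
    """Place one current manifest.json and catalog.json in the staging prefix."""
    for name in ("manifest.json", "catalog.json"):
        s3.copy_object(
            Bucket=BUCKET,
            CopySource={"Bucket": BUCKET, "Key": f"{SOURCE_PREFIX}{name}"},
            Key=f"{STAGING_PREFIX}{name}",
        )

if __name__ == "__main__":
    consolidate_run_results()
    add_latest_manifest_and_catalog()
```

The ingestion job would then be pointed at the staging prefix only, so the single manifest and catalog are read once per run instead of once per model.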
-
I have a project with 500+ dbt models (Trino + Iceberg). Each model is run by an Airflow DAG in order, and the files `manifest.json`, `catalog.json`, and `run_results.json` are then uploaded to an S3 prefix like `s3://my_bucket/dbt/mymodel/`. So I end up with ~1500 files: 500 x `manifest.json` + 500 x `catalog.json` + 500 x `run_results.json`.

This leads to the message `dbt: Processed 550000 records` (roughly 2 x 500 x 500, presumably because each `manifest.json` describes all ~500 models), and that is far too much. I have fewer tables in Trino than these records. I noticed that `manifest.json` (3.2 MB) is identical for all models, so when I removed the per-model copies and put it only once into `s3://my_bucket/dbt/`, processing dropped to about 10 minutes instead of 10 hours. But messages like `Manifest file not found at: dbt/mymodel` started to appear, which makes me nervous that something is going wrong or that I am losing some data or lineage.

Could somebody please clarify the proper way to ingest dbt data? Maybe put all files in one folder? But in that case the `catalog.json` and `run_results.json` files will override each other. Any ideas? Any secret settings? In the source code I saw that a collection of files can be supplied, but I have no idea how to organize them.

Off-topic: the dbt configuration does not have a threads parameter; any suggestion on how it could be added?
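If all artifacts had to live in one folder, the per-model `run_results.json` files would first need to be combined so they stop overriding each other. A minimal sketch of such a merge, assuming the standard dbt `run_results.json` layout with a top-level `results` list; the local paths are hypothetical, and whether the ingestion source accepts a merged file like this would need to be verified:

```python
# Sketch: merge per-model run_results.json files into a single file so that one
# folder can hold manifest.json, catalog.json and run_results.json side by side.
import json
from pathlib import Path

def merge_run_results(per_model_dir: str, output_path: str) -> None:
    merged = None
    for path in sorted(Path(per_model_dir).glob("*/run_results.json")):
        with path.open() as f:
            doc = json.load(f)
        if merged is None:
            merged = doc  # keep metadata/args from the first file
            continue
        merged["results"].extend(doc["results"])
        merged["elapsed_time"] = merged.get("elapsed_time", 0) + doc.get("elapsed_time", 0)
    if merged is not None:
        Path(output_path).write_text(json.dumps(merged))

# Example (hypothetical local copy of the S3 layout):
# merge_run_results("dbt_artifacts", "dbt_artifacts/run_results.json")
```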