Description
Summary
When an ingestr source returns zero rows for the run interval, the destination table is not created. Any downstream SQL asset that references it then fails with Catalog Error: Table with name ... does not exist.
I noticed this while going through the chess tutorial (the chess template). There, chess_playground.games is missing when the games source returns zero rows for both players, so chess_playground.player_summary fails. This likely applies to other tutorials/examples that chain ingestr -> SQL assets too, especially since we have many chess.com examples.
In the chess example, the pipeline run fails when no player has any games in the interval. If at least one player has games, the table is created and the downstream asset succeeds.
The default interval is "yesterday": if none of the players played a game in that window, the run fails.
Following up on a previous issue: even if we pick very famous players, the example pipelines can still fail if those players did not play within the default interval (yesterday).
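To make the failure mode concrete, here is a minimal reproduction sketch. It makes two assumptions not stated above: sqlite3 stands in for DuckDB, and the hypothetical load_rows helper mimics the observed behavior (no rows extracted, no table created). The players match the repro config, but the game dates are made up. With a "yesterday" window, both games fall outside the interval, so the downstream query fails just as player_summary does.

```python
import sqlite3
from datetime import date, datetime, time, timedelta


def yesterday_window(today: date) -> tuple[datetime, datetime]:
    """Return the half-open [start, end) window covering the previous day."""
    start = datetime.combine(today - timedelta(days=1), time.min)
    return start, start + timedelta(days=1)


def load_rows(conn: sqlite3.Connection, table: str, rows: list[tuple]) -> None:
    """Hypothetical loader mimicking the observed behavior: no rows, no table."""
    if not rows:
        return  # zero rows extracted -> destination table is never created
    conn.execute(f"CREATE TABLE {table} (player TEXT, end_time TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def player_summary(conn: sqlite3.Connection) -> int:
    """Downstream query that assumes the games table exists."""
    return conn.execute("SELECT COUNT(*) AS total_games FROM games").fetchone()[0]


# Made-up games, both played before yesterday's window.
all_games = [
    ("awryaw", datetime(2024, 5, 1, 12, 0)),
    ("albertojgomez", datetime(2024, 5, 2, 9, 30)),
]
start, end = yesterday_window(date(2024, 6, 10))
rows = [(p, t.isoformat()) for p, t in all_games if start <= t < end]

conn = sqlite3.connect(":memory:")
load_rows(conn, "games", rows)

error = None
try:
    player_summary(conn)
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # no such table: games
```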
Suggestions
- Modify the asset so that it creates empty tables for profiles/games before the downstream SQL runs.
- Maybe have ingestr create the destination table schema even when zero rows are extracted (possibly something we don't want).
- Alternatively, add an option to "materialize empty table" for sources that can return zero rows.
- Or, in the tutorials/examples, add a short note about the default interval and the empty-table behavior.
- Specify an interval when running bruin.
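A minimal sketch of the "create empty tables first" idea, assuming the destination schema can be declared up front: the loader always runs CREATE TABLE IF NOT EXISTS before inserting, so a zero-row interval leaves an empty table behind and the downstream aggregate returns zero stats instead of failing. sqlite3 stands in for DuckDB, and GAMES_SCHEMA, load_rows and player_summary are hypothetical names; this is not how ingestr or Bruin currently behave.

```python
import sqlite3

# Hypothetical column list; the real games table produced by the chess
# source has its own schema, which would need to be declared here instead.
GAMES_SCHEMA = "player TEXT, end_time TEXT, result TEXT"


def load_rows(conn, table, schema, rows):
    """Always materialize the table, then insert whatever was extracted."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({schema})")
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def player_summary(conn):
    """Downstream aggregate: returns zero stats on an empty table."""
    return conn.execute(
        "SELECT COUNT(*) AS total_games, "
        "COALESCE(SUM(result = 'win'), 0) AS wins "
        "FROM games"
    ).fetchone()


conn = sqlite3.connect(":memory:")
load_rows(conn, "games", GAMES_SCHEMA, [])  # zero rows for the interval
print(player_summary(conn))  # (0, 0)
```

The trade-off is that the schema has to be known before any rows arrive instead of being derived from the extracted data. Also, if the real player_summary groups by player, an empty table yields zero rows rather than a single row of zeros.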
Question
- What do you suggest?
- Should I then find all such cases in the codebase and update them based on the decision we make here?
Reproduce
bruin init chess
In .bruin.yml, set:
connections:
  chess:
    - name: chess-default
      players:
        - awryaw
        - albertojgomez
Then run:
cd ./bruin
bruin run ./chess/pipeline.yml
Expected
chess_playground.games should exist even if it is empty, or the downstream SQL asset should still run and return zero stats.
Actual
chess_playground.player_summary fails because chess_playground.games does not exist when none of the players have played during the interval (i.e., since yesterday).
Error:
Internal: Catalog Error: Table with name games does not exist!
LINE 17: COUNT(*) AS total_games,