Ingestr assets can skip table creation on zero rows, breaking downstream SQL asset runs (observed in chess template); input/suggestions needed #1471

@aryahassibi

Summary

When an ingestr source returns zero rows for the run interval, the destination table is not created. Any downstream SQL asset that references it then fails with Catalog Error: Table with name ... does not exist.

I observed this while inspecting the chess template from the chess tutorial. There, chess_playground.games is missing when the games source returns zero rows for both players, so chess_playground.player_summary fails. This likely applies to other tutorials/examples that chain ingestr -> SQL assets too (especially since we have a lot of chess.com examples).

In the chess example, this pipeline run fails when all players have zero games for the interval. If at least one player has games, the table is created and the downstream succeeds.

The default interval is "yesterday". If no games occurred in that window for any of the players, the run fails.

Following up on a previous issue: even if we pick very famous players, the example pipelines can still technically fail if none of them played within the default interval (yesterday).

Suggestions

  • Modify the asset so that it creates empty tables for profiles/games before downstream SQL runs.
  • Maybe have ingestr create the destination table schema even when zero rows are extracted (possibly something that we don't want).
  • Alternatively, add an option to "materialize empty table" for sources that can return zero rows.
  • Or in tutorials/examples, consider a short note about the default interval and the empty-table behavior.
  • Specify an interval when running bruin.
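
To make the first suggestion concrete, here is a minimal sketch of the "create the empty table before downstream SQL runs" idea. It is not Bruin or ingestr code: it uses Python's built-in sqlite3 as a stand-in for the destination database, and the column names in `GAMES_DDL` as well as the `run_summary` helper are hypothetical, since the real games schema is not shown in this issue.

```python
import sqlite3

# Hypothetical column set; the real chess games schema is not shown in this issue.
GAMES_DDL = """
CREATE TABLE IF NOT EXISTS games (
    white_username TEXT,
    black_username TEXT,
    result TEXT
)
"""

# Stand-in for the downstream SQL asset (player_summary counts games).
SUMMARY_SQL = "SELECT COUNT(*) AS total_games FROM games"


def run_summary(conn, ensure_table):
    """Run the downstream query, optionally creating the empty table first."""
    if ensure_table:
        # No-op when ingestion already created the table.
        conn.execute(GAMES_DDL)
    return conn.execute(SUMMARY_SQL).fetchone()[0]


# sqlite3 in-memory database used as a stand-in for the real destination.
conn = sqlite3.connect(":memory:")

# Zero rows ingested, table never created: the downstream query fails.
try:
    run_summary(conn, ensure_table=False)
    failed = False
except sqlite3.OperationalError as exc:
    failed = True
    print("downstream failed:", exc)

# Same situation, but with the empty table materialized first.
total_games = run_summary(conn, ensure_table=True)
print("total_games:", total_games)
```

With the empty table in place, the aggregate runs and reports zero games instead of raising a missing-table error. The trade-off is that the schema has to be declared somewhere up front, which is probably the part that needs a decision here.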

Question

  • What do you suggest?
  • Should I then find all such cases in the codebase and modify them based on the decision we make here?

Reproduce

bruin init chess

In .bruin.yml, set:

connections:
  chess:
    - name: chess-default
      players:
        - awryaw
        - albertojgomez

Then run:

cd ./bruin
bruin run ./chess/pipeline.yml

Expected

chess_playground.games should exist even if it is empty, or the downstream SQL asset should still run and return zero stats.

Actual

chess_playground.player_summary fails because chess_playground.games does not exist when none of the players have played during the interval (i.e. since yesterday).

Error:

Internal: Catalog Error: Table with name games does not exist!
LINE 17:     COUNT(*) AS total_games,
