Conversation

@bruno-szdl
Contributor

@bruno-szdl bruno-szdl commented Mar 5, 2025

resolves #250

Problem

When creating a unit test, it is a bit painful to go through all the input and expected models, get the model name and column names, and write all the YML code, especially if the model has a lot of inputs.

Solution

A macro to generate a unit test YML skeleton. A macro example is in the issue description.
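
For context, this is the rough shape of the YAML a dbt unit test needs and that the macro aims to scaffold; the model, input, and column names below are illustrative placeholders rather than the macro's exact output:

    unit_tests:
      - name: unit_test_order_items
        model: order_items
        given:
          - input: ref('stg_orders')
            rows:
              - {order_id: , ordered_at: }
          - input: ref('stg_products')
            rows:
              - {product_id: , price: }
        expect:
          rows:
            - {order_id: , product_id: , price: }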

@dave-connors-3 provided some feedback

  • We are thinking you could add a helper function that returns the entire resource dictionary, which you could then key into for info like resource.resource_type and resource.package_name etc. This might make your life a lot easier!
    • I added Dave's helper function and I'm using it to retrieve some configs instead of splitting the unique id. It is undoubtedly better this way. I didn't feel the need to modify his helper function at all (see the sketch after this list).
  • We should test quoting + case sensitivity rules for all uses of column names.
    • Not sure how to do it. Could you help me here?
  • I think we could do away with the boilerplate description! While it's best practice, it's not required, and I fear how many boilerplate descriptions may accidentally make their way into production code!
    • Makes sense! I removed it.
  • We'd love to see a newline after each column for maximal readability + ease of adding column values!
    • Can we discuss this point? In my opinion, when you need to add several input rows, I'd rather keep one line per row, so you can see it as a table, with the values of each column straight below each other. If there's a new line for each column, it is harder to visualize it as a table.
  • do we need the {{ print('\n') }} ? should we just add those directly to the string / avoid accidentally erasing the newlines with whitespace control characters?
    • Agreed it is a bad practice. Adjusted.
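
As mentioned above, here is a rough sketch of the helper-function idea (not the actual get_resource_from_unique_id implementation; the macro and variable names are illustrative):

    {% macro example_get_resource_from_unique_id(unique_id) %}
        {# graph.nodes is keyed by unique_id, so the full resource dictionary can be looked up directly #}
        {% set resource = graph.nodes[unique_id] %}
        {# fields such as resource.resource_type, resource.package_name, and resource.name are then
           available without splitting the unique_id string #}
        {{ return(resource) }}
    {% endmacro %}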

--

Also, I added a relation-exists check, so we only try to retrieve the current model's columns if the model already exists in the data warehouse.
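
A minimal sketch of what such a relation-exists guard can look like (illustrative only, not the PR's exact code; node is assumed to be the model's entry from the dbt graph):

    {% set relation = adapter.get_relation(
        database=node.database,
        schema=node.schema,
        identifier=node.name
    ) %}
    {% if relation is not none %}
        {# only introspect columns when the model has actually been built in the warehouse #}
        {% set columns = adapter.get_columns_in_relation(relation) %}
    {% endif %}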

--

I'd love to hear your thoughts about the modifications and how we can improve the macro.

Usage examples

dbt run-operation generate_unit_test_template --args 'model_name: order_items'
dbt run-operation generate_unit_test_template --args '{model_name: model_incremental}'
dbt run-operation generate_unit_test_template --args '{ model_name: order_items, inline_columns: true}'
dbt run-operation generate_unit_test_template --args '{ model_name: order_items, inline_columns: false}'

Checklist

  • This code is associated with an issue which has been triaged and accepted for development.
  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the README.md (if applicable)

@bruno-szdl bruno-szdl requested a review from a team as a code owner March 5, 2025 17:58
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 5, 2025
@dbeatty10
Contributor

We'd love to see a newline after each column for maximal readability + ease of adding column values!

  • Can we discuss this point? In my opinion, when you need to add several input rows, I'd rather keep one line per row, so you can see it as a table, with the values of each column straight below each other. If there's a new line for each column, it is harder to visualize it as a table.

Thanks for opening this PR @bruno-szdl! We can totally discuss this piece.

It seems like we have 4 options here:

  1. All column names on one line.
  2. One line per column name.
  3. Extra parameter that allows users to choose between 1 and 2 (defaulting to 1).
  4. Extra parameter that allows users to choose between 1 and 2 (defaulting to 2).

Option 2) is easier for me when adding values (because my eyes can see line breaks more easily than scanning for : horizontally).

But option 1) is easier to read as a table once the values have been input (assuming the text for the values is all similar width).

Here's an example of manually adding strategic whitespace to make it as readable as possible as a table:
[screenshot: example unit test rows with values manually aligned into columns]

So it seems like it depends on what we're trying to optimize for here.

What do you think the pros/cons are for each of those four options?
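
For concreteness, a hedged illustration of how a single row might look under options 1) and 2) (the column names and values are made up):

    # Option 1: all column names on one line
    rows:
      - {order_id: 1, status: completed, amount: 10}

    # Option 2: one line per column name
    rows:
      - order_id: 1
        status: completed
        amount: 10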

@bruno-szdl
Contributor Author

Hey @dbeatty10, now I get it. The 'table view' in the docs only looks that way because the whitespace was manually adjusted to align the column names.

Personally, I still prefer the inline way, but I like the approach of giving the dev the option via a parameter, and I don't mind making the multiline option the default. So I like option 4).

I have modified the code to allow those two options. The new parameter name is inline_columns, but I'm not good at choosing names, so feel free to alter it xD

Now I'm wondering if we should add a parameter to give the option of csv and sql formats.
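
For reference, dbt unit tests already accept fixtures in csv (and sql) format via the format key, so a format parameter would presumably emit something along these lines instead of dict-style rows (the input and column names are illustrative):

    given:
      - input: ref('stg_orders')
        format: csv
        rows: |
          order_id,status
          1,completed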

Contributor

@dbeatty10 dbeatty10 left a comment

This is looking awesome @bruno-szdl 🤩 (and sorry for the delay in getting back to you!)

The get_resource_from_unique_id macro was defined twice, so I removed one of them in 809991d in order to pass CI.

A couple of things we'll want to do before merging:

  • Add test cases similar to those found in the integration_tests/tests directory (see here for one example)
  • Add documentation + example(s) within README.md on how to use this macro (see here for one example).

@dbeatty10 dbeatty10 self-requested a review April 3, 2025 00:27
@dbeatty10 dbeatty10 dismissed their stale review April 3, 2025 00:28

I addressed these comments

Contributor

@dbeatty10 dbeatty10 left a comment

@bruno-szdl I took care of the following:

  • Add test cases similar to those found in the integration_tests/tests directory (see here for one example)
  • Add documentation + example(s) within README.md on how to use this macro (see here for one example).

So ready to rock!

(CircleCI is failing only for Snowflake, due to a stale password, so merging without that one.)

@dbeatty10 dbeatty10 merged commit f6666ca into dbt-labs:main Apr 3, 2025
8 of 9 checks passed
@bruno-szdl
Contributor Author

@dbeatty10 Hey Doug, thanks a lot for taking care of those. I was swamped these last few days. Glad to contribute to codegen with your help!!

@dbeatty10
Contributor

You're very welcome @bruno-szdl ! Awesome to get to collaborate with you on this ❤️

insuhpak added a commit to cdp-ucsc/cdp-ucsc-dbt-codegen that referenced this pull request Apr 23, 2025
…t-codegen (#9)

* Update Changelog, temporarily remove Case Sensitivity testing (dbt-labs#174)

* Update changelog

* Fix typo in CHANGELOG

* Attempt to fix integration test case sensitivity for redshift

* Always lower on Redshift

* Fix indentation

* Fix whitespace

* Fix whitespace pt 2

* Update case sensitive seeds into folder

* Use + for quote_columns config

* Lower schema on redshift

* Use target.type

* Do some nonsense to make this work for Redshift

* Move seeds config to properties.yml

* Bypass redshift completely

* Temporarily bypass Redshift in CI completely

* Turn Redshift CI back on

* Delete case sensitivity test

* Delete case sensitive seed

* Delete properties.yml for case sensitive seeds

* Update testing structure for dbt Labs testing support - postgres (dbt-labs#181)

* add tox

* add postgres/redshift/bq

* add more wh, update var names

* fix profiles key

* move supported adapters

* Add CI workflow

* update to allow circleci to keep working

* fix BQ var name

* remove config from profile

* move to just support postgres

* fix vars

* use premade workflow

* add newline

* use merged version

* add comments about future adapters

* use tag

* update readme

* Add lines

* rework profile so it can be reused

* add sample env files and fix comment

* Update .circleci/config.yml

Co-authored-by: Doug Beatty <[email protected]>

* let circleCI set the schema

* fix readme instructions

* undo non-postgres changes

---------

Co-authored-by: Doug Beatty <[email protected]>

* Restore CI test for case-sensitive identifiers when generating sources (dbt-labs#192)

* Restore case sensitivity test

* Try to make integration test adapter-agnostic

* Try to make integration test CI-agnostic and adapter-agnostic

* Use adapter-agnostic data types

* Temporarily hard-code case-insensitive seed name for Snowflake

* Temporarily hard-code case-insensitive seed name for Snowflake

* Changelogs for 0.12.0, 0.12.1, and 0.13.0-b1 (dbt-labs#196)

* add snowflake (dbt-labs#198)

* add snowflake

* fix profiles

* check what’s installed

* pass in snowflake password

* and more env to gitignore

* Fix quoted identifiers in the `generate_base_model` macro for BigQuery (dbt-labs#199)

* Use `adapter.quote` to create a case-sensitive quoted identifier for column names

* Force a failure for all adapters to help troubleshoot

* Revert "Force a failure for all adapters to help troubleshoot"

This reverts commit d707832.

* Use `adapter.quote` to create a case-sensitive quoted identifier for column names in `generate_base_model` macro

* Try removing Redshift-specific logic (dbt-labs#208)

* Use the `cimg/postgres` Docker image created by CircleCI with continu… (dbt-labs#214)

* Use the `cimg/postgres` Docker image created by CircleCI with continuous integration builds in mind

* Add the root Postgres user to the environment

* Independent workflow job for dbt-postgres (dbt-labs#215)

* Independent workflow job for dbt-postgres

* Remove activation of virtual environment

* Try without `python -m`

* Independent workflow job for dbt-redshift

* Independent workflow job for dbt-snowflake

* Independent workflow job for dbt-snowflake

* Independent workflow job for dbt-bigquery

* Independent workflow job for dbt-bigquery

* Independent workflow job for dbt-bigquery

* Independent workflow job for dbt-bigquery

* Independent workflow job for dbt-bigquery

* Setup environment variables for dbt-bigquery

* Simplify environment variables for BigQuery in CircleCI (dbt-labs#216)

* Simplify environment variables for BigQuery in CircleCI

* Fix YAML parsing error

* Fix reference to environment variable

* Fix reference to environment variable

* Stop installing prereleases from PyPI in favor of stable releases only (dbt-labs#220)

* Upgrade to Python 3.11 in CircleCI (dbt-labs#222)

* Use dynamic schema names rather than hardcoded ones (dbt-labs#224)

* Disable two CI tests

* Use a dynamic schema name based off the target schema rather than a hardcoded one

* Restore one of the CI tests

* Try updating the expected output

* Update expected model given upstream changes

* Restore the other CI test

* Update expected model given upstream changes

* add support for redshift testing (dbt-labs#204)

Co-authored-by: Doug Beatty <[email protected]>

* Add support for bigquery testing in GitHub CI via tox (dbt-labs#203)

* add support for bigquery testing

* add missing var in tox file

* Temporarily only run CI tests for BigQuery

* Prefix the schema for the data source in CI with the name of the target schema

* Store artifacts for logs and target directories for BigQuery

* Set up environment variable for BigQuery credentials (keyfile for service account JSON)

* Set the custom schema in the source definition

* Use the target schema

* Try to align actual vs. expected when the schema name is variable

* Remove extraneous storage of artifacts

* Temporarily disable two failing CI tests

* Revert "Temporarily disable two failing CI tests"

This reverts commit d70d776.

---------

Co-authored-by: Doug Beatty <[email protected]>

* Update changelog for 0.13.0 release (dbt-labs#227)

* Revert "Restore CI test for case-sensitive identifiers when generating sources (dbt-labs#192)" (dbt-labs#230)

This reverts commit 7acc8c0.

* Update changelog for 0.13.1 release (dbt-labs#232)

* Upgrade from Postgres 9 to 17 (dbt-labs#234)

* Upgrade from Postgres 9 to 17

* The postgres image in CircleCI needs a major and minor version specified

* Update ci.yml (dbt-labs#235)

* Bigquery repeated data type (dbt-labs#236)

* Handle BigQuery repeated fields data_types

* include nested repeated structs

* override repeated struct data_type with array

* Add trailing newline

* update changelog

* Update CHANGELOG.md

---------

Co-authored-by: Doug Beatty <[email protected]>

* Remove "I have added an entry to CHANGELOG.md" from the PR template (dbt-labs#239)

* Contributors shouldn't edit the `CHANGELOG.md` directly anymore (dbt-labs#240)

* Remove the PR checklist items related to the type of change (dbt-labs#243)

* Align the PR description with dbt-core, dbt-adapters, etc. (dbt-labs#244)

* Align the PR description with dbt-core, dbt-adapters, etc.

* Add hyperlink to the open issues

* Align the pull request template with `dbt-utils` (dbt-labs#246)

* Update CODEOWNERS file (dbt-labs#248)

* Update CODEOWNERS file with global codeowner

* Update .github/CODEOWNERS

---------

Co-authored-by: Doug Beatty <[email protected]>

* adding `generate_unit_test_template` macro (dbt-labs#251)

* adding generate_unit_test_template_macro

* added a relation exists check

* adding arg to control inline/multiline columns

* Removing duplicated `get_resource_from_unique_id` macro

* Add newline to end of file

* Add a simple incremental model for testing

* CI tests

* Use dispatch, etc.

* Usage documentation for `generate_unit_test_template` macro

---------

Co-authored-by: Bruno Souza de Lima <[email protected]>
Co-authored-by: Doug Beatty <[email protected]>

---------

Co-authored-by: winnie <[email protected]>
Co-authored-by: Emily Rockman <[email protected]>
Co-authored-by: Doug Beatty <[email protected]>
Co-authored-by: Doug Beatty <[email protected]>
Co-authored-by: Magnús Þór Benediktsson <[email protected]>
Co-authored-by: security-dbtlabs <[email protected]>
Co-authored-by: Bruno S. de Lima <[email protected]>
Co-authored-by: Bruno Souza de Lima <[email protected]>