Proposal: opt-in whitespace trimming for generate_surrogate_key #1068

@Ahmed-Gomaa1

Describe the feature

Introduce an opt-in parameter to dbt_utils.generate_surrogate_key that trims leading and trailing whitespace from each input field before hashing.

The default behavior remains unchanged to preserve backward compatibility. When explicitly enabled, trim() is applied to each field prior to casting and hashing.

dbt_utils.generate_surrogate_key(
  ['col1', 'col2'],
  trim=true
)
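
To make the intended effect concrete, here is a rough Python model of the proposed behavior. It is only a sketch, not the macro's implementation: the function name `surrogate_key` is hypothetical, and the `-` separator, the null placeholder string, and the MD5 hash are assumptions about how the existing macro builds its key. Only spaces are stripped, on the assumption that this mirrors a plain SQL trim().

```python
import hashlib

# Assumed placeholder for NULL inputs; an assumption about the existing macro.
NULL_PLACEHOLDER = "_dbt_utils_surrogate_key_null_"


def surrogate_key(values, trim=False):
    """Hypothetical model: optionally trim, cast to text, join, then hash."""
    parts = []
    for value in values:
        if value is None:
            parts.append(NULL_PLACEHOLDER)
            continue
        # Trim before casting, as the proposal describes; spaces only.
        if trim and isinstance(value, str):
            value = value.strip(" ")
        parts.append(str(value))
    # Assumed separator and hash function (MD5 over the joined string).
    joined = "-".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


clean = surrogate_key(["abc", "x"])
padded_default = surrogate_key(["abc  ", " x"])
padded_trimmed = surrogate_key(["abc  ", " x"], trim=True)

print(clean == padded_default)  # keys differ without trimming
print(clean == padded_trimmed)  # keys match once trimming is enabled
```

With the flag off, padded and clean inputs hash to different keys, which is the current behavior and stays the default. With the flag on, both inputs produce the same key.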

Describe alternatives you've considered

  • Manually applying trim() to each column when calling the macro
  • Creating a project-level wrapper macro

These alternatives work, but they require repetitive SQL in every model and make it easy for key definitions to drift apart across models.

Additional context

  • Opt-in only; no default behavior change
  • Adapter-agnostic and warehouse-independent
  • Intended to improve surrogate key stability when source data contains inconsistent whitespace

Who will this benefit?

Teams generating surrogate keys from:

  • Source systems with inconsistent leading or trailing whitespace
  • External systems where whitespace is not semantically meaningful
  • Pipelines that require stable keys across reprocessing

This helps prevent unintended key changes caused solely by formatting differences.

Are you interested in contributing this feature?

Yes — I am happy to implement the change, add tests, and update documentation.
