@plusplusjiajia
Member

Purpose

Introduces four new Transform implementations for data masking:

  1. HashTransform: Masks string columns by hashing with configurable algorithm (default SHA-256) and optional salt
  2. PartialMaskTransform: Masks the middle part of a string
  3. NullTransform: Always returns null for the input field
  4. DefaultValueTransform: Returns the default value based on the field's data type
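
As a rough illustration of the four masking behaviors listed above, here is a minimal standalone sketch in Java. It is not the code from this PR: the class `MaskingSketch`, its method names and parameters, the `FieldType` enum, and the specific default values are all hypothetical. Two behaviors are assumptions rather than something the description states: the salt is prepended to the value before hashing, and a string too short to keep both ends is masked completely.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helpers that mimic the four masking behaviors; not the PR's Transform classes. */
final class MaskingSketch {

    /** Illustrative subset of field types (assumption, not the project's type system). */
    enum FieldType { INT, BIGINT, DOUBLE, BOOLEAN, STRING }

    /** Hashes salt + value with the given algorithm and returns lowercase hex. */
    static String hash(String value, String algorithm, String salt) {
        if (value == null) {
            return null;
        }
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            if (salt != null) {
                // Assumption: the salt is prepended to the value.
                digest.update(salt.getBytes(StandardCharsets.UTF_8));
            }
            byte[] bytes = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported hash algorithm: " + algorithm, e);
        }
    }

    /** Keeps prefixLen leading and suffixLen trailing characters (both non-negative) and masks the rest. */
    static String partialMask(String value, int prefixLen, int suffixLen, char maskChar) {
        if (value == null) {
            return null;
        }
        int length = value.length();
        // Assumption: if nothing would be left to mask, mask the whole string.
        boolean tooShort = prefixLen + suffixLen >= length;
        int keepHead = tooShort ? 0 : prefixLen;
        int keepTail = tooShort ? 0 : suffixLen;
        StringBuilder masked = new StringBuilder(length);
        masked.append(value, 0, keepHead);
        for (int i = keepHead; i < length - keepTail; i++) {
            masked.append(maskChar);
        }
        masked.append(value, length - keepTail, length);
        return masked.toString();
    }

    /** Always returns null, whatever the input is. */
    static Object nullValue(Object ignored) {
        return null;
    }

    /** Returns a type-based default; the concrete defaults here are assumptions. */
    static Object defaultValue(FieldType type) {
        switch (type) {
            case INT:     return 0;
            case BIGINT:  return 0L;
            case DOUBLE:  return 0.0d;
            case BOOLEAN: return false;
            case STRING:  return "";
            default:      throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }
}
```

For example, `MaskingSketch.partialMask("13812345678", 3, 4, '*')` keeps the first three and last four characters and masks the four in between.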

@JingsongLi
Contributor

What reference are these transforms based on?

@plusplusjiajia
Member Author

plusplusjiajia commented Jan 9, 2026

@JingsongLi
Contributor

@plusplusjiajia When I look at Ranger, these functions seem to be more closely aligned with the SQL standard, such as to_hex, regexp_replace, and so on.

We have two options:

  1. If we decide to use standard SQL functions, we will need the ability to nest function calls.
  2. If we provide mask functions, we need to adopt a name that is more closely related to Mask, such as MASK_HASH instead of Hash.
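
To make the two options concrete, here is a small Java sketch of the difference. Everything in it is hypothetical (`NestingSketch`, `sha256`, `toHex`, `NESTED`, `maskHash`) and none of it is the project's API: option 1 builds the mask by nesting small general-purpose functions, in the spirit of `to_hex(sha256(col))`, while option 2 exposes one dedicated function with a mask-specific name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

/** Hypothetical comparison of nested general-purpose functions and a dedicated mask function. */
final class NestingSketch {

    /** General-purpose building block: SHA-256 digest of a UTF-8 string. */
    static byte[] sha256(String value) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** General-purpose building block: lowercase hex encoding. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Option 1: the mask is an expression that nests two functions,
    // which requires the expression layer to support composition.
    static final Function<String, String> NESTED =
            ((Function<String, byte[]>) NestingSketch::sha256).andThen(NestingSketch::toHex);

    // Option 2: a single dedicated function; no nesting support is needed,
    // but the name should say that it is a mask (MASK_HASH rather than Hash).
    static String maskHash(String value) {
        return toHex(sha256(value));
    }
}
```

Both paths produce the same output for the same input; the difference is whether the transform layer has to support composing functions or only has to register one more named function.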

@plusplusjiajia
Member Author

That's a good question. It seems we can revisit this after column masking is supported.
