Add masked table column relational unprotect rewrite #417

knassre-bodo · 2025-08-26T18:49:15Z

Adding crucial rewrite step for masked table columns: During relational conversion, if a column is a masked table column, place a PROJECT on top of the SCAN node fetching data from the table. This PROJECT node will invoke an UNMASK operator (containing information from the metadata for the masked table column) which will transform the masked columns into their unmasked forms.

Updates existing tests to account for this transformation, ensuring that the UNMASK calls are injected into the underlying SQL (and pulled up as late as possible by projection pullup). All the tests in test_masked_sqlite.py will now run with the correct e2e answers, as the encryption of the underlying data is now undone by the generated query.
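
To make the rewrite described above concrete, here is a minimal sketch of the idea. It does not use the real PyDough classes: `Scan`, `Project`, `scan_with_unmask`, the table and column names, and the shape of the `unprotect` mapping are all hypothetical stand-ins chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scan:
    """Hypothetical stand-in for a relational SCAN node."""

    table: str
    columns: list[str]


@dataclass
class Project:
    """Hypothetical stand-in for a relational PROJECT node."""

    child: Scan
    # Maps each output column name to the expression text that produces it.
    columns: dict[str, str]


def scan_with_unmask(
    table: str, columns: list[str], unprotect: dict[str, str]
) -> Scan | Project:
    """Wrap the scan in a projection only when at least one column is masked.

    `unprotect` maps a masked column name to a format string with a single
    placeholder for the masked value (an assumed shape for the metadata).
    """
    scan = Scan(table, columns)
    if not any(col in unprotect for col in columns):
        return scan
    projected = {
        col: (
            f"UNMASK::({unprotect[col].format(f'[{col}]')})"
            if col in unprotect
            else col
        )
        for col in columns
    }
    return Project(scan, projected)


# Made-up table and protocol: only c_key is masked, so only it gets an UNMASK call.
plan = scan_with_unmask("customers", ["c_key", "c_fname"], {"c_key": "42 - ({})"})
unmasked_plan = scan_with_unmask("regions", ["r_name"], {})
```

In this sketch a table without masked columns keeps its bare scan, which mirrors the `if any(...)` guard discussed in the review of relational_converter.py.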

knassre-bodo · 2025-09-08T19:36:56Z

tests/gen_data/init_cryptbank.sql

-SELECT  *
-    -- 42 - column1, -- ARITHMETIC SHIFT: 42
-    -- UPPER(column2), -- UPPERCASE
-    -- UPPER(column3), -- UPPERCASE
-    -- REPLACE(REPLACE(REPLACE(column4, '0', '*'), '9', '0'), '*', '9'), -- DIGIT SWITCH: 0 <-> 9
-    -- SUBSTRING(column5, 2) || SUBSTRING(column5, 1, 1), -- FIRST CHAR TRANSPOSE
-    -- SUBSTRING(column6, 2) || SUBSTRING(column6, 1, 1),  -- FIRST CHAR TRANSPOSE
-    -- DATE(column7, '-472 days') -- DAY SHIFT: 472
+SELECT 
+    42 - column1, -- ARITHMETIC SHIFT: 42
+    UPPER(column2), -- UPPERCASE
+    UPPER(column3), -- UPPERCASE
+    REPLACE(REPLACE(REPLACE(column4, '0', '*'), '9', '0'), '*', '9'), -- DIGIT SWITCH: 0 <-> 9
+    SUBSTRING(column5, 2) || SUBSTRING(column5, 1, 1), -- FIRST CHAR TRANSPOSE
+    SUBSTRING(column6, 2) || SUBSTRING(column6, 1, 1),  -- FIRST CHAR TRANSPOSE
+    DATE(column7, '-472 days') -- DAY SHIFT: 472


Forgot to restore encryption in the original PR, but it's fine because the E2E tests were being skipped until now.
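
As a rough illustration of what these masking expressions do to the data, the sketch below re-implements a few of them in plain Python and pairs each with an inverse. The function names are hypothetical, and the inverses are assumptions about what a matching unprotect step would have to compute; the actual unprotect protocols live in the metadata and are not shown in this diff.

```python
from datetime import date, timedelta


def mask_key(value: int) -> int:
    # ARITHMETIC SHIFT: 42 - x is its own inverse.
    return 42 - value


def mask_digits(text: str) -> str:
    # DIGIT SWITCH: swap every '0' with '9'; applying it twice restores the input.
    return text.translate(str.maketrans("09", "90"))


def mask_transpose(text: str) -> str:
    # FIRST CHAR TRANSPOSE: move the first character to the end.
    return text[1:] + text[:1]


def unmask_transpose(text: str) -> str:
    # Assumed inverse: move the last character back to the front.
    return text[-1:] + text[:-1]


def mask_day(day: date) -> date:
    # DAY SHIFT: 472 days earlier.
    return day - timedelta(days=472)


def unmask_day(day: date) -> date:
    # Assumed inverse: 472 days later.
    return day + timedelta(days=472)
```

With the encryption restored in the init script, the e2e answers only match if the generated query applies the inverse of each transformation.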

knassre-bodo · 2025-09-09T17:47:25Z

pydough/sqlglot/transform_bindings/base_transform_bindings.py

+        if isinstance(
+            operator,
+            (
+                pydop.MaskedExpressionFunctionOperator,
+                pydop.SqlMacroExpressionFunctionOperator,
+            ),
+        ):


Just extending the logic we already use for UDFs with macro text, but now with the new MaskedExpressionFunctionOperator operator, which contains a format string. This format string is either the unprotect or protect format string, depending on whether is_unprotect is True/False in the operator.

knassre-bodo · 2025-09-09T17:49:26Z

pydough/pydough_operators/expression_operators/masked_expression_function_operator.py

+    def to_string(self, arg_strings: list[str]) -> str:
+        name: str = "UNMASK" if self.is_unprotect else "MASK"
+        arg_strings = [f"[{s}]" for s in arg_strings]
+        return f"{name}::({self.format_string.format(*arg_strings)})"


This is mostly for relational plans, so we can clearly delineate the mask/unmask call, the logic within, and where the arguments get injected. For example, if I am unmasking the expression foo - 7, and the logic to unmask x is (2 * x) - 1, then this gets stringified as UNMASK::((2 * [foo - 7]) - 1).
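
The stringification can be tried in isolation with a small stand-in. The sketch below copies the `to_string` logic from the diff, but `MaskingMetadataSketch` and `MaskedOperatorSketch` are hypothetical simplifications of the real classes, and both protocol strings are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingMetadataSketch:
    """Hypothetical stand-in for MaskedTableColumnMetadata."""

    unprotect_protocol: str
    protect_protocol: str


@dataclass(frozen=True)
class MaskedOperatorSketch:
    """Hypothetical stand-in for MaskedExpressionFunctionOperator."""

    masking_metadata: MaskingMetadataSketch
    is_unprotect: bool

    @property
    def format_string(self) -> str:
        # Pick the protocol that matches the direction of the operator.
        return (
            self.masking_metadata.unprotect_protocol
            if self.is_unprotect
            else self.masking_metadata.protect_protocol
        )

    def to_string(self, arg_strings: list[str]) -> str:
        name = "UNMASK" if self.is_unprotect else "MASK"
        # Square brackets mark where each argument was injected.
        wrapped = [f"[{s}]" for s in arg_strings]
        return f"{name}::({self.format_string.format(*wrapped)})"


# Made-up protocols: unmasking computes (2 * x) - 1, masking inverts it.
metadata = MaskingMetadataSketch(
    unprotect_protocol="(2 * {}) - 1",
    protect_protocol="({} + 1) / 2",
)
unmask_text = MaskedOperatorSketch(metadata, is_unprotect=True).to_string(["foo - 7"])
mask_text = MaskedOperatorSketch(metadata, is_unprotect=False).to_string(["foo"])
```

Here `unmask_text` is `UNMASK::((2 * [foo - 7]) - 1)`, with the square brackets showing exactly which part of the expanded logic is the injected argument.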

knassre-bodo · 2025-09-09T17:49:48Z

pydough/pydough_operators/expression_operators/masked_expression_function_operator.py

+        """
+        The format string to use for this operator to either mask or unmask the
+        operand.
+        """
+        return (
+            self.masking_metadata.unprotect_protocol
+            if self.is_unprotect
+            else self.masking_metadata.protect_protocol
+        )


This switch logic makes the SQL conversion step seamless.

knassre-bodo · 2025-09-09T17:50:35Z

pydough/pydough_operators/expression_operators/masked_expression_function_operator.py

+    def __init__(
+        self,
+        masking_metadata: MaskedTableColumnMetadata,
+        is_unprotect: bool,


We store the object with the metadata & this boolean so all we need to do for #418 to create a MASK call is create a copy of the operator with is_unprotect toggled to False.

Why is it named is_unprotect? For consistency, is_unmasked makes more sense, since the class is called Masked...

Good point, but I will rename it to is_unmask, since it describes whether the function masks or unmasks.

juankx-bodo · 2025-09-22T16:32:05Z

pydough/conversion/relational_converter.py

+        # If any of the columns are masked, insert a projection on top to unmask
+        # them.
+        if any(
+            isinstance(expr, HybridColumnExpr)


I would create a helper function for code readability:

def _is_a_masked_column(expr):
    return isinstance(expr, HybridColumnExpr) and isinstance(expr.column.column_property, MaskedTableColumnMetadata)

Then:

if any(_is_a_masked_column(expr) for expr in node.terms.values()):

juankx-bodo · 2025-09-22T16:33:15Z

pydough/conversion/relational_converter.py

+        ):
+            unmask_columns: dict[str, RelationalExpression] = {}
+            for name, hybrid_expr in node.terms.items():
+                if isinstance(hybrid_expr, HybridColumnExpr) and isinstance(


I would use the helper function here: if _is_a_masked_column(hybrid_expr):

hadia206

Overall looks good to me.
I just have one comment on renaming some variables.

hadia206 · 2025-09-23T18:53:41Z

pydough/pydough_operators/expression_operators/masked_expression_function_operator.py

+    def __init__(
+        self,
+        masking_metadata: MaskedTableColumnMetadata,
+        is_unprotect: bool,


Why is it named is_unprotect? For consistency, is_unmasked makes more sense, since the class is called Masked...

hadia206 · 2025-09-23T18:56:27Z

pydough/pydough_operators/expression_operators/masked_expression_function_operator.py

+        # Create a dummy verifier that requires exactly one argument, since all
+        # masking/unmasking operations are unary.
+        verifier: TypeVerifier = RequireNumArgs(1)


Is that a guarantee for all different use cases?

Yes, since from a PyDough perspective the operator is always invoked in the form UNMASK(arg) or MASK(arg). Internally, when the macro gets expanded, it may contain multiple arguments, but it is only parameterized on one argument (the thing getting masked/unmasked).
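
A small sketch of why a one-argument check is compatible with macros that mention the operand more than once. `expand_unary_macro` is a hypothetical helper, not part of PyDough, and the protocol text and the use of indexed `{0}` placeholders are assumptions for illustration.

```python
def expand_unary_macro(format_string: str, args: list[str]) -> str:
    """Expand a mask/unmask format string that is parameterized on one operand."""
    if len(args) != 1:
        raise ValueError(f"expected exactly 1 argument, received {len(args)}")
    return format_string.format(*args)


# Hypothetical protocol text: the operand appears twice in the expansion,
# but the macro still takes a single argument.
transpose = "SUBSTRING({0}, 2) || SUBSTRING({0}, 1, 1)"
expanded = expand_unary_macro(transpose, ["c_lname"])
```

The expanded SQL references `c_lname` twice, yet the call site only ever supplies one argument, which is what the `RequireNumArgs(1)` verifier enforces.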

hadia206 · 2025-09-23T18:57:34Z

pydough/pydough_operators/expression_operators/masked_expression_function_operator.py

+            self.masking_metadata.unprotect_protocol
+            if self.is_unprotect
+            else self.masking_metadata.protect_protocol


Same. For consistency, we should rename those to unmask_protocol and mask_protocol.

Those are in the metadata, and are named to be consistent with the JSON fields protect protocol and unprotect protocol. If we want to change those, that's now an API spec change to something already merged. We can, but that's a separate discussion from this PR.

I agree with both of you. That is a separate discussion from this PR but I think we should consider changing protect/unprotect references to mask/unmask.

juankx-bodo · 2025-09-25T16:17:16Z

pydough/conversion/relational_converter.py

+            unmask_columns: dict[str, RelationalExpression] = {}
+            for name, hybrid_expr in node.terms.items():
+                if self.is_masked_column(hybrid_expr):
+                    assert isinstance(hybrid_expr, HybridColumnExpr)


Are these asserts required? We are here because is_masked_column(hybrid_expr) == True. Am I missing something?

Yes, they are required for mypy

Ohh, you are right.

Would declaring the return type as a TypeGuard, something like def is_masked_column(self, expr: HybridExpr) -> TypeGuard[HybridColumnExpr]:, be OK with mypy, so we don't need extra assert code just for that?

I mean, it is OK since both asserts will always be true... I was just thinking there should be a way to do that for mypy.

knassre-bodo added 17 commits August 21, 2025 14:25

Initial setup started

83aaa52

Added all metadata except the protect/unprotect protocols for CRYPTBANK

3930b3d

Added basic tests before inclusion of encryption

4a02e2a

Added more test files

32e678e

Renamed tests

957f010

Added new tests [RUN CI]

2596a00

Re-enabling encryption of CRYPTBANK data and skipping e2e tests until…

f2e46c4

… implemented [RUN CI]

Adidng more tests [RUN CI]

86036b7

[RUN CI]

f896f14

Added initial relational setup with operator for unmasking

c1315e3

Fixing naming bug

48b892b

Added cryptbank SQL support with encryptions injected

040d725

[RUN CI]

16de5a3

Fixing JSON file [RUN CI]

5241003

Merge branch 'kian/sqlite_masked_tests' into kian/masked_relational_r…

2a27c99

…ewrite

Resolving conflicts

c88d2f0

Merge branch 'kian/sqlite_masked_tests' into kian/masked_relational_r…

442621e

…ewrite

Base automatically changed from kian/sqlite_masked_tests to main September 8, 2025 16:48

Resolving conflicts [RUN CI]

e5b8ab8

knassre-bodo commented Sep 8, 2025

knassre-bodo and others added 3 commits September 8, 2025 16:00

Merge branch 'main' into kian/masked_relational_rewrite

ffbe3fe

add rest

d352175

sf_masked_examples.json

aa2ee68

knassre-bodo mentioned this pull request Sep 9, 2025

Add masked table column literal comperison masking rewrite #418

Open

knassre-bodo commented Sep 9, 2025

Revisions [RUN CI]

f09d0e7

knassre-bodo requested a review from a team September 9, 2025 17:55

hadia206 added 3 commits September 19, 2025 14:19

add sql and relational files and tests

bf2b075

use other version in some metadata and skip tests

a883759

add import deleted by ruff

5d273c3

juankx-bodo reviewed Sep 22, 2025

hadia206 added 2 commits September 22, 2025 14:41

merge

2d69928

Github action

bc09e3f

hadia206 approved these changes Sep 23, 2025

juankx-bodo approved these changes Sep 24, 2025

knassre-bodo and others added 4 commits September 24, 2025 13:47

Merge branch 'main' into kian/masked_relational_rewrite

ab08ce4

[run CI] address comments (remove test and add type hints)

a36fb2b

Revisions

df477e7

[RUN CI]

cccbe19

juankx-bodo reviewed Sep 25, 2025

knassre-bodo added 2 commits September 25, 2025 13:07

Merge remote-tracking branch 'origin/Hadia/sf_masked_tests' into kian…

600492a

…/masked_relational_rewrite

Resolving conflicts, adding raw vs rewrite

82e9691

knassre-bodo changed the base branch from main to Hadia/sf_masked_tests September 25, 2025 17:15

knassre-bodo added 3 commits September 25, 2025 13:22

Adding raw vs rewrite

2a24514

Fixing SQL handling and fixtures

aea501f

Resolving conflicts

a733022

knassre-bodo added 5 commits September 25, 2025 14:06

WIP

a5c3b9c

Adding more tests

bdde458

Adding more tests

4a58775

Resolving test updates

66f2193

Adding more tests

630c7cc

Base automatically changed from Hadia/sf_masked_tests to main September 29, 2025 20:59

knassre-bodo added 2 commits September 29, 2025 17:02

Resolving conflicts [RUN ALL]

f4c318f

[RUN CI] [RUN SF_MASKED]

478b2f6

knassre-bodo merged commit 1be9e87 into main Sep 30, 2025
12 checks passed

knassre-bodo deleted the kian/masked_relational_rewrite branch September 30, 2025 01:24
