Add masked table column literal comparison masking rewrite #418
Open
knassre-bodo wants to merge 68 commits into main from kian/mask_literal_rewrite
+248
−82
Changes from 21 commits
83aaa52
Initial setup started
knassre-bodo 3930b3d
Added all metadata except the protect/unprotect protocols for CRYPTBANK
knassre-bodo 4a02e2a
Added basic tests before inclusion of encryption
knassre-bodo 32e678e
Added more test files
knassre-bodo 957f010
Renamed tests
knassre-bodo 2596a00
Added new tests [RUN CI]
knassre-bodo f2e46c4
Re-enabling encryption of CRYPTBANK data and skipping e2e tests until…
knassre-bodo 86036b7
Adidng more tests [RUN CI]
knassre-bodo f896f14
[RUN CI]
knassre-bodo c1315e3
Added initial relational setup with operator for unmasking
knassre-bodo 48b892b
Fixing naming bug
knassre-bodo 040d725
Added cryptbank SQL support with encryptions injected
knassre-bodo 16de5a3
[RUN CI]
knassre-bodo 5241003
Fixing JSON file [RUN CI]
knassre-bodo 2a27c99
Merge branch 'kian/sqlite_masked_tests' into kian/masked_relational_r…
knassre-bodo 9e50717
Initial implementation in progress
knassre-bodo c88d2f0
Resolving conflicts
knassre-bodo 442621e
Merge branch 'kian/sqlite_masked_tests' into kian/masked_relational_r…
knassre-bodo 1fe90af
Merge branch 'kian/masked_relational_rewrite' into kian/mask_literal_…
knassre-bodo e5b8ab8
Resolving conflicts [RUN CI]
knassre-bodo 2313b57
Resolving conflicts [RUN CI]
knassre-bodo ffbe3fe
Merge branch 'main' into kian/masked_relational_rewrite
knassre-bodo d352175
add rest
hadia206 aa2ee68
sf_masked_examples.json
hadia206 f09d0e7
Revisions [RUN CI]
knassre-bodo 76a16c2
Merge branch 'main' into kian/masked_relational_rewrite
knassre-bodo be00e58
Merge branch 'main' into kian/masked_relational_rewrite
knassre-bodo fa2d869
[RUN CI]
knassre-bodo fed46d5
Merge branch 'kian/masked_relational_rewrite' into kian/mask_literal_…
knassre-bodo 655054f
Resolving conflicts
knassre-bodo 874dcad
Revisions WIP
knassre-bodo 2189fb3
Merge branch 'main' into kian/masked_relational_rewrite
knassre-bodo 28b58ce
Merge branch 'main' into kian/mask_literal_rewrite
knassre-bodo 4536992
Merge branch 'kian/masked_relational_rewrite' into kian/mask_literal_…
knassre-bodo e2fe6b7
Adding environment variable and doubling cryptbank tests to case on it
knassre-bodo 0f6e59f
Adding environment variable
knassre-bodo 98a9c4c
[RUN CI]
knassre-bodo 46b1c36
Resolving conflicts [RUN CI]
knassre-bodo bf2b075
add sql and relational files and tests
hadia206 a883759
use other version in some metadata and skip tests
hadia206 5d273c3
add import deleted by ruff
hadia206 2d69928
merge
hadia206 bc09e3f
Github action
hadia206 ab08ce4
Merge branch 'main' into kian/masked_relational_rewrite
knassre-bodo a36fb2b
[run CI] address comments (remove test and add type hints)
hadia206 df477e7
Revisions
knassre-bodo cccbe19
[RUN CI]
knassre-bodo 61194bb
Merge branch 'kian/masked_relational_rewrite' into kian/mask_literal_…
knassre-bodo 1dfe201
revisions
knassre-bodo 600492a
Merge remote-tracking branch 'origin/Hadia/sf_masked_tests' into kian…
knassre-bodo 82e9691
Resolving conflicts, adding raw vs rewrite
knassre-bodo 2a24514
Adding raw vs rewrite
knassre-bodo aea501f
Fixing SQL handling and fixtures
knassre-bodo a733022
Resolving conflicts
knassre-bodo a5c3b9c
WIP
knassre-bodo bdde458
Adding more tests
knassre-bodo 4a58775
Adding more tests
knassre-bodo 66f2193
Resolving test updates
knassre-bodo 630c7cc
Adding more tests
knassre-bodo f4c318f
Resolving conflicts [RUN ALL]
knassre-bodo c8ade74
Merge branch 'kian/masked_relational_rewrite' into kian/mask_literal_…
knassre-bodo 857c39e
Updating files
knassre-bodo 89fa4a6
Updating other fails
knassre-bodo ef3ae04
Resolving conflicts
knassre-bodo 85376d7
[RUN CI][RUN SF_MASKED]
knassre-bodo 4bf43ea
Merge branch 'main' into kian/mask_literal_rewrite
knassre-bodo db3ba6f
Resolving conflicts
knassre-bodo 1b601ac
Revision
knassre-bodo
    @@ -0,0 +1,109 @@
    +"""
    +TODO
    +"""
    +
    +__all__ = ["MaskLiteralComparisonShuttle"]
    +
    +import pydough.pydough_operators as pydop
    +from pydough.relational import (
    +    CallExpression,
    +    LiteralExpression,
    +    RelationalExpression,
    +    RelationalExpressionShuttle,
    +)
    +
    +
    +class MaskLiteralComparisonShuttle(RelationalExpressionShuttle):
    +    """
    +    TODO
    +    """
    +
    +    def is_unprotect_call(self, expr: RelationalExpression) -> bool:
    +        """
    +        TODO
    +        """
    +        return (
    +            isinstance(expr, CallExpression)
    +            and isinstance(expr.op, pydop.MaskedExpressionFunctionOperator)
    +            and expr.op.is_unprotect
    +        )
    +
    +    def protect_literal_comparison(
    +        self,
    +        original_call: CallExpression,
    +        call_arg: CallExpression,
    +        literal_arg: LiteralExpression,
    +    ) -> CallExpression:
    +        """
    +        TODO
    +        """
    +        if (
    +            not isinstance(call_arg.op, pydop.MaskedExpressionFunctionOperator)
    +            or not call_arg.op.is_unprotect
    +        ):
    +            return original_call
    +
    +        masked_literal: RelationalExpression
    +
    +        if original_call.op in (pydop.EQU, pydop.NEQ):
    +            masked_literal = CallExpression(
    +                pydop.MaskedExpressionFunctionOperator(
    +                    call_arg.op.masking_metadata, False
    +                ),
    +                call_arg.data_type,
    +                [literal_arg],
    +            )
    +        elif original_call.op == pydop.ISIN and isinstance(
    +            literal_arg.value, (list, tuple)
    +        ):
    +            masked_literal = LiteralExpression(
    +                [
    +                    CallExpression(
    +                        pydop.MaskedExpressionFunctionOperator(
    +                            call_arg.op.masking_metadata, False
    +                        ),
    +                        call_arg.data_type,
    +                        [LiteralExpression(v, literal_arg.data_type)],
    +                    )
    +                    for v in literal_arg.value
    +                ],
    +                original_call.data_type,
    +            )
    +        else:
    +            return original_call
    +
    +        return CallExpression(
    +            original_call.op,
    +            original_call.data_type,
    +            [call_arg.inputs[0], masked_literal],
    +        )
    +
    +    def visit_call_expression(
    +        self, call_expression: CallExpression
    +    ) -> RelationalExpression:
    +        if call_expression.op in (pydop.EQU, pydop.NEQ):
    +            if isinstance(call_expression.inputs[0], CallExpression) and isinstance(
    +                call_expression.inputs[1], LiteralExpression
    +            ):
    +                call_expression = self.protect_literal_comparison(
    +                    call_expression,
    +                    call_expression.inputs[0],
    +                    call_expression.inputs[1],
    +                )
    +            if isinstance(call_expression.inputs[1], CallExpression) and isinstance(
    +                call_expression.inputs[0], LiteralExpression
    +            ):
    +                call_expression = self.protect_literal_comparison(
    +                    call_expression,
    +                    call_expression.inputs[1],
    +                    call_expression.inputs[0],
    +                )
    +        if (
    +            call_expression.op == pydop.ISIN
    +            and isinstance(call_expression.inputs[0], CallExpression)
    +            and isinstance(call_expression.inputs[1], LiteralExpression)
    +        ):
    +            call_expression = self.protect_literal_comparison(
    +                call_expression, call_expression.inputs[0], call_expression.inputs[1]
    +            )
    +        return super().visit_call_expression(call_expression)
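
The rewrite performed by the shuttle above can be illustrated with a small standalone sketch. The `Col`, `Lit`, and `Call` classes and the `rewrite` helper are hypothetical stand-ins, not PyDough's relational API. They only model the idea that `UNMASK(column) == literal` can become `column == MASK(literal)`, and that an `ISIN` list is masked element by element.

```python
from dataclasses import dataclass


# Hypothetical toy expression nodes; these are NOT PyDough classes.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Call:
    op: str
    args: tuple


def rewrite(expr):
    """Move masking from the column side onto the literal side."""
    if not isinstance(expr, Call) or len(expr.args) != 2:
        return expr
    left, right = expr.args
    # Equality is symmetric, so normalize `literal == UNMASK(col)`.
    if expr.op in ("EQU", "NEQ") and isinstance(left, Lit):
        left, right = right, left
    if not (isinstance(left, Call) and left.op == "UNMASK" and isinstance(right, Lit)):
        return expr
    if expr.op in ("EQU", "NEQ"):
        masked = Call("MASK", (right,))
    elif expr.op == "ISIN" and isinstance(right.value, (list, tuple)):
        # Mask every element of the list individually.
        masked = Lit(tuple(Call("MASK", (Lit(v),)) for v in right.value))
    else:
        return expr
    # Drop the UNMASK wrapper and compare the raw column directly.
    return Call(expr.op, (left.args[0], masked))


eq = rewrite(Call("EQU", (Call("UNMASK", (Col("c_fname"),)), Lit("alice"))))
isin = rewrite(Call("ISIN", (Call("UNMASK", (Col("c_key"),)), Lit((1, 2)))))
untouched = rewrite(Call("LESS_THAN", (Call("UNMASK", (Col("c_key"),)), Lit(3))))
```

Ordering comparisons such as the hypothetical `LESS_THAN` are left alone in this sketch, matching the diff, which only handles `EQU`, `NEQ`, and `ISIN`.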
78 changes: 78 additions & 0 deletions
pydough/pydough_operators/expression_operators/masked_expression_function_operator.py
    @@ -0,0 +1,78 @@
    +"""
    +TODO
    +"""
    +
    +__all__ = ["MaskedExpressionFunctionOperator"]
    +
    +
    +from pydough.metadata.properties import MaskedTableColumnMetadata
    +from pydough.pydough_operators.type_inference import (
    +    ConstantType,
    +    ExpressionTypeDeducer,
    +    RequireNumArgs,
    +    TypeVerifier,
    +)
    +
    +from .expression_function_operators import ExpressionFunctionOperator
    +
    +
    +class MaskedExpressionFunctionOperator(ExpressionFunctionOperator):
    +    """
    +    TODO
    +    """
    +
    +    def __init__(
    +        self,
    +        masking_metadata: MaskedTableColumnMetadata,
    +        is_unprotect: bool,
    +    ):
    +        verifier: TypeVerifier = RequireNumArgs(1)
    +        deducer: ExpressionTypeDeducer = ConstantType(
    +            masking_metadata.unprotected_data_type
    +            if is_unprotect
    +            else masking_metadata.data_type
    +        )
    +        super().__init__(
    +            "UNMASK" if is_unprotect else "MASK", False, verifier, deducer, False
    +        )
    +        self._masking_metadata: MaskedTableColumnMetadata = masking_metadata
    +        self._is_unprotect: bool = is_unprotect
    +
    +    @property
    +    def masking_metadata(self) -> MaskedTableColumnMetadata:
    +        """
    +        The metadata for the masked column.
    +        """
    +        return self._masking_metadata
    +
    +    @property
    +    def is_unprotect(self) -> bool:
    +        """
    +        Whether this operator is unprotecting (True) or protecting (False).
    +        """
    +        return self._is_unprotect
    +
    +    @property
    +    def format_string(self) -> str:
    +        """
    +        The format string to use for this operator to either mask or unmask the
    +        operand.
    +        """
    +        return (
    +            self.masking_metadata.unprotect_protocol
    +            if self.is_unprotect
    +            else self.masking_metadata.protect_protocol
    +        )
    +
    +    def to_string(self, arg_strings: list[str]) -> str:
    +        name: str = "UNMASK" if self.is_unprotect else "MASK"
    +        arg_strings = [f"[{s}]" for s in arg_strings]
    +        return f"{name}::({self.format_string.format(*arg_strings)})"
    +
    +    def equals(self, other: object) -> bool:
    +        return (
    +            isinstance(other, MaskedExpressionFunctionOperator)
    +            and self.masking_metadata == other.masking_metadata
    +            and self.is_unprotect == other.is_unprotect
    +            and super().equals(other)
    +        )
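
To make the `to_string` behavior in the diff above concrete, here is a minimal sketch of how a protocol format string could be applied. The `protect_protocol` and `unprotect_protocol` values are hypothetical examples (an uppercase/lowercase pair), not values taken from the PR's metadata, and `mask_to_string` is an illustrative free function rather than part of the operator class.

```python
# Hypothetical protocol strings; real values would come from the
# MaskedTableColumnMetadata of a column.
protect_protocol = "UPPER({})"
unprotect_protocol = "LOWER({})"


def mask_to_string(arg_strings: list[str], is_unprotect: bool) -> str:
    """Mirror the string-building steps of to_string in the diff above."""
    name = "UNMASK" if is_unprotect else "MASK"
    fmt = unprotect_protocol if is_unprotect else protect_protocol
    # Each argument is wrapped in brackets before being substituted.
    bracketed = [f"[{s}]" for s in arg_strings]
    return f"{name}::({fmt.format(*bracketed)})"


masked = mask_to_string(["'alice'"], is_unprotect=False)
unmasked = mask_to_string(["c_fname"], is_unprotect=True)
```

Under these assumptions, `masked` is `MASK::(UPPER(['alice']))` and `unmasked` is `UNMASK::(LOWER([c_fname]))`.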
    @@ -305,9 +305,22 @@ def visit_window_expression(self, window_expression: WindowCallExpression) -> No
         def visit_literal_expression(self, literal_expression: LiteralExpression) -> None:
             # Note: This assumes each literal has an associated type that can be parsed
             # and types do not represent implicit casts.
    -        literal: SQLGlotExpression = sqlglot_expressions.convert(
    -            literal_expression.value
    -        )
    +        literal: SQLGlotExpression
    +        if isinstance(literal_expression.value, (tuple, list)):
    +            # If the literal is a list or tuple, convert each element
    +            # individually and create an array literal.
    +            elements: list[SQLGlotExpression] = []
    +            for element in literal_expression.value:
    +                element_expr: SQLGlotExpression
    +                if isinstance(element, RelationalExpression):
    +                    element.accept(self)
    +                    element_expr = self._stack.pop()
    +                else:
    +                    element_expr = sqlglot_expressions.convert(element)
    +                elements.append(element_expr)
    +            literal = sqlglot_expressions.Array(expressions=elements)
    +        else:
    +            literal = sqlglot_expressions.convert(literal_expression.value)

             # Special handling: insert cast calls for ansi casting of date/time
             # instead of relying on SQLGlot conversion functions. This is because

Comment on lines +317 to +318:
This is needed because now, with the recent changes, a "list literal" can contain non-literal expressions, e.g. a list of function calls.
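
The element-wise conversion in this hunk can be sketched without SQLGlot. In the toy visitor below, `ToyCall`, `ToyLiteral`, and `ToySQLVisitor` are hypothetical names, and plain strings stand in for SQLGlot expression nodes. The point is only the control flow: expression elements are visited and their result popped off the stack, while plain Python values are converted directly.

```python
from dataclasses import dataclass


@dataclass
class ToyCall:
    """Hypothetical function-call expression, e.g. UPPER(<arg>)."""

    name: str
    arg: object

    def accept(self, visitor: "ToySQLVisitor") -> None:
        visitor.visit_call(self)


@dataclass
class ToyLiteral:
    """Hypothetical literal whose value may be a scalar or a list."""

    value: object

    def accept(self, visitor: "ToySQLVisitor") -> None:
        visitor.visit_literal(self)


class ToySQLVisitor:
    """Builds SQL text on a stack (strings stand in for SQLGlot nodes)."""

    def __init__(self) -> None:
        self._stack: list[str] = []

    @staticmethod
    def convert(value: object) -> str:
        # Stand-in for a plain-value-to-SQL conversion.
        return f"'{value}'" if isinstance(value, str) else str(value)

    def visit_call(self, call: ToyCall) -> None:
        call.arg.accept(self)
        self._stack.append(f"{call.name}({self._stack.pop()})")

    def visit_literal(self, literal: ToyLiteral) -> None:
        if isinstance(literal.value, (tuple, list)):
            parts: list[str] = []
            for element in literal.value:
                if isinstance(element, (ToyCall, ToyLiteral)):
                    # Expression element: visit it, then take its result.
                    element.accept(self)
                    parts.append(self._stack.pop())
                else:
                    parts.append(self.convert(element))
            self._stack.append("(" + ", ".join(parts) + ")")
        else:
            self._stack.append(self.convert(literal.value))


visitor = ToySQLVisitor()
ToyLiteral([ToyCall("UPPER", ToyLiteral("alice")), "BOB", 3]).accept(visitor)
sql = visitor._stack.pop()
```

Here the mixed list yields `(UPPER('alice'), 'BOB', 3)`, and the stack is empty again afterwards, which is the invariant the pop after `accept` relies on.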
    @@ -37,14 +37,14 @@ CREATE TABLE TRANSACTIONS (
     );

     INSERT INTO CUSTOMERS (c_key, c_fname, c_lname, c_phone, c_email, c_addr, c_birthday)
    -SELECT *
    -    -- 42 - column1, -- ARITHMETIC SHIFT: 42
    -    -- UPPER(column2), -- UPPERCASE
    -    -- UPPER(column3), -- UPPERCASE
    -    -- REPLACE(REPLACE(REPLACE(column4, '0', '*'), '9', '0'), '*', '9'), -- DIGIT SWITCH: 0 <-> 9
    -    -- SUBSTRING(column5, 2) || SUBSTRING(column5, 1, 1), -- FIRST CHAR TRANSPOSE
    -    -- SUBSTRING(column6, 2) || SUBSTRING(column6, 1, 1), -- FIRST CHAR TRANSPOSE
    -    -- DATE(column7, '-472 days') -- DAY SHIFT: 472
    +SELECT
    +    42 - column1, -- ARITHMETIC SHIFT: 42
    +    UPPER(column2), -- UPPERCASE
    +    UPPER(column3), -- UPPERCASE
    +    REPLACE(REPLACE(REPLACE(column4, '0', '*'), '9', '0'), '*', '9'), -- DIGIT SWITCH: 0 <-> 9
    +    SUBSTRING(column5, 2) || SUBSTRING(column5, 1, 1), -- FIRST CHAR TRANSPOSE
    +    SUBSTRING(column6, 2) || SUBSTRING(column6, 1, 1), -- FIRST CHAR TRANSPOSE
    +    DATE(column7, '-472 days') -- DAY SHIFT: 472
     FROM (
     VALUES
     (1, 'alice', 'johnson', '555-123-4567', '[email protected]', '123 Maple St;Portland;OR;97205', '1985-04-12'),

    @@ -81,13 +81,13 @@ INSERT INTO BRANCHES (b_key, b_name, b_addr) VALUES
     ;

     INSERT INTO ACCOUNTS (a_key, a_custkey, a_branchkey, a_balance, a_type, a_open_ts)
    -SELECT *
    -    -- CAST(CAST(column1 as TEXT) || CAST(column1 as TEXT) AS INTEGER),
    -    -- column2,
    -    -- column3,
    -    -- column4 * column4, -- GEOMETRIC SHIFT
    -    -- SUBSTRING(column5, 2) || SUBSTRING(column5, 1, 1), -- FIRST CHAR TRANSPOSE
    -    -- DATETIME(column6, '-123456789 seconds') -- SECOND SHIFT: 123456789
    +SELECT
    +    CAST(CAST(column1 as TEXT) || CAST(column1 as TEXT) AS INTEGER),
    +    column2,
    +    column3,
    +    column4 * column4, -- GEOMETRIC SHIFT
    +    SUBSTRING(column5, 2) || SUBSTRING(column5, 1, 1), -- FIRST CHAR TRANSPOSE
    +    DATETIME(column6, '-123456789 seconds') -- SECOND SHIFT: 123456789
     FROM (
     VALUES
     -- Customer 1 (alice johnson, OR) - 3 accounts

    @@ -189,12 +189,11 @@ VALUES

     INSERT INTO TRANSACTIONS (t_key, t_sourceaccount, t_destaccount, t_amount, t_ts)
     SELECT
    -    *
    -    -- column1,
    -    -- column2,
    -    -- column3,
    -    -- 1025.67 - column4, -- ARITHMETIC SHIFT: 1025.67
    -    -- DATETIME(column5, '-54321 seconds') -- SECOND SHIFT: 54321
    +    column1,
    +    column2,
    +    column3,
    +    1025.67 - column4, -- ARITHMETIC SHIFT: 1025.67
    +    DATETIME(column5, '-54321 seconds') -- SECOND SHIFT: 54321
     FROM (
     VALUES
     (1, 41, 8, 2753.92, '2019-11-11 18:00:52'),
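
As a rough illustration of how the literal rewrite interacts with this masked data, the sketch below uses Python's built-in `sqlite3` module with a cut-down, hypothetical `CUSTOMERS` table (two columns only). It stores `c_key` as `42 - key` and `c_fname` in uppercase, following the masking comments in the script, then filters once by unmasking the column and once by masking the literal. `LOWER` is assumed to be the inverse of `UPPER` here, which only holds because the sample names are all lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of CUSTOMERS with two masked columns.
conn.execute("CREATE TABLE CUSTOMERS (c_key INTEGER, c_fname TEXT)")
conn.execute(
    """
    INSERT INTO CUSTOMERS (c_key, c_fname)
    SELECT
        42 - column1,   -- ARITHMETIC SHIFT: 42
        UPPER(column2)  -- UPPERCASE
    FROM (VALUES (1, 'alice'), (2, 'bob'))
    """
)

# What is physically stored is the masked data.
stored = conn.execute(
    "SELECT c_key, c_fname FROM CUSTOMERS ORDER BY c_key"
).fetchall()

# Unmask the column, then compare: UNMASK(c_fname) = 'alice'.
unmask_column = conn.execute(
    "SELECT 42 - c_key FROM CUSTOMERS WHERE LOWER(c_fname) = ?", ("alice",)
).fetchall()

# Mask the literal instead: c_fname = MASK('alice').
mask_literal = conn.execute(
    "SELECT 42 - c_key FROM CUSTOMERS WHERE c_fname = UPPER(?)", ("alice",)
).fetchall()
conn.close()
```

Both queries return the same row, but the second compares the stored column directly against a masked constant, so the column itself never has to be unmasked in the filter.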