Commit 4a28092

Add RFC for access control row filtering and column masking (#34)
1 parent e041458 commit 4a28092

1 file changed

+239
-0
lines changed

@@ -0,0 +1,239 @@
# **RFC-0010 for Presto**

See [CONTRIBUTING.md](CONTRIBUTING.md) for instructions on creating your RFC and the process surrounding it.

## [Access Control Row Filters and Column Masks]

Proposers

* Tim Meehan
* Bryan Cutler

## [Related Issues]

Current Proposal <br>
https://github.com/prestodb/presto/issues/24278

Past Discussions and PRs
<br>
https://github.com/prestodb/presto/issues/20572
<br>
https://github.com/prestodb/presto/issues/19041
<br>
https://github.com/prestodb/presto/pull/21913
<br>
https://github.com/prestodb/presto/pull/18119

## Summary

Add access control support for row filtering and column masking, and apply the resulting filters and masks to queries by rewriting the plan.

## Background

To meet governance requirements, Presto needs to support data compliance driven by a set of defined rules. For a
given query, these rules are expressed as row filters and column masks. A row filter hides rows that contain data
the user is not allowed to access, while the remaining rows are returned as usual. A column mask masks or obfuscates
sensitive data that the user is forbidden to view, such as a credit card number. The expressions are applied to the
query in a rewrite step before it runs.

### [Optional] Goals

* Add SPIs to the AccessControl classes to allow retrieval of row filters and column masks
* Add functionality for Presto to apply filters and masks to a query

## Proposed Implementation

The proposed implementation follows the design from TrinoDB. The filters and masks are retrieved in the `StatementAnalyzer`,
and the query is rewritten by the `RelationPlanner`. For reference, the original TrinoDB commits are:

* Adding support for row filters trinodb/trino@fae3147
* Adding support for column masking trinodb/trino@7e0d88e

### Core SPI

Add methods to the access control interfaces to retrieve the list of row filters for a table and the column masks for the
relevant columns in that table.

AccessControl.java

```java
default List<ViewExpression> getRowFilters(TransactionId transactionId, Identity identity, AccessControlContext context, QualifiedObjectName tableName)
{
    return Collections.emptyList();
}

default Map<ColumnMetadata, ViewExpression> getColumnMasks(TransactionId transactionId, Identity identity, AccessControlContext context, QualifiedObjectName tableName, List<ColumnMetadata> columns)
{
    return Collections.emptyMap();
}
```

ConnectorAccessControl.java

```java
/**
 * Get row filters associated with the given table and identity.
 * <p>
 * Each filter must be a scalar SQL expression of boolean type over the columns in the table.
 *
 * @return the list of filters, or empty list if not applicable
 */
default List<ViewExpression> getRowFilters(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName)
{
    return Collections.emptyList();
}

/**
 * Bulk method for getting column masks for a subset of columns in a table.
 * <p>
 * Each mask must be a scalar SQL expression of a type coercible to the type of the column being masked. The expression
 * must be written in terms of columns in the table.
 *
 * @return a mapping from columns to masks, or an empty map if not applicable. The keys of the return Map are a subset of {@code columns}.
 */
default Map<ColumnMetadata, ViewExpression> getColumnMasks(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName, List<ColumnMetadata> columns)
{
    return Collections.emptyMap();
}
```

SystemAccessControl.java

```java
/**
 * Get row filters associated with the given table and identity.
 * <p>
 * Each filter must be a scalar SQL expression of boolean type over the columns in the table.
 *
 * @return a list of filters, or empty list if not applicable
 */
default List<ViewExpression> getRowFilters(Identity identity, AccessControlContext context, CatalogSchemaTableName tableName)
{
    return Collections.emptyList();
}

/**
 * Bulk method for getting column masks for a subset of columns in a table.
 * <p>
 * Each mask must be a scalar SQL expression of a type coercible to the type of the column being masked. The expression
 * must be written in terms of columns in the table.
 *
 * @return a mapping from columns to masks, or an empty map if not applicable. The keys of the return Map are a subset of {@code columns}.
 */
default Map<ColumnMetadata, ViewExpression> getColumnMasks(Identity identity, AccessControlContext context, CatalogSchemaTableName tableName, List<ColumnMetadata> columns)
{
    return Collections.emptyMap();
}
```

The `ViewExpression` class holds a filter or mask expression:

```java
public ViewExpression(String identity, Optional<String> catalog, Optional<String> schema, String expression)
```
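
As a rough illustration of how an access control plugin might use these hooks, the sketch below returns one row filter and one column mask for a single user. It is self-contained rather than written against the real SPI: the `ViewExpression` class here is a simplified stand-in that mirrors only the constructor arguments shown above, tables and columns are plain strings instead of `QualifiedObjectName` and `ColumnMetadata`, and the `ExamplePolicy` class, the user `analyst`, and the table `tpch.tiny.orders` are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified stand-in for the RFC's ViewExpression; not the real Presto class.
final class ViewExpression
{
    final String identity;
    final Optional<String> catalog;
    final Optional<String> schema;
    final String expression;

    ViewExpression(String identity, Optional<String> catalog, Optional<String> schema, String expression)
    {
        this.identity = identity;
        this.catalog = catalog;
        this.schema = schema;
        this.expression = expression;
    }
}

// Hypothetical policy: the user "analyst" sees only small order keys and a negated custkey.
final class ExamplePolicy
{
    private static final String TABLE = "tpch.tiny.orders";

    // Mirrors the shape of getRowFilters: an empty list means no filtering.
    static List<ViewExpression> getRowFilters(String user, String tableName)
    {
        if (user.equals("analyst") && tableName.equals(TABLE)) {
            return List.of(new ViewExpression(user, Optional.of("tpch"), Optional.of("tiny"), "orderkey < 10"));
        }
        return List.of();
    }

    // Mirrors the shape of getColumnMasks: the keys are a subset of the requested columns.
    static Map<String, ViewExpression> getColumnMasks(String user, String tableName, List<String> columns)
    {
        if (user.equals("analyst") && tableName.equals(TABLE) && columns.contains("custkey")) {
            return Map.of("custkey", new ViewExpression(user, Optional.of("tpch"), Optional.of("tiny"), "-custkey"));
        }
        return Map.of();
    }
}
```

Any other user or table falls through to the empty results, which matches the SPI defaults of no filters and no masks.
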

`Analysis.java` will hold the filters and masks for each table, using these additional methods:

```java
void registerTableForRowFiltering(QualifiedObjectName table, String identity)

boolean hasRowFilter(QualifiedObjectName table, String identity)

void addRowFilter(Table table, Expression filter)

List<Expression> getRowFilters(Table node)

void registerTableForColumnMasking(QualifiedObjectName table, String column, String identity)

boolean hasColumnMask(QualifiedObjectName table, String column, String identity)

void addColumnMask(Table table, String column, Expression mask)

Map<String, Expression> getColumnMasks(Table table)
```
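
One way to read the `register...`/`has...` pairs is as bookkeeping that lets the analyzer notice when a filter's own subquery refers back to the table currently being filtered. That reading is an assumption borrowed from the Trino design this RFC follows, not something the RFC states. The sketch below shows the idea with a hypothetical `FilterBookkeeping` class that uses string table names in place of `QualifiedObjectName`; the `analyzeRowFilter` method and its error message are invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the row-filter bookkeeping in Analysis.
final class FilterBookkeeping
{
    // Tables whose row filters are currently being analyzed, keyed by table and identity.
    private final Set<String> tablesBeingFiltered = new HashSet<>();

    void registerTableForRowFiltering(String table, String identity)
    {
        tablesBeingFiltered.add(key(table, identity));
    }

    boolean hasRowFilter(String table, String identity)
    {
        return tablesBeingFiltered.contains(key(table, identity));
    }

    // Assumed analyzer step: refuse to analyze a filter for a table that is already in progress.
    void analyzeRowFilter(String table, String identity, Runnable analyzeExpression)
    {
        if (hasRowFilter(table, identity)) {
            throw new IllegalStateException("Row filter for " + table + " is recursive");
        }
        registerTableForRowFiltering(table, identity);
        try {
            analyzeExpression.run();
        }
        finally {
            // Unregister once the filter expression has been analyzed.
            tablesBeingFiltered.remove(key(table, identity));
        }
    }

    private static String key(String table, String identity)
    {
        return table + "|" + identity;
    }
}
```
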

#### Example expressions

Examples of row filter expressions, given a table `orders` with columns `orderkey`, `nationkey`:

- a simple predicate:
  ```
  expression := "orderkey < 10"
  ```

- a subquery:
  ```
  expression := "EXISTS (SELECT 1 FROM nation WHERE nationkey = orderkey)"
  ```

A column mask applies an operation to a specific column: given the column's values as input, it produces the masked output.

- an example that nullifies values, so whatever column it is applied to produces NULL:
  ```
  expression := "NULL"
  ```

- an example that negates integer values, when applied to the column `custkey`:
  ```
  expression := "-custkey"
  ```
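
To make the intended effect of these expressions concrete, the sketch below applies the `orderkey < 10` filter and the `-custkey` mask to a few in-memory rows. It only illustrates the semantics: the `Order` class and the sample values are made up, and Presto would evaluate the SQL expressions in the engine rather than as Java lambdas.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Made-up row type for illustration only.
final class Order
{
    final long orderkey;
    final long custkey;

    Order(long orderkey, long custkey)
    {
        this.orderkey = orderkey;
        this.custkey = custkey;
    }
}

final class FilterAndMaskDemo
{
    // Java equivalent of the row filter "orderkey < 10".
    static final Predicate<Order> ROW_FILTER = order -> order.orderkey < 10;

    // Java equivalent of the column mask "-custkey"; other columns pass through unchanged.
    static final UnaryOperator<Order> COLUMN_MASK = order -> new Order(order.orderkey, -order.custkey);

    // This sketch filters first and then masks. The filter and mask touch
    // different columns here, so the order does not change the result.
    static List<Order> apply(List<Order> rows)
    {
        return rows.stream().filter(ROW_FILTER).map(COLUMN_MASK).collect(Collectors.toList());
    }
}
```

For example, `FilterAndMaskDemo.apply(List.of(new Order(1, 100), new Order(12, 200)))` keeps only the first row and reports its `custkey` as `-100`.
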

#### Additional information

1. What modules are involved
   - `presto-main`
   - `presto-spi`
   - `presto-analyzer` to hold masks and filters
   - `presto-hive` for legacy and SQL access control
2. Any new terminologies/concepts/SQL language additions
   - NA
3. Method/class/interface contracts which you deem fit for implementation.
   - NA
4. Code flow using bullet points or pseudo code as applicable
   - During the analysis phase, the access control APIs are used to retrieve and analyze row filters and column masks.
   - The analyzed filters and masks are stored in `Analysis`.
   - `RelationPlanner` then gets the filters and masks from `Analysis` and rewrites the plan with them applied.
   - By default, no filters or masks are added and the plan is not rewritten.
5. Any new user facing metrics that can be shown on CLI or UI.
   - NA
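
The code flow in item 4 can be pictured as a query-level rewrite, even though the proposal rewrites the plan rather than SQL text. The sketch below builds the SQL that a filtered and masked scan of a table is conceptually equivalent to. `ConceptualRewrite` and its `rewrite` method are hypothetical helpers written for this illustration and are not part of Presto.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ConceptualRewrite
{
    // Returns a query equivalent to scanning `table` with the masks and filters applied.
    static String rewrite(String table, List<String> columns, List<String> rowFilters, Map<String, String> columnMasks)
    {
        List<String> projections = new ArrayList<>();
        for (String column : columns) {
            String mask = columnMasks.get(column);
            // A masked column keeps its name but takes its value from the mask expression.
            projections.add(mask == null ? column : "(" + mask + ") AS " + column);
        }
        StringBuilder sql = new StringBuilder("SELECT " + String.join(", ", projections) + " FROM " + table);
        if (!rowFilters.isEmpty()) {
            // Multiple filters are combined with AND in this sketch.
            List<String> wrapped = new ArrayList<>();
            for (String filter : rowFilters) {
                wrapped.add("(" + filter + ")");
            }
            sql.append(" WHERE ").append(String.join(" AND ", wrapped));
        }
        return sql.toString();
    }
}
```

With no filters or masks the method returns a plain scan of the listed columns, which corresponds to the default case where the plan is not rewritten.
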

### Notes on Table Names with Versioning

The proposed SPI identifies a table resource by a `QualifiedObjectName` that includes:
* Catalog name
* Schema name
* Table name

This does not explicitly provide table version information when the connector in use supports versioning.
For now, it is left to the plugin implementation to handle any versioning information added to the table
name. Further discussion is recommended on adding such information to `QualifiedObjectName`
so that the plugin can easily be aware of any table versioning, or schema evolution, when providing
row filters or column masks.

## [Optional] Metrics

This is a 0 to 1 feature and will not have any metrics.

## [Optional] Other Approaches Considered

An alternative, discussed at https://github.com/prestodb/presto/pull/21913#issuecomment-2050279419, is to use the existing connector
optimization SPI to rewrite the plan during the optimization phase. The benefit of the proposed design over that approach is that it applies
globally to all connectors. Because it is an existing design that is already in use, it is known to work and to be stable, and it
conforms with the Trino SPI, which will help ease migration.

## Adoption Plan

- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar?
  - No impact to users. The SPI additions include default implementations that keep the existing behaviour. An AccessControl plugin can be used to enable the functionality.
- If we are changing behaviour how will we phase out the older behaviour?
  - NA
- If we need special migration tools, describe them here.
  - NA
- When will we remove the existing behaviour, if applicable.
  - NA
- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed?
  - This feature will be documented in the Presto documentation.
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
  - NA

## Test Plan

Unit tests will be added to ensure that row filter and column mask expressions can be added to a query and give the expected result. The
`TestingAccessControlManager` will be modified to allow row filters and column masks to be added for use in testing.
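
A rough picture of what such a test hook might look like is sketched below. It uses a hypothetical `TestingPolicies` class rather than the real `TestingAccessControlManager`, whose eventual API this RFC does not specify; the method names and the string-keyed storage are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory registry of filters and masks for use in tests.
final class TestingPolicies
{
    private final Map<String, List<String>> rowFilters = new HashMap<>();
    private final Map<String, String> columnMasks = new HashMap<>();

    // Registers a row filter expression for a table and user.
    void rowFilter(String table, String user, String expression)
    {
        rowFilters.computeIfAbsent(table + "|" + user, key -> new ArrayList<>()).add(expression);
    }

    // Registers a mask expression for one column of a table and user.
    void columnMask(String table, String column, String user, String expression)
    {
        columnMasks.put(table + "." + column + "|" + user, expression);
    }

    // Returns an empty list when nothing is registered, like the SPI default.
    List<String> getRowFilters(String table, String user)
    {
        return rowFilters.getOrDefault(table + "|" + user, List.of());
    }

    // Returns null when no mask is registered for the column and user.
    String getColumnMask(String table, String column, String user)
    {
        return columnMasks.get(table + "." + column + "|" + user);
    }

    // Clears all registered policies so that tests do not affect each other.
    void reset()
    {
        rowFilters.clear();
        columnMasks.clear();
    }
}
```

A test could register a filter such as `orderkey < 10` for one user, run a query as that user and as another, and compare the row counts.
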
