|
| 1 | +# **RFC-0010 for Presto** |
| 2 | + |
| 3 | +See [CONTRIBUTING.md](CONTRIBUTING.md) for instructions on creating your RFC and the process surrounding it. |
| 4 | + |
| 5 | +## [Access Control Row Filters and Column Masks] |
| 6 | + |
| 7 | +Proposers |
| 8 | + |
| 9 | +* Tim Meehan |
| 10 | +* Bryan Cutler |
| 11 | + |
| 12 | +## [Related Issues] |
| 13 | + |
| 14 | +Current Proposal <br> |
| 15 | +https://github.com/prestodb/presto/issues/24278 |
| 16 | + |
| 17 | +Past Discussions and PRs |
| 18 | +<br> |
| 19 | +https://github.com/prestodb/presto/issues/20572 |
| 20 | +<br> |
| 21 | +https://github.com/prestodb/presto/issues/19041 |
| 22 | +<br> |
| 23 | +https://github.com/prestodb/presto/pull/21913 |
| 24 | +<br> |
| 25 | +https://github.com/prestodb/presto/pull/18119 |
| 26 | + |
| 27 | +## Summary |
| 28 | + |
| 29 | +Add access control support for row filtering and column masking, and apply them to queries by rewriting the plan. |
| 30 | + |
| 31 | +## Background |
| 32 | + |
| 33 | +As part of governance requirements, Presto needs to support data compliance based on a set of defined rules. For a |
| 34 | +given query, these rules are translated into row filter and column mask expressions. A row filter hides |
| 35 | +rows containing data the user is not allowed to access, while still returning the remaining rows. A column mask |
| 36 | +masks or obfuscates sensitive data that a user is forbidden to view, such as a credit card number. The expressions |
| 37 | +are then applied to the query by rewriting it before it runs. |
| 38 | + |
| 39 | +### [Optional] Goals |
| 40 | + |
| 41 | +* Add SPIs to AccessControl classes to allow retrieval of row filters and column masks |
| 42 | +* Add functionality for Presto to apply filters and masks to a query |
| 43 | + |
| 44 | +## Proposed Implementation |
| 45 | + |
| 46 | +The proposed implementation follows the design from TrinoDB. The filters and masks are retrieved in the `StatementAnalyzer` |
| 47 | +and the query is rewritten with the `RelationPlanner`. For reference, the original TrinoDB commits are: |
| 48 | + |
| 49 | +* Adding support for row filters trinodb/trino@fae3147 |
| 50 | +* Adding support for column masking trinodb/trino@7e0d88e |
| 51 | + |
| 52 | +### Core SPI |
| 53 | + |
| 54 | +Add methods to the access control interfaces to retrieve a list of row filters and column masks for all relevant columns in a |
| 55 | +table. |
| 56 | + |
| 57 | +AccessControl.java |
| 58 | + |
| 59 | +```java |
| 60 | + default List<ViewExpression> getRowFilters(TransactionId transactionId, Identity identity, AccessControlContext context, QualifiedObjectName tableName) |
| 61 | + { |
| 62 | + return Collections.emptyList(); |
| 63 | + } |
| 64 | + |
| 65 | + default Map<ColumnMetadata, ViewExpression> getColumnMasks(TransactionId transactionId, Identity identity, AccessControlContext context, QualifiedObjectName tableName, List<ColumnMetadata> columns) |
| 66 | + { |
| 67 | + return Collections.emptyMap(); |
| 68 | + } |
| 69 | +``` |
| 70 | + |
| 71 | +ConnectorAccessControl.java |
| 72 | + |
| 73 | +```java |
| 74 | + /** |
| 75 | + * Get row filters associated with the given table and identity. |
| 76 | + * <p> |
| 77 | + * Each filter must be a scalar SQL expression of boolean type over the columns in the table. |
| 78 | + * |
| 79 | + * @return the list of filters, or empty list if not applicable |
| 80 | + */ |
| 81 | + default List<ViewExpression> getRowFilters(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName) |
| 82 | + { |
| 83 | + return Collections.emptyList(); |
| 84 | + } |
| 85 | + |
| 86 | + /** |
| 87 | + * Bulk method for getting column masks for a subset of columns in a table. |
| 88 | + * <p> |
| 89 | + * Each mask must be a scalar SQL expression of a type coercible to the type of the column being masked. The expression |
| 90 | + * must be written in terms of columns in the table. |
| 91 | + * |
| 92 | + * @return a mapping from columns to masks, or an empty map if not applicable. The keys of the return Map are a subset of {@code columns}. |
| 93 | + */ |
| 94 | + default Map<ColumnMetadata, ViewExpression> getColumnMasks(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName, List<ColumnMetadata> columns) |
| 95 | + { |
| 96 | + return Collections.emptyMap(); |
| 97 | + } |
| 98 | +``` |
| 99 | + |
| 100 | +SystemAccessControl.java |
| 101 | +```java |
| 102 | + |
| 103 | + /** |
| 104 | + * Get row filters associated with the given table and identity. |
| 105 | + * <p> |
| 106 | + * Each filter must be a scalar SQL expression of boolean type over the columns in the table. |
| 107 | + * |
| 108 | + * @return a list of filters, or empty list if not applicable |
| 109 | + */ |
| 110 | + default List<ViewExpression> getRowFilters(Identity identity, AccessControlContext context, CatalogSchemaTableName tableName) |
| 111 | + { |
| 112 | + return Collections.emptyList(); |
| 113 | + } |
| 114 | + |
| 115 | + /** |
| 116 | + * Bulk method for getting column masks for a subset of columns in a table. |
| 117 | + * <p> |
| 118 | + * Each mask must be a scalar SQL expression of a type coercible to the type of the column being masked. The expression |
| 119 | + * must be written in terms of columns in the table. |
| 120 | + * |
| 121 | + * @return a mapping from columns to masks, or an empty map if not applicable. The keys of the return Map are a subset of {@code columns}. |
| 122 | + */ |
| 123 | + default Map<ColumnMetadata, ViewExpression> getColumnMasks(Identity identity, AccessControlContext context, CatalogSchemaTableName tableName, List<ColumnMetadata> columns) |
| 124 | + { |
| 125 | + return Collections.emptyMap(); |
| 126 | + } |
| 127 | +``` |
| 128 | + |
| 129 | +ViewExpression class to hold a filter/mask expression |
| 130 | + |
| 131 | +```java |
| 132 | +public ViewExpression(String identity, Optional<String> catalog, Optional<String> schema, String expression) |
| 133 | +``` |
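The constructor above only fixes the shape of the class. What follows is a minimal sketch of how such a holder might look, not the actual implementation. It assumes the fields are immutable, that `identity` names the user the expression is evaluated as, and that `catalog` and `schema` give the context for resolving unqualified names. The accessor names and the catalog/schema consistency check are assumptions modelled on the Trino design referenced above.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of a holder for one row filter or column mask expression.
final class ViewExpression
{
    private final String identity;
    private final Optional<String> catalog;
    private final Optional<String> schema;
    private final String expression;

    public ViewExpression(String identity, Optional<String> catalog, Optional<String> schema, String expression)
    {
        this.identity = Objects.requireNonNull(identity, "identity is null");
        this.catalog = Objects.requireNonNull(catalog, "catalog is null");
        this.schema = Objects.requireNonNull(schema, "schema is null");
        this.expression = Objects.requireNonNull(expression, "expression is null");

        // Assumed rule: a schema without a catalog cannot be resolved.
        if (!catalog.isPresent() && schema.isPresent()) {
            throw new IllegalArgumentException("catalog must be present if schema is present");
        }
    }

    public String getIdentity()
    {
        return identity;
    }

    public Optional<String> getCatalog()
    {
        return catalog;
    }

    public Optional<String> getSchema()
    {
        return schema;
    }

    public String getExpression()
    {
        return expression;
    }
}
```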
| 134 | + |
| 135 | +Analysis.java will hold filters and masks for the table with additional methods |
| 136 | + |
| 137 | +```java |
| 138 | +void registerTableForRowFiltering(QualifiedObjectName table, String identity) |
| 139 | + |
| 140 | +boolean hasRowFilter(QualifiedObjectName table, String identity) |
| 141 | + |
| 142 | +void addRowFilter(Table table, Expression filter) |
| 143 | + |
| 144 | +List<Expression> getRowFilters(Table node) |
| 145 | + |
| 146 | +void registerTableForColumnMasking(QualifiedObjectName table, String column, String identity) |
| 147 | + |
| 148 | +boolean hasColumnMask(QualifiedObjectName table, String column, String identity) |
| 149 | + |
| 150 | +void addColumnMask(Table table, String column, Expression mask) |
| 151 | + |
| 152 | +Map<String, Expression> getColumnMasks(Table table) |
| 153 | +``` |
| 154 | +#### Example expressions |
| 155 | + |
| 156 | +Examples of row filter expressions, given a table `orders` with columns `orderkey`, `nationkey`: |
| 157 | + |
| 158 | +- a simple predicate: |
| 159 | +``` |
| 160 | +expression := "orderkey < 10" |
| 161 | +``` |
| 162 | + |
| 163 | +- a subquery: |
| 164 | +``` |
| 165 | +expression := "EXISTS (SELECT 1 FROM nation WHERE nationkey = orderkey)" |
| 166 | +``` |
| 167 | + |
| 168 | +A column mask applies an operation to a specific column: given the column values as input, it produces the masked output. |
| 169 | +- example to nullify values; whatever column this is applied to will produce a NULL value: |
| 170 | +``` |
| 171 | +expression := "NULL" |
| 172 | +``` |
| 173 | + |
| 174 | +- example to negate a column's integer values, when applied to column "custkey": |
| 175 | +``` |
| 176 | +expression := "-custkey" |
| 177 | +``` |
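To show how expressions like these could be served by an access control plugin, here is a minimal sketch of an in-memory policy store. It is not part of the proposed SPI: the class `InMemoryPolicySketch`, its methods, and the use of plain strings for users, columns, and expressions are all hypothetical simplifications, and it covers a single table. The one contract it does mirror from the Javadoc above is that the keys of the returned mask map are a subset of the requested columns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical policy store for one table; a real plugin would implement the SPI methods instead.
final class InMemoryPolicySketch
{
    private final Map<String, List<String>> rowFiltersByUser = new HashMap<>();
    private final Map<String, Map<String, String>> columnMasksByUser = new HashMap<>();

    void addRowFilter(String user, String expression)
    {
        rowFiltersByUser.computeIfAbsent(user, key -> new ArrayList<>()).add(expression);
    }

    void addColumnMask(String user, String column, String expression)
    {
        columnMasksByUser.computeIfAbsent(user, key -> new HashMap<>()).put(column, expression);
    }

    // Returns an empty list when no filter applies, matching the SPI default.
    List<String> getRowFilters(String user)
    {
        return Collections.unmodifiableList(rowFiltersByUser.getOrDefault(user, Collections.emptyList()));
    }

    // Returns masks only for the requested columns, so the keys are a subset of `columns`.
    Map<String, String> getColumnMasks(String user, List<String> columns)
    {
        Map<String, String> masks = columnMasksByUser.getOrDefault(user, Collections.emptyMap());
        Map<String, String> result = new LinkedHashMap<>();
        for (String column : columns) {
            String mask = masks.get(column);
            if (mask != null) {
                result.put(column, mask);
            }
        }
        return result;
    }
}
```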
| 178 | + |
| 179 | +#### Additional information |
| 180 | +1. What modules are involved |
| 181 | + - `presto-main` |
| 182 | + - `presto-spi` |
| 183 | + - `presto-analyzer` to hold masks and filters |
| 184 | + - `presto-hive` for legacy and sql access control |
| 185 | +2. Any new terminologies/concepts/SQL language additions |
| 186 | + - NA |
| 187 | +3. Method/class/interface contracts which you deem fit for implementation. |
| 188 | + - NA |
| 189 | +4. Code flow using bullet points or pseudo code as applicable |
| 190 | +   - During the analysis phase, access control APIs are used to retrieve and analyze row filters and column masks. |
| 191 | +   - Analyzed filters and masks are stored in `Analysis`. |
| 192 | +   - `RelationPlanner` will then get the filters and masks from `Analysis` and rewrite the plan with them applied. |
| 193 | + - By default no filters or masks are added and the plan will not be rewritten. |
| 194 | +5. Any new user facing metrics that can be shown on CLI or UI. |
| 195 | + - NA |
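The effect of the code flow in item 4 can be illustrated at the SQL-text level. The sketch below is illustrative only: the proposed implementation rewrites plan nodes in `RelationPlanner`, not SQL strings, and the class `FilterMaskRewriteSketch` and its `rewrite` method are hypothetical. It assumes that row filters are evaluated against the unmasked column values and that multiple filters are combined with `AND`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: the real rewrite operates on plan nodes, not SQL text.
final class FilterMaskRewriteSketch
{
    private FilterMaskRewriteSketch() {}

    static String rewrite(String table, List<String> columns, List<String> rowFilters, Map<String, String> columnMasks)
    {
        // By default no filters or masks exist and the relation is left untouched.
        if (rowFilters.isEmpty() && columnMasks.isEmpty()) {
            return table;
        }

        // Replace each masked column with its mask expression, keeping the column name.
        List<String> projections = new ArrayList<>();
        for (String column : columns) {
            String mask = columnMasks.get(column);
            projections.add(mask == null ? column : "(" + mask + ") AS " + column);
        }

        StringBuilder sql = new StringBuilder("(SELECT ")
                .append(String.join(", ", projections))
                .append(" FROM ")
                .append(table);

        // All row filters must hold for a row to be visible.
        if (!rowFilters.isEmpty()) {
            sql.append(" WHERE (").append(String.join(") AND (", rowFilters)).append(")");
        }
        return sql.append(")").toString();
    }
}
```

With the `orders` table, the filter `orderkey < 10`, and the mask `-custkey` on `custkey`, this produces `(SELECT orderkey, (-custkey) AS custkey FROM orders WHERE (orderkey < 10))`. With no filters and no masks it returns `orders` unchanged.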
| 196 | + |
| 197 | +### Notes on Table Names with Versioning |
| 198 | + |
| 199 | +The proposed SPI will identify a table resource as a `QualifiedObjectName` that includes |
| 200 | +* Catalog name |
| 201 | +* Schema name |
| 202 | +* Table name |
| 203 | + |
| 204 | +This does not explicitly provide table version information when a connector in use supports versioning. |
| 205 | +For now, it is left to the plugin implementation to handle any additional versioning added to the table |
| 206 | +name. It is recommended to further discuss the possibility of adding such information to `QualifiedObjectName` |
| 207 | +so that the plugin can easily be aware of any table versioning, or schema evolution, when providing |
| 208 | +row filters or column masks. |
| 209 | + |
| 210 | +## [Optional] Metrics |
| 211 | + |
| 212 | +This is a 0 to 1 feature and will not have any metrics. |
| 213 | + |
| 214 | +## [Optional] Other Approaches Considered |
| 215 | + |
| 216 | +Discussed at https://github.com/prestodb/presto/pull/21913#issuecomment-2050279419 is an approach that uses the existing SPI for connector |
| 217 | +optimization to rewrite the plan during the optimization phase. The benefit of the proposed design over this approach is that it applies |
| 218 | +globally to all connectors. Since it is an existing design that is already in use, it is known to be working and stable, and it |
| 219 | +conforms with the Trino SPI, which will help to ease migration. |
| 220 | + |
| 221 | +## Adoption Plan |
| 222 | + |
| 223 | +- What impact (if any) will there be on existing users? Are there any new session parameters, configurations, SPI updates, client API updates, or SQL grammar? |
| 224 | +  - No impact to users. SPI additions will include a default to keep existing behaviour. An AccessControl plugin can be used to enable the functionality. |
| 225 | +- If we are changing behaviour how will we phase out the older behaviour? |
| 226 | + - NA |
| 227 | +- If we need special migration tools, describe them here. |
| 228 | + - NA |
| 229 | +- When will we remove the existing behaviour, if applicable. |
| 230 | + - NA |
| 231 | +- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed? |
| 232 | + - This feature will be documented in the Presto documentation. |
| 233 | +- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? |
| 234 | + - NA |
| 235 | + |
| 236 | +## Test Plan |
| 237 | + |
| 238 | +Unit tests will be added to ensure that row filter and column mask expressions can be added to a query and give the expected result. The |
| 239 | +`TestingAccessControlManager` will be modified to allow for addition of row filters and column masks to be used in testing. |