Description
TLDR: Greatly improve performance and/or fix query failures when OPA is involved.
Starting with Trino 453 one can batch the column mask requests to OPA: trinodb/trino#21997
We should use that instead of the "old" non-batched API to significantly improve performance.
A very quick analysis on a customer cluster gave me these numbers:
cat opa.log | grep '"req_path":' | jq -r '.req_path' | sort | uniq -c
120 /
94 /v1/data/trino/allow
300 /v1/data/trino/batch
6562 /v1/data/trino/columnMask
40 /v1/data/trino/rowFilters
Because of hammering /v1/data/trino/columnMask, queries failed with Max requests queued per destination 1024 exceeded for HttpDestination[Origin@326cb9e1[http://opa.edw.svc.cluster.local:8081.
These were all queries on a table with ~3k columns.
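To make the numbers concrete, here is a back-of-the-envelope sketch in Python. It assumes the non-batched API sends one columnMask request per column and that the batched API sends a single request per table reference. Both are assumptions drawn from the observations above, not verified against the plugin source. The function name `column_mask_requests` is hypothetical.

```python
# Rough comparison of OPA column-mask request counts for one table reference.
# Assumptions (not verified against the Trino OPA plugin source):
#   - non-batched: one request per column
#   - batched: one request for all columns of the table

# Taken from the error message: "Max requests queued per destination 1024 exceeded"
QUEUE_LIMIT = 1024


def column_mask_requests(num_columns: int, batched: bool) -> int:
    """Return how many column-mask requests one table reference triggers."""
    if batched:
        return 1 if num_columns > 0 else 0
    return num_columns


if __name__ == "__main__":
    columns = 3000  # "~3k columns" from the report
    for batched in (False, True):
        n = column_mask_requests(columns, batched)
        status = "exceeds" if n > QUEUE_LIMIT else "fits within"
        print(f"batched={batched}: {n} request(s), {status} the queue limit of {QUEUE_LIMIT}")
```

Under these assumptions a single query on the wide table is already enough to overflow the HTTP client queue, which matches the failures seen on the cluster.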
The solution was pretty straightforward, as the customer currently doesn't need column masks:
Set opa.policy.batch-column-masking-uri: http://opa.XXX.svc.cluster.local:8081/v1/data/trino/batchColumnMasks to use batching and provide an "allow-all" Rego rule:
package trino
# No column masking needed
batchColumnMasks = []
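For readers wondering why an empty list means "no masks", here is a small Python sketch of what a batch column-mask policy conceptually computes. It assumes the batch request carries the columns under `filterResources` and that the response lists only the masked columns, each identified by its `index` into that list together with a `viewExpression`. The field names follow my reading of the Trino OPA docs and should be double-checked there. `batch_column_masks` and `masked_columns` are hypothetical names used for illustration only.

```python
from typing import Any


def batch_column_masks(
    filter_resources: list[dict[str, Any]],
    masked_columns: dict[str, str],
) -> list[dict[str, Any]]:
    """Mimic a batch column-mask policy: one entry per column that needs a mask.

    filter_resources: columns as (assumed) sent in the batch request.
    masked_columns:   hypothetical mapping of column name -> SQL mask expression.
    """
    masks = []
    for index, resource in enumerate(filter_resources):
        name = resource["column"]["columnName"]
        if name in masked_columns:
            masks.append(
                {"index": index, "viewExpression": {"expression": masked_columns[name]}}
            )
    return masks


if __name__ == "__main__":
    request_columns = [
        {"column": {"columnName": "id"}},
        {"column": {"columnName": "ssn"}},
    ]
    # Mask a single column: only that column's index shows up in the response.
    print(batch_column_masks(request_columns, {"ssn": "NULL"}))
    # No masks configured: the response is an empty list, like the rule above.
    print(batch_column_masks(request_columns, {}))
```

With no masked columns the function returns `[]`, which is exactly what the constant Rego rule above does for every request.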
Long term, we should consider whether always enabling column masks and row filters is a good idea.
Ideally this would be user-configurable (it sucks that configOverrides cannot be used to remove properties, e.g. to disable row filtering).
Tasks:
- Extend @siegfriedweber OPA rego rules to support batchColumnMasks
- Always enable opa.policy.batch-column-masking-uri in trino-operator. This is a breaking change and needs to be highlighted as such.
- OPTIONAL: Users can turn columnMasking and rowFiltering off to improve performance. This way it's also way less breaking, as users can turn off column masking :)