Metadata Filter Config File
----------------------------

The metadata filter config file allows you to configure the set of columns that can be used to filter out irrelevant
splits (CLP archives) when querying CLP's metadata database. This can significantly improve performance by reducing the
amount of data that needs to be scanned. For a given query, the connector translates any supported filter predicates
that involve the configured columns into a query against CLP's metadata database.

The configuration is a JSON object where each key under the root represents a :ref:`scope<scopes>` and each scope maps
to an array of :ref:`filter configs<filter-configs>`.

.. _scopes:

Scopes
^^^^^^

A *scope* can be one of the following:

- A catalog name (e.g., ``clp``)
- A fully-qualified schema name (e.g., ``clp.default``)
- A fully-qualified table name (e.g., ``clp.default.table_1``)

Filter configs under a particular scope apply to all of that scope's child scopes. For example, filter configs at the
schema level apply to all tables within that schema.
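
The inheritance rule can be sketched in Python. This is a minimal sketch, not the connector's implementation. The helper ``applicable_filter_configs`` is hypothetical, and it concatenates the catalog-, schema-, and table-level arrays in that order. It does not resolve duplicate column names, because this document does not say how duplicates are handled.

.. code-block:: python

   from typing import Any

   FilterConfig = dict[str, Any]


   def applicable_filter_configs(
       config: dict[str, list[FilterConfig]], catalog: str, schema: str, table: str
   ) -> list[FilterConfig]:
       """Collect the filter configs that apply to catalog.schema.table (hypothetical helper)."""
       scopes = [
           catalog,                        # catalog level, e.g. "clp"
           f"{catalog}.{schema}",          # schema level, e.g. "clp.default"
           f"{catalog}.{schema}.{table}",  # table level, e.g. "clp.default.table_1"
       ]
       result: list[FilterConfig] = []
       for scope in scopes:
           # Filter configs under a parent scope also apply to its child scopes.
           result.extend(config.get(scope, []))
       return result


   if __name__ == "__main__":
       example = {
           "clp": [{"columnName": "level"}],
           "clp.default": [{"columnName": "author"}],
           "clp.default.table_1": [{"columnName": "file_name"}],
       }
       for table in ("table_1", "table_2"):
           configs = applicable_filter_configs(example, "clp", "default", table)
           print(table, [c["columnName"] for c in configs])
       # table_1 ['level', 'author', 'file_name']
       # table_2 ['level', 'author']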

.. _filter-configs:

Filter Configs
^^^^^^^^^^^^^^

Each *filter config* indicates how a *data column* (a column in the Presto table) should be mapped to a
*metadata column* (a column in CLP's metadata database). In most cases, the data column and the metadata column have
the same name, but in some cases the data column may be remapped.

For example, an integer data column such as ``timestamp`` may be remapped to a pair of metadata columns (e.g.,
``begin_timestamp`` and ``end_timestamp``) that represent the range of the data column's values within a split.
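
A range mapping like this lets a comparison on the data column be answered from metadata alone. For instance, a condition like ``"msg.timestamp" > 1234 AND "msg.timestamp" < 5678`` can be rewritten as ``"end_timestamp" > 1234 AND "begin_timestamp" < 5678``. A split can only contain a value greater than 1234 if its upper bound is greater than 1234, and the lower bound works the same way, so the metadata filter selects a superset of the splits that contain matching rows. The following minimal Python sketch applies this rule to a single comparison. The function name is hypothetical. The rewrites for ``=`` and ``!=`` are assumptions about how the same reasoning extends, not a description of the connector's code.

.. code-block:: python

   def rewrite_range_comparison(op: str, constant: float, lower_bound: str, upper_bound: str) -> str:
       """Rewrite `<data column> <op> <constant>` as a condition on a split's value range.

       The result keeps every split that could contain a matching value: it may keep
       extra splits, but it never drops a split that has a match.
       """
       if op in (">", ">="):
           # A split can hold a value above (or at) the constant only if its upper bound does.
           return f'"{upper_bound}" {op} {constant}'
       if op in ("<", "<="):
           # A split can hold a value below (or at) the constant only if its lower bound does.
           return f'"{lower_bound}" {op} {constant}'
       if op == "=":
           # The constant must fall inside the split's range.
           return f'("{lower_bound}" <= {constant} AND "{upper_bound}" >= {constant})'
       if op == "!=":
           # Only a split whose values all equal the constant can be skipped.
           return f'("{lower_bound}" != {constant} OR "{upper_bound}" != {constant})'
       raise ValueError(f"unsupported operator: {op}")


   if __name__ == "__main__":
       # "msg.timestamp" > 1234 AND "msg.timestamp" < 5678
       parts = [
           rewrite_range_comparison(">", 1234, "begin_timestamp", "end_timestamp"),
           rewrite_range_comparison("<", 5678, "begin_timestamp", "end_timestamp"),
       ]
       print(" AND ".join(parts))
       # "end_timestamp" > 1234 AND "begin_timestamp" < 5678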

Each *filter config* has the following properties:

- ``columnName``: The name of the data column. It must match a column in the table's schema.

  .. note:: Currently, only numeric-type columns can be used as metadata filters.

- ``rangeMapping`` *(optional)*: An object with the following properties:

  .. note:: This option is only valid if the column has a numeric type.

  - ``lowerBound``: The metadata column that represents the lower bound of values in a split for the data column.
  - ``upperBound``: The metadata column that represents the upper bound of values in a split for the data column.

- ``required`` *(optional, defaults to false)*: Indicates whether the filter **must** be present in the translated
  metadata filter SQL query. If a required filter is missing or cannot be pushed down, the connector rejects the query.
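
Together, the properties above amount to a small schema. A minimal Python sketch of checking a config file against it, using only the standard ``json`` module, could look like the following. The function name ``load_filter_config`` is hypothetical, and the connector performs its own validation, which may differ. This sketch also cannot check column types, because it does not see the table's schema.

.. code-block:: python

   import json
   from typing import Any


   def load_filter_config(text: str) -> dict[str, list[dict[str, Any]]]:
       """Parse a metadata filter config and check its basic shape (hypothetical helper)."""
       config = json.loads(text)
       if not isinstance(config, dict):
           raise ValueError("the root must be an object mapping scopes to arrays")
       for scope, filters in config.items():
           # A scope is a catalog, schema, or table name: one to three dot-separated parts.
           if not 1 <= len(scope.split(".")) <= 3:
               raise ValueError(f"invalid scope: {scope!r}")
           if not isinstance(filters, list):
               raise ValueError(f"scope {scope!r} must map to an array of filter configs")
           for f in filters:
               if not isinstance(f, dict) or not isinstance(f.get("columnName"), str):
                   raise ValueError(f"every filter config under {scope!r} needs a string columnName")
               mapping = f.get("rangeMapping")
               if mapping is not None and not (
                   isinstance(mapping, dict)
                   and isinstance(mapping.get("lowerBound"), str)
                   and isinstance(mapping.get("upperBound"), str)
               ):
                   raise ValueError(f"rangeMapping for {f['columnName']!r} needs lowerBound and upperBound")
               # `required` is optional and defaults to false.
               if not isinstance(f.get("required", False), bool):
                   raise ValueError(f"required for {f['columnName']!r} must be true or false")
       return config


   if __name__ == "__main__":
       config = load_filter_config(
           '{"clp.default.table_1": [{"columnName": "msg.timestamp",'
           ' "rangeMapping": {"lowerBound": "begin_timestamp", "upperBound": "end_timestamp"},'
           ' "required": true}]}'
       )
       print(config["clp.default.table_1"][0]["required"])  # True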

Example
^^^^^^^

The following is an example metadata filter config file:

.. code-block:: json

   {
     "clp": [
       {
         "columnName": "level"
       }
     ],
     "clp.default": [
       {
         "columnName": "author"
       }
     ],
     "clp.default.table_1": [
       {
         "columnName": "msg.timestamp",
         "rangeMapping": {
           "lowerBound": "begin_timestamp",
           "upperBound": "end_timestamp"
         },
         "required": true
       },
       {
         "columnName": "file_name"
       }
     ]
   }

- The first key-value pair adds the following filter config for all schemas and tables under the ``clp`` catalog:

  - The column ``level`` is used as-is without remapping.

- The second key-value pair adds the following filter config for all tables under the ``clp.default`` schema:

  - The column ``author`` is used as-is without remapping.

- The third key-value pair adds two filter configs for the table ``clp.default.table_1``:

  - The column ``msg.timestamp`` is remapped via a ``rangeMapping`` to the metadata columns ``begin_timestamp`` and
    ``end_timestamp``. It is also marked as required, so every query on this table must include a filter on
    ``msg.timestamp`` that can be pushed down.
  - The column ``file_name`` is used as-is without remapping.

Supported SQL Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^

The connector can translate the following expressions in a Presto SQL query into the metadata filter query:

- Comparisons between variables and constants (e.g., ``=``, ``!=``, ``<``, ``>``, ``<=``, ``>=``).
- Dereferencing fields from row-typed variables (e.g., ``msg.timestamp``).
- Logical operators: ``AND``, ``OR``, and ``NOT``.
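
The following minimal Python sketch shows one way these expressions could be translated. It uses a hypothetical tuple-based expression tree (the connector itself works on Presto's own expression representation). It represents a dereferenced field by its dotted name, such as ``msg.timestamp``, and maps every configured column as-is, leaving range mappings out. It also assumes that an ``AND`` can drop a conjunct it cannot translate, because the remaining conjuncts still keep every matching split. An ``OR`` or ``NOT`` that contains anything untranslatable is not pushed down at all. The ``required`` check at the end is a crude textual test, also assumed for illustration.

.. code-block:: python

   from typing import Optional

   # Hypothetical expression tree used only for this sketch:
   #   ("cmp", column, op, constant)  e.g. ("cmp", "msg.timestamp", ">", 1234)
   #   ("and", left, right), ("or", left, right), ("not", child)
   # Anything else is treated as untranslatable.
   Expr = tuple

   COMPARISON_OPS = {"=", "!=", "<", ">", "<=", ">="}


   def translate(expr: Expr, columns: set[str]) -> tuple[Optional[str], bool]:
       """Return (metadata filter SQL or None, exact).

       `exact` is False when part of the predicate was dropped, i.e. the SQL is only a
       looser condition that keeps a superset of the matching splits.
       """
       kind = expr[0]
       if kind == "cmp":
           _, column, op, constant = expr
           if column in columns and op in COMPARISON_OPS:
               return f'"{column}" {op} {constant}', True
           return None, False
       if kind in ("and", "or"):
           left, left_exact = translate(expr[1], columns)
           right, right_exact = translate(expr[2], columns)
           if left is not None and right is not None:
               return f"({left} {kind.upper()} {right})", left_exact and right_exact
           if kind == "and" and (left is not None or right is not None):
               # Dropping one side of an AND loosens the filter but keeps every matching split.
               return (left if left is not None else right), False
           # An OR with an untranslatable side (or an AND with no translatable side) is skipped.
           return None, False
       if kind == "not":
           child, child_exact = translate(expr[1], columns)
           if child is None or not child_exact:
               # Negating a loosened condition would tighten it, so skip it.
               return None, False
           return f"NOT ({child})", True
       return None, False


   def build_metadata_filter(expr: Expr, columns: set[str], required: set[str]) -> Optional[str]:
       """Translate expr and reject the query if a required column is not filtered on."""
       sql, _ = translate(expr, columns)
       for column in sorted(required):
           # Crude textual check, assumed here for illustration only.
           if sql is None or f'"{column}"' not in sql:
               raise ValueError(f"query has no pushable filter on required column {column!r}")
       return sql


   if __name__ == "__main__":
       query = ("and", ("cmp", "msg.timestamp", ">", 1234), ("cmp", "msg.size", ">", 10))
       print(build_metadata_filter(query, {"msg.timestamp", "level"}, {"msg.timestamp"}))
       # "msg.timestamp" > 1234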