Commit 7b7811b

Update presto-docs/src/main/sphinx/connector/clp.rst
Co-authored-by: kirkrodrigues <[email protected]>
1 parent 5b29dca commit 7b7811b

presto-docs/src/main/sphinx/connector/clp.rst

Lines changed: 74 additions & 48 deletions
@@ -99,88 +99,114 @@ accordingly.
9999
Metadata Filter Config File
100100
----------------------------
101101

102-
The metadata filter configuration file allows you to define filters that help the CLP connector determine which splits
103-
can be pruned early during query execution, improving performance significantly.
102+
The metadata filter config file allows you to configure the set of columns that can be used to filter out irrelevant
103+
splits (CLP archives) when querying CLP's metadata database. This can significantly improve performance by reducing the
104+
amount of data that needs to be scanned. For a given query, the connector will translate any supported filter predicates
105+
that involve the configured columns into a query against CLP's metadata database.
104106

105-
The CLP connector supports metadata filter SQL translation for the following expressions:
107+
The configuration is a JSON object where each key under the root represents a :ref:`scope<scopes>` and each scope maps
108+
to an array of :ref:`filter configs<filter-configs>`.
106109

107-
- Comparisons between variables and constants (e.g., ``=``, ``!=``, ``<``, ``>``, ``<=``, ``>=``).
108-
- Dereferencing fields from row-typed variables.
109-
- Logical operators: ``AND``, ``OR``, and ``NOT``.
110110

111-
The configuration is a JSON object where each top-level key represents a *scope* and each scope maps to a list of
112-
*filters*.
111+
.. _scopes:
112+
113+
Scopes
114+
^^^^^^
113115

114-
The *scope* is in form of:
116+
A *scope* can be one of the following:
115117

116-
- **Catalog-level**: e.g., ``"clp"`` — applies to all schemas and tables under the catalog.
117-
- **Schema-level**: e.g., ``"clp.default"`` — applies to all tables under the specified catalog and schema.
118-
- **Table-level**: e.g., ``"clp.default.table_1"`` — applies only to the fully qualified table ``catalog.schema.table``.
118+
- A catalog name
119+
- A fully-qualified schema name
120+
- A fully-qualified table name
119121

120-
Each *filter* includes:
122+
Filter configs under a particular scope will apply to all child scopes. For example, filter configs at the schema level
123+
will apply to all tables within that schema.
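As an illustration of how scopes nest, the following Python sketch collects the filter configs that would apply to one table by walking from the catalog scope down to the table scope. The helper name ``applicable_filter_configs`` is hypothetical, and this is not the connector's implementation; it only assumes that scope keys are dot-separated names of the form ``catalog``, ``catalog.schema``, and ``catalog.schema.table``.

```python
def applicable_filter_configs(config, table_name):
    """Collect the filter configs that apply to a fully-qualified table name.

    Hypothetical helper: visits the catalog, schema, and table scopes in
    that order, so configs from parent scopes come first.
    """
    parts = table_name.split(".")
    if len(parts) != 3:
        raise ValueError("expected a name of the form catalog.schema.table")
    result = []
    for depth in range(1, 4):
        scope = ".".join(parts[:depth])
        # A scope with no entry in the config contributes nothing.
        result.extend(config.get(scope, []))
    return result


config = {
    "clp": [{"columnName": "level"}],
    "clp.default": [{"columnName": "author"}],
    "clp.default.table_1": [{"columnName": "file_name"}],
}
columns = [
    fc["columnName"]
    for fc in applicable_filter_configs(config, "clp.default.table_1")
]
other_columns = [
    fc["columnName"]
    for fc in applicable_filter_configs(config, "clp.other.table_2")
]
```

Under these assumptions, ``table_1`` picks up the configs from all three scopes, while a table in a different schema only inherits the catalog-level config.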
121124

122-
- ``columnName``: must match a column name in the table’s schema.
125+
.. _filter-configs:
123126

124-
**Note:** Only numeric-type columns can currently be used as metadata filters.
127+
Filter Configs
128+
^^^^^^^^^^^^^^
125129

126-
- ``rangeMapping`` *(optional)*: specifies how the filter should be remapped when it targets metadata-only columns.
130+
Each *filter config* indicates how a *data column*---a column in the Presto table---should be mapped to a *metadata
131+
column*---a column in CLP's metadata database. In most cases, the data column and the metadata column will have the same
132+
name; but in some cases, the data column may be remapped.
127133

128-
**Note:** This option is only valid if the column is numeric type.
134+
For example, an integer data column (e.g., ``timestamp``) may be remapped to a pair of metadata columns that represent
135+
the range of possible values (e.g., ``begin_timestamp`` and ``end_timestamp``) of the data column within a split.
129136

130-
For example, a condition like:
137+
Each *filter config* has the following properties:
131138

132-
::
139+
- ``columnName``: The data column's name.
133140

134-
"msg.timestamp" > 1234 AND "msg.timestamp" < 5678
141+
.. note:: Currently, only numeric-type columns can be used as metadata filters.
135142

136-
will be rewritten as:
143+
- ``rangeMapping`` *(optional)*: An object with the following properties:
137144

138-
::
145+
.. note:: This option is only valid if the column has a numeric type.
139146

140-
"end_timestamp" > 1234 AND "begin_timestamp" < 5678
147+
- ``lowerBound``: The metadata column that represents the lower bound of values in a split for the data column.
148+
- ``upperBound``: The metadata column that represents the upper bound of values in a split for the data column.
141149

142-
This ensures that metadata-based filtering produces a superset of the actual result.
143150

144-
- ``required`` *(optional, default: false)*: marks whether the filter **must** be present in the extracted metadata filter SQL query. If a required filter is missing or cannot be pushed down, the query will be rejected.
151+
- ``required`` *(optional, defaults to false)*: Indicates whether the filter **must** be present in the translated
152+
metadata filter SQL query. If a required filter is missing or cannot be pushed down, the query will be rejected.
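To make the effect of a ``rangeMapping`` concrete, the sketch below rewrites a comparison on a data column into a predicate over the lower-bound and upper-bound metadata columns. The earlier version of this section gave the example that ``"msg.timestamp" > 1234 AND "msg.timestamp" < 5678`` becomes ``"end_timestamp" > 1234 AND "begin_timestamp" < 5678``; the handling of ``>=``, ``<=``, and ``=`` here is an assumption that follows the same reasoning, and ``rewrite_comparison`` is a hypothetical name, not part of the connector.

```python
def rewrite_comparison(op, constant, lower_bound, upper_bound):
    """Rewrite `data_column <op> constant` over the range metadata columns.

    The rewritten predicate keeps every split whose [lower, upper] range
    could contain a matching value, so it selects a superset of the
    splits that actually hold matching rows.
    """
    if op in (">", ">="):
        # Some value exceeds the constant only if the upper bound does.
        return f'"{upper_bound}" {op} {constant}'
    if op in ("<", "<="):
        # Some value is below the constant only if the lower bound is.
        return f'"{lower_bound}" {op} {constant}'
    if op == "=":
        # The constant must fall inside the split's range.
        return f'"{lower_bound}" <= {constant} AND "{upper_bound}" >= {constant}'
    # Other operators (e.g. !=) are not covered by this sketch.
    raise ValueError(f"operator not handled in this sketch: {op}")


rewritten = " AND ".join(
    [
        rewrite_comparison(">", 1234, "begin_timestamp", "end_timestamp"),
        rewrite_comparison("<", 5678, "begin_timestamp", "end_timestamp"),
    ]
)
```

Because the rewritten predicate only tests the bounds, it may keep splits that turn out to contain no matching rows, but it never discards a split that does.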
145153

146-
Here is an example of a metadata filter config file:
154+
155+
Example
156+
^^^^^^^
157+
158+
The following code block shows an example metadata filter config file:
147159

148160
.. code-block:: json
149161
150162
{
151163
"clp": [
152-
{
153-
"columnName": "level"
154-
}
164+
{
165+
"columnName": "level"
166+
}
155167
],
156168
"clp.default": [
157-
{
158-
"columnName": "author"
159-
}
169+
{
170+
"columnName": "author"
171+
}
160172
],
161173
"clp.default.table_1": [
162-
{
163-
"columnName": "msg.timestamp",
164-
"rangeMapping": {
165-
"lowerBound": "begin_timestamp",
166-
"upperBound": "end_timestamp"
167-
},
168-
"required": true
174+
{
175+
"columnName": "msg.timestamp",
176+
"rangeMapping": {
177+
"lowerBound": "begin_timestamp",
178+
"upperBound": "end_timestamp"
169179
},
170-
{
171-
"columnName": "file_name"
172-
}
180+
"required": true
181+
},
182+
{
183+
"columnName": "file_name"
184+
}
173185
]
174186
}
175187
176-
Explanation:
188+
- The first key-value pair adds the following filter configs for all schemas and tables under the ``clp`` catalog:
189+
190+
- The column ``level`` is used as-is without remapping.
191+
192+
- The second key-value pair adds the following filter configs for all tables under the ``clp.default`` schema:
177193

178-
- ``"clp"``: Adds a filter on the column ``level`` for all schemas and tables under the ``clp`` catalog.
179-
- ``"clp.default"``: Adds a filter on ``author`` for all tables under the ``clp.default`` schema.
180-
- ``"clp.default.table_1"``: Adds two filters for the table ``clp.default.table_1``:
194+
- The column ``author`` is used as-is without remapping.
181195

182-
- ``msg.timestamp`` is remapped via ``rangeMapping`` and is marked as **required**.
183-
- ``file_name`` is used as-is without remapping.
196+
- The third key-value pair adds two filter configs for the table ``clp.default.table_1``:
197+
198+
- The column ``msg.timestamp`` is remapped via a ``rangeMapping`` to the metadata columns ``begin_timestamp`` and
199+
``end_timestamp``, and a filter on it must be present in every query.
200+
- The column ``file_name`` is used as-is without remapping.
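The ``required`` behavior in this example can be sketched as a simple check: given the filter configs for a table and the set of columns whose predicates were pushed down, reject the query if any required column is missing. This is a minimal illustration, not the connector's code; ``check_required_filters`` and the error message are hypothetical.

```python
def check_required_filters(filter_configs, pushed_down_columns):
    """Raise ValueError if a required filter column was not pushed down."""
    missing = sorted(
        fc["columnName"]
        for fc in filter_configs
        # "required" is optional and defaults to false.
        if fc.get("required", False)
        and fc["columnName"] not in pushed_down_columns
    )
    if missing:
        raise ValueError(
            "query rejected; missing required filters: " + ", ".join(missing)
        )


# Filter configs for clp.default.table_1, taken from the example file.
table_1_configs = [
    {
        "columnName": "msg.timestamp",
        "rangeMapping": {
            "lowerBound": "begin_timestamp",
            "upperBound": "end_timestamp",
        },
        "required": True,
    },
    {"columnName": "file_name"},
]

# A query that filters on msg.timestamp passes the check.
check_required_filters(table_1_configs, {"msg.timestamp"})

# A query that filters only on file_name is rejected.
try:
    check_required_filters(table_1_configs, {"file_name"})
    rejected = False
except ValueError:
    rejected = True
```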
201+
202+
Supported SQL Expressions
203+
^^^^^^^^^^^^^^^^^^^^^^^^^
204+
205+
The connector supports translations from a Presto SQL query to the metadata filter query for the following expressions:
206+
207+
- Comparisons between variables and constants (e.g., ``=``, ``!=``, ``<``, ``>``, ``<=``, ``>=``).
208+
- Dereferencing fields from row-typed variables.
209+
- Logical operators: ``AND``, ``OR``, and ``NOT``.
184210

185211
Data Types
186212
----------
