``status``, ``thread_num``, ``backtrace``) become fields within the ``ROW``, clearly reflecting the nested and varying
structures of the original JSON logs.

CLP Functions
-------------

In semi-structured logs, the number of potential keys can grow significantly, resulting in extremely wide Presto tables
with many columns. To manage this complexity, the metadata provider may expose only a subset of the full schema,
typically the static fields or those most relevant to expected queries.

To enable access to dynamic or less common fields not present in the exposed schema, CLP provides two sets of functions
that let users query flexible log schemas while keeping the table metadata definition concise:

- JSON path functions (e.g., ``CLP_GET_STRING``)
- Wildcard column matching functions for use in filter predicates (e.g., ``CLP_WILDCARD_STRING_COLUMN``)

These functions are only available in the CLP connector and are not part of standard Presto SQL.

There is **no performance penalty** for using these functions. During query optimization, they are rewritten into
references to actual schema-backed columns or into valid symbols in KQL queries. This avoids additional parsing overhead
and delivers performance comparable to querying standard columns.

Path-Based Functions
^^^^^^^^^^^^^^^^^^^^

.. function:: CLP_GET_STRING(varchar) -> varchar

    Returns the string value of the given JSON path, where the column type is one of: ``ClpString``, ``VarString``, or
    ``DateString``. Returns a Presto ``VARCHAR``.

.. function:: CLP_GET_BIGINT(varchar) -> bigint

    Returns the integer value of the given JSON path, where the column type is ``Integer``. Returns a Presto ``BIGINT``.

.. function:: CLP_GET_DOUBLE(varchar) -> double

    Returns the double value of the given JSON path, where the column type is ``Float``. Returns a Presto ``DOUBLE``.

.. function:: CLP_GET_BOOL(varchar) -> boolean

    Returns the boolean value of the given JSON path, where the column type is ``Boolean``. Returns a Presto ``BOOLEAN``.

.. function:: CLP_GET_STRING_ARRAY(varchar) -> array(varchar)

    Returns the array value of the given JSON path, where the column type is ``UnstructuredArray``, with each element
    converted to a string. Returns a Presto ``ARRAY(VARCHAR)``.

.. note::

    - JSON paths must be **constant string literals**; variables are not supported.
    - Wildcards (e.g., ``msg.*.ts``) are **not supported**.
    - If a path is invalid or missing, the function returns ``NULL`` rather than raising an error.

Examples:

.. code-block:: sql

    SELECT CLP_GET_STRING('msg.author') AS author
    FROM clp.default.table_1
    WHERE CLP_GET_BIGINT('msg.timestamp') > 1620000000;

    SELECT CLP_GET_STRING_ARRAY('msg.tags') AS tags
    FROM clp.default.table_2
    WHERE CLP_GET_BOOL('msg.is_active') = true;
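
The following sketch also exercises ``CLP_GET_DOUBLE`` and ``CLP_GET_BIGINT`` together with the ``NULL`` behavior
described in the note above. The paths ``msg.latency`` and ``msg.retries`` are hypothetical placeholders, and the query
assumes that ``IS NOT NULL`` is evaluated on the rewritten column like any other Presto predicate:

.. code-block:: sql

    -- Hypothetical paths: msg.latency (CLP type Float) and msg.retries (CLP type Integer).
    -- Assumption: rows in which msg.latency is absent yield NULL, so IS NOT NULL
    -- keeps only the rows that contain the field.
    SELECT CLP_GET_DOUBLE('msg.latency') AS latency,
           CLP_GET_BIGINT('msg.retries') AS retries
    FROM clp.default.table_1
    WHERE CLP_GET_DOUBLE('msg.latency') IS NOT NULL;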


Wildcard Column Functions
^^^^^^^^^^^^^^^^^^^^^^^^^

These functions are used to apply filter predicates across all columns of a certain type. They are useful for searching
across unknown or dynamic schemas without specifying exact column names. Similar to the path-based functions, these
functions are rewritten during query optimization to a KQL query that matches the appropriate columns.

.. function:: CLP_WILDCARD_STRING_COLUMN() -> varchar

    Represents all columns of CLP types: ``ClpString``, ``VarString``, and ``DateString``.

.. function:: CLP_WILDCARD_INT_COLUMN() -> bigint

    Represents all columns of CLP type: ``Integer``.

.. function:: CLP_WILDCARD_FLOAT_COLUMN() -> double

    Represents all columns of CLP type: ``Float``.

.. function:: CLP_WILDCARD_BOOL_COLUMN() -> boolean

    Represents all columns of CLP type: ``Boolean``.

.. note::

    - Wildcard functions must appear **only in filter conditions** (``WHERE`` clause). They cannot be selected or passed
      as arguments to other functions.
    - Supported operators include:

      ::

          = (EQUAL)
          != (NOT_EQUAL)
          < (LESS_THAN)
          <= (LESS_THAN_OR_EQUAL)
          > (GREATER_THAN)
          >= (GREATER_THAN_OR_EQUAL)
          LIKE
          BETWEEN
          IN

      Use of other operators (e.g., arithmetic or function calls) with wildcard functions is not allowed and results
      in a query error.

Examples:

.. code-block:: sql

    -- Matches if any string column equals "Beijing"
    SELECT *
    FROM clp.default.table_1
    WHERE CLP_WILDCARD_STRING_COLUMN() = 'Beijing';

    -- Matches if any integer column equals 1
    SELECT *
    FROM clp.default.table_2
    WHERE CLP_WILDCARD_INT_COLUMN() = 1;
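
The remaining supported operators (``LIKE``, ``BETWEEN``, and ``IN``) follow the same pattern. This sketch reuses the
tables from the examples above; the pattern and literal values are illustrative only:

.. code-block:: sql

    -- Matches if any string column matches the (illustrative) pattern
    SELECT *
    FROM clp.default.table_1
    WHERE CLP_WILDCARD_STRING_COLUMN() LIKE '%timeout%';

    -- Matches if any float column lies in the inclusive range [0.5, 1.5]
    SELECT *
    FROM clp.default.table_2
    WHERE CLP_WILDCARD_FLOAT_COLUMN() BETWEEN 0.5 AND 1.5;

    -- Matches if any integer column equals one of the listed values
    SELECT *
    FROM clp.default.table_2
    WHERE CLP_WILDCARD_INT_COLUMN() IN (200, 404);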

***********
SQL support
***********
