@@ -306,6 +306,126 @@ Each JSON log maps to this unified ``ROW`` type, with absent fields represented
306 306 ``status``, ``thread_num``, ``backtrace``) become fields within the ``ROW``, clearly reflecting the nested and varying
307 307 structures of the original JSON logs.
308 308
309+ CLP Functions
310+ -------------
311+
312+ In semi-structured logs, the number of potential keys can grow significantly, resulting in extremely wide Presto tables
313+ with many columns. To manage this complexity, the metadata provider may expose only a subset of the full schema,
314+ typically the static fields or those most relevant to expected queries.
315+
316+ To enable access to dynamic or less common fields not present in the exposed schema, CLP provides two sets of functions
317+ that let users query flexible log schemas while keeping the table metadata definition concise. These functions are only
318+ available in the CLP connector and are not part of standard Presto SQL.
319+
320+ - JSON path functions (e.g., ``CLP_GET_STRING``)
321+ - Wildcard column matching functions for use in filter predicates (e.g., ``CLP_WILDCARD_STRING_COLUMN``)
322+
323+ There is **no performance penalty** for using these functions. During query optimization, they are rewritten into
324+ references to actual schema-backed columns or valid symbols in KQL queries. This avoids additional parsing overhead and
325+ delivers performance comparable to querying standard columns.
326+
327+ Path-Based Functions
328+ ^^^^^^^^^^^^^^^^^^^^
329+
330+ .. function:: CLP_GET_STRING(varchar) -> varchar
331+
332+ Returns the string value of the given JSON path, where the column type is one of: ``ClpString``, ``VarString``, or
333+ ``DateString``. Returns a Presto ``VARCHAR``.
334+
335+ .. function:: CLP_GET_BIGINT(varchar) -> bigint
336+
337+ Returns the integer value of the given JSON path, where the column type is ``Integer``. Returns a Presto ``BIGINT``.
338+
339+ .. function:: CLP_GET_DOUBLE(varchar) -> double
340+
341+ Returns the double value of the given JSON path, where the column type is ``Float``. Returns a Presto ``DOUBLE``.
342+
343+ .. function:: CLP_GET_BOOL(varchar) -> boolean
344+
345+ Returns the boolean value of the given JSON path, where the column type is ``Boolean``. Returns a Presto ``BOOLEAN``.
346+
347+ .. function:: CLP_GET_STRING_ARRAY(varchar) -> array(varchar)
348+
349+ Returns the array value of the given JSON path, where the column type is ``UnstructuredArray``, converting each
350+ element into a string. Returns a Presto ``ARRAY(VARCHAR)``.
351+
352+ .. note::
353+
354+ - JSON paths must be **constant string literals**; variables are not supported.
355+ - Wildcards (e.g., ``msg.*.ts``) are **not supported**.
356+ - If a path is invalid or missing, the function returns ``NULL`` rather than raising an error.
357+
358+ Examples:
359+
360+ .. code-block:: sql
361+
362+ SELECT CLP_GET_STRING('msg.author') AS author
363+ FROM clp.default.table_1
364+ WHERE CLP_GET_BIGINT('msg.timestamp') > 1620000000;
365+
366+ SELECT CLP_GET_STRING_ARRAY('msg.tags') AS tags
367+ FROM clp.default.table_2
368+ WHERE CLP_GET_BOOL('msg.is_active') = true;
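+
+ As a further sketch, the query below reads a floating-point field through ``CLP_GET_DOUBLE`` and filters on it. The path
+ ``msg.score`` is hypothetical and assumed to have the CLP type ``Float``; it is not part of any schema described above.
+
+ .. code-block:: sql
+
+ -- Hypothetical path: msg.score is assumed to be a Float field
+ SELECT CLP_GET_STRING('msg.author') AS author,
+ CLP_GET_DOUBLE('msg.score') AS score
+ FROM clp.default.table_1
+ WHERE CLP_GET_DOUBLE('msg.score') >= 0.5;
+
+ Because a missing path yields ``NULL``, rows without ``msg.score`` would be expected to drop out of this result under
+ the usual SQL comparison semantics.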
369+
370+
371+ Wildcard Column Functions
372+ ^^^^^^^^^^^^^^^^^^^^^^^^^
373+
374+ These functions are used to apply filter predicates across all columns of a certain type. They are useful for searching
375+ across unknown or dynamic schemas without specifying exact column names. Similar to the path-based functions, these
376+ functions are rewritten during query optimization to a KQL query that matches the appropriate columns.
377+
378+ .. function:: CLP_WILDCARD_STRING_COLUMN() -> varchar
379+
380+ Represents all columns of CLP types: ``ClpString``, ``VarString``, and ``DateString``.
381+
382+ .. function:: CLP_WILDCARD_INT_COLUMN() -> bigint
383+
384+ Represents all columns of CLP type: ``Integer``.
385+
386+ .. function:: CLP_WILDCARD_FLOAT_COLUMN() -> double
387+
388+ Represents all columns of CLP type: ``Float``.
389+
390+ .. function:: CLP_WILDCARD_BOOL_COLUMN() -> boolean
391+
392+ Represents all columns of CLP type: ``Boolean``.
393+
394+ .. note::
395+
396+ - These functions must appear **only in filter conditions** (``WHERE`` clause). They cannot be selected or passed as arguments
397+ to other functions.
398+ - Supported operators include:
399+
400+ ::
401+
402+ = (EQUAL)
403+ != (NOT_EQUAL)
404+ < (LESS_THAN)
405+ <= (LESS_THAN_OR_EQUAL)
406+ > (GREATER_THAN)
407+ >= (GREATER_THAN_OR_EQUAL)
408+ LIKE
409+ BETWEEN
410+ IN
411+
412+ Use of other operators (e.g., arithmetic or function calls) with wildcard functions is not allowed and will result
413+ in a query error.
414+
415+ Examples:
416+
417+ .. code-block:: sql
418+
419+ -- Matches if any string column equals 'Beijing'
420+ SELECT *
421+ FROM clp.default.table_1
422+ WHERE CLP_WILDCARD_STRING_COLUMN() = 'Beijing';
423+
424+ -- Matches if any integer column equals 1
425+ SELECT *
426+ FROM clp.default.table_2
427+ WHERE CLP_WILDCARD_INT_COLUMN() = 1;
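+
+ The other supported operators can be used in the same position. The following sketch reuses the tables from the
+ examples above; the patterns and literal values are illustrative assumptions only.
+
+ .. code-block:: sql
+
+ -- Matches if any string column matches the LIKE pattern
+ SELECT *
+ FROM clp.default.table_1
+ WHERE CLP_WILDCARD_STRING_COLUMN() LIKE '%timeout%';
+
+ -- Matches if any integer column falls within the inclusive range
+ SELECT *
+ FROM clp.default.table_2
+ WHERE CLP_WILDCARD_INT_COLUMN() BETWEEN 500 AND 599;
+
+ -- Matches if any integer column equals one of the listed values
+ SELECT *
+ FROM clp.default.table_2
+ WHERE CLP_WILDCARD_INT_COLUMN() IN (401, 403);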
428+
309 429 ***********
310 430 SQL support
311 431 ***********