feat: Add CLP UDF docs #75
base: release-0.293-clp-connector
Changes from 5 commits
e750458
d02268d
fed7045
d7d03cd
3ddb9d8
4442f9a
a00bac5
054beb2
41aa302
a7925d0
0796cf6
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -320,6 +320,125 @@ Each JSON log maps to this unified ``ROW`` type, with absent fields represented | |||||
| ``status``, ``thread_num``, ``backtrace``) become fields within the ``ROW``, clearly reflecting the nested and varying | ||||||
| structures of the original JSON logs. | ||||||
|
|
||||||
| ************* | ||||||
| CLP Functions | ||||||
| ************* | ||||||
|
|
||||||
| Semi-structured logs can have many potential keys, which can lead to very wide Presto tables. To keep table metadata | ||||||
| concise and still preserve access to dynamic fields, the connector provides two sets of functions that are specific to | ||||||
| the CLP connector. These functions are not part of standard Presto SQL. | ||||||
|
|
||||||
| - JSON path functions (e.g., ``CLP_GET_STRING``) | ||||||
| - Wildcard column matching functions for use in filter predicates (e.g., ``CLP_WILDCARD_STRING_COLUMN``) | ||||||
|
|
||||||
| There is **no performance penalty** when using these functions. During query optimization, the connector rewrites these | ||||||
| functions into references to concrete schema-backed columns or valid symbols in KQL queries. This avoids additional | ||||||
| parsing overhead and delivers performance comparable to querying standard columns. | ||||||
|
|
||||||
| Path-based Functions | ||||||
| ==================== | ||||||
|
|
||||||
| .. function:: CLP_GET_STRING(varchar) -> varchar | ||||||
|
|
||||||
| Returns the string value at the given JSON path, where the column type is one of: ``ClpString``, ``VarString``, or | ||||||
| ``DateString``. Returns a Presto ``VARCHAR``. | ||||||
|
|
||||||
| .. function:: CLP_GET_BIGINT(varchar) -> bigint | ||||||
|
|
||||||
| Returns the integer value at the given JSON path, where the column type is ``Integer``. Returns a Presto ``BIGINT``. | ||||||
|
|
||||||
| .. function:: CLP_GET_DOUBLE(varchar) -> double | ||||||
|
|
||||||
| Returns the double value at the given JSON path, where the column type is ``Float``. Returns a Presto ``DOUBLE``. | ||||||
|
|
||||||
| .. function:: CLP_GET_BOOL(varchar) -> boolean | ||||||
|
|
||||||
| Returns the boolean value at the given JSON path, where the column type is ``Boolean``. Returns a Presto ``BOOLEAN``. | ||||||
|
|
||||||
| .. function:: CLP_GET_STRING_ARRAY(varchar) -> array(varchar) | ||||||
|
|
||||||
| Returns the array value at the given JSON path, where the column type is ``UnstructuredArray``, with each | ||||||
| element converted to a string. Returns a Presto ``ARRAY(VARCHAR)``. | ||||||
|
|
||||||
| .. note:: | ||||||
|
|
||||||
| - JSON paths must be **constant string literals**; variables are not supported. | ||||||
| - Wildcards (e.g., ``msg.*.ts``) are **not supported**. | ||||||
|
Review comment: Suggested change (clarity).
Author reply: I believe this is supported in CLP-S KQL, which also uses dot notation, but it isn't supported here.
||||||
| - If a path is invalid or missing, the function returns ``NULL`` rather than raising an error. | ||||||
|
||||||
|
|
||||||
| Examples | ||||||
| -------- | ||||||
|
|
||||||
| .. code-block:: sql | ||||||
|
|
||||||
| SELECT CLP_GET_STRING('msg.author') AS author | ||||||
| FROM clp.default.table_1 | ||||||
| WHERE CLP_GET_BIGINT('msg.timestamp') > 1620000000; | ||||||
|
|
||||||
|
||||||
| SELECT CLP_GET_STRING_ARRAY('msg.tags') AS tags | ||||||
| FROM clp.default.table_2 | ||||||
| WHERE CLP_GET_BOOL('msg.is_active') = true; | ||||||
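As a further sketch of the notes above: every path is passed as a constant string literal, and a path that is missing from a record yields ``NULL`` rather than an error. The ``msg.level`` and ``msg.latency`` paths are hypothetical and chosen only for illustration. Wrapping a path function in ``COALESCE`` assumes that path functions compose with ordinary Presto functions, which the text above does not state explicitly.

.. code-block:: sql

    -- Hypothetical paths; each one is a quoted, constant string literal.
    SELECT
        CLP_GET_STRING('msg.level') AS level,
        -- Assumption: a missing path returns NULL, so COALESCE supplies a default.
        COALESCE(CLP_GET_DOUBLE('msg.latency'), 0.0) AS latency
    FROM clp.default.table_1
    WHERE CLP_GET_BIGINT('msg.timestamp') > 1620000000;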
|
||||||
|
|
||||||
|
|
||||||
| Wildcard Column Functions | ||||||
| ========================= | ||||||
|
|
||||||
| These functions are used to apply filter predicates across all columns of a certain type. They are useful for searching | ||||||
| across unknown or dynamic schemas without specifying exact column names. Similar to the path-based functions, these | ||||||
| functions are rewritten during query optimization to a KQL query that matches the appropriate columns. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_STRING_COLUMN() -> varchar | ||||||
|
|
||||||
| Represents all columns whose CLP types are ``ClpString``, ``VarString``, or ``DateString``. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_INT_COLUMN() -> bigint | ||||||
|
|
||||||
| Represents all columns whose CLP type is ``Integer``. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_FLOAT_COLUMN() -> double | ||||||
|
|
||||||
| Represents all columns whose CLP type is ``Float``. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_BOOL_COLUMN() -> boolean | ||||||
|
|
||||||
| Represents all columns whose CLP type is ``Boolean``. | ||||||
|
|
||||||
| .. note:: | ||||||
|
|
||||||
| - Wildcard functions must appear **only in filter conditions** (``WHERE`` clause). They cannot be selected and cannot | ||||||
| be passed as arguments to other functions. | ||||||
| - Supported operators include: | ||||||
|
|
||||||
| :: | ||||||
|
|
||||||
| = (EQUAL) | ||||||
| != (NOT_EQUAL) | ||||||
| < (LESS_THAN) | ||||||
| <= (LESS_THAN_OR_EQUAL) | ||||||
| > (GREATER_THAN) | ||||||
| >= (GREATER_THAN_OR_EQUAL) | ||||||
| LIKE | ||||||
| BETWEEN | ||||||
| IN | ||||||
|
|
||||||
| Use of other operators (e.g., arithmetic or function calls) with wildcard functions is not allowed and will result | ||||||
| in a query error. | ||||||
|
||||||
|
|
||||||
| Examples | ||||||
| -------- | ||||||
|
|
||||||
| .. code-block:: sql | ||||||
|
|
||||||
| -- Matches if any string column equals "Beijing" | ||||||
| SELECT * | ||||||
| FROM clp.default.table_1 | ||||||
| WHERE CLP_WILDCARD_STRING_COLUMN() = 'Beijing'; | ||||||
|
|
||||||
| -- Matches if any integer column equals 1 | ||||||
| SELECT * | ||||||
| FROM clp.default.table_2 | ||||||
| WHERE CLP_WILDCARD_INT_COLUMN() = 1; | ||||||
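The other operators listed in the note can be sketched the same way. The literal values below are hypothetical, and the comments assume the same "any column of that type" matching described in the examples above.

.. code-block:: sql

    -- Matches if any string column contains "Beijing" (LIKE with wildcards)
    SELECT *
    FROM clp.default.table_1
    WHERE CLP_WILDCARD_STRING_COLUMN() LIKE '%Beijing%';

    -- Matches if any integer column falls within the inclusive range 200 to 299
    SELECT *
    FROM clp.default.table_2
    WHERE CLP_WILDCARD_INT_COLUMN() BETWEEN 200 AND 299;

    -- Matches if any float column equals one of the listed values
    SELECT *
    FROM clp.default.table_2
    WHERE CLP_WILDCARD_FLOAT_COLUMN() IN (0.5, 1.0, 2.0);

In each query the wildcard function appears only as the left-hand side of a supported operator in the ``WHERE`` clause. It is never selected or passed to another function.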
|
|
||||||
| *********** | ||||||
| SQL support | ||||||
| *********** | ||||||
|
|
||||||