feat: Add CLP UDF docs #75
base: release-0.293-clp-connector
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -306,6 +306,126 @@ Each JSON log maps to this unified ``ROW`` type, with absent fields represented | |||||||||
| ``status``, ``thread_num``, ``backtrace``) become fields within the ``ROW``, clearly reflecting the nested and varying | ||||||||||
| structures of the original JSON logs. | ||||||||||
|
|
||||||||||
| CLP Functions | ||||||||||
| ------------- | ||||||||||
|
|
||||||||||
| In semi-structured logs, the number of potential keys can grow significantly, resulting in extremely wide Presto tables | ||||||||||
| with many columns. To manage this complexity, the metadata provider may expose only a subset of the full schema, | ||||||||||
| typically the static fields or those most relevant to expected queries. | ||||||||||
|
|
||||||||||
| To enable access to dynamic or less common fields not present in the exposed schema, CLP provides two sets of functions | ||||||||||
| to help users query flexible log schemas while keeping the table metadata definition concise. These functions are only | ||||||||||
| available in the CLP connector and are not part of standard Presto SQL. | ||||||||||
|
||||||||||
|
|
||||||||||
| - JSON path functions (e.g., ``CLP_GET_STRING``) | ||||||||||
| - Wildcard column matching functions for use in filter predicates (e.g., ``CLP_WILDCARD_STRING_COLUMN``) | ||||||||||
|
|
||||||||||
| There is **no performance penalty** for using these functions. During query optimization, they are rewritten into | ||||||||||
| references to actual schema-backed columns or valid symbols in KQL queries. This avoids additional parsing overhead and | ||||||||||
| delivers performance comparable to querying standard columns. | ||||||||||
|
||||||||||
|
|
||||||||||
| Path-Based Functions | ||||||||||
| ^^^^^^^^^^^^^^^^^^^^ | ||||||||||
|
||||||||||
|
|
||||||||||
| .. function:: CLP_GET_STRING(varchar) -> varchar | ||||||||||
|
|
||||||||||
| Returns the string value of the given JSON path, where the column type is one of: ``ClpString``, ``VarString``, or | ||||||||||
|
||||||||||
| ``DateString``. Returns a Presto ``VARCHAR``. | ||||||||||
|
|
||||||||||
| .. function:: CLP_GET_BIGINT(varchar) -> bigint | ||||||||||
|
|
||||||||||
| Returns the integer value of the given JSON path, where the column type is ``Integer``. Returns a Presto ``BIGINT``. | ||||||||||
|
||||||||||
|
|
||||||||||
| .. function:: CLP_GET_DOUBLE(varchar) -> double | ||||||||||
|
|
||||||||||
| Returns the double value of the given JSON path, where the column type is ``Float``. Returns a Presto ``DOUBLE``. | ||||||||||
|
||||||||||
|
|
||||||||||
| .. function:: CLP_GET_BOOL(varchar) -> boolean | ||||||||||
|
|
||||||||||
| Returns the boolean value of the given JSON path, where the column type is ``Boolean``. Returns a Presto ``BOOLEAN``. | ||||||||||
|
||||||||||
|
|
||||||||||
|
||||||||||
| .. function:: CLP_GET_STRING_ARRAY(varchar) -> array(varchar) | ||||||||||
|
|
||||||||||
| Returns the array value of the given JSON path, where the column type is ``UnstructuredArray``, converting each | ||||||||||
|
||||||||||
| element into a string. Returns a Presto ``ARRAY(VARCHAR)``. | ||||||||||
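As a hedged illustration, and assuming the value returned by ``CLP_GET_STRING_ARRAY`` behaves like any other Presto ``ARRAY(VARCHAR)``, standard array functions such as ``cardinality`` could be applied to it. The path ``msg.tags`` and the table ``clp.default.table_2`` are reused from the examples in this section purely as placeholders; whether the connector's rewrite accepts a given surrounding expression should be confirmed against the implementation.

.. code-block:: sql

    -- Sketch only: 'msg.tags' and clp.default.table_2 are placeholder names.
    -- cardinality() is the standard Presto function that returns the array length.
    SELECT
        CLP_GET_STRING_ARRAY('msg.tags') AS tags,
        cardinality(CLP_GET_STRING_ARRAY('msg.tags')) AS tag_count
    FROM clp.default.table_2;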
|
|
||||||||||
| .. note:: | ||||||||||
|
|
||||||||||
| - JSON paths must be **constant string literals**; variables are not supported. | ||||||||||
| - Wildcards (e.g., ``msg.*.ts``) are **not supported**. | ||||||||||
|
Review comment (suggested change): clarity
Author reply: I believe this is supported in CLP-S KQL, which also uses dot notation, but it isn't supported here.
||||||||||
| - If a path is invalid or missing, the function returns ``NULL`` rather than raising an error. | ||||||||||
|
||||||||||
|
|
||||||||||
| Examples: | ||||||||||
|
||||||||||
|
|
||||||||||
| .. code-block:: sql | ||||||||||
|
|
||||||||||
| SELECT CLP_GET_STRING('msg.author') AS author | ||||||||||
| FROM clp.default.table_1 | ||||||||||
| WHERE CLP_GET_BIGINT('msg.timestamp') > 1620000000; | ||||||||||
|
|
||||||||||
|
||||||||||
| SELECT CLP_GET_STRING_ARRAY('msg.tags') AS tags | ||||||||||
| FROM clp.default.table_2 | ||||||||||
| WHERE CLP_GET_BOOL('msg.is_active') = true; | ||||||||||
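Because an invalid or missing path yields ``NULL`` rather than an error, queries over sparse fields may want to handle ``NULL`` explicitly. The following is a minimal sketch using the standard Presto ``COALESCE`` function and ``IS NOT NULL`` predicate; the paths and table name are the same placeholders used above, and the ``'unknown'`` default is an arbitrary choice for illustration.

.. code-block:: sql

    -- Sketch only: paths and table name are placeholders.
    -- Rows without msg.author report 'unknown' instead of NULL;
    -- rows without msg.timestamp are filtered out.
    SELECT COALESCE(CLP_GET_STRING('msg.author'), 'unknown') AS author
    FROM clp.default.table_1
    WHERE CLP_GET_BIGINT('msg.timestamp') IS NOT NULL;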
|
||||||||||
|
|
||||||||||
|
|
||||||||||
| Wildcard Column Functions | ||||||||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||||||
|
||||||||||
| Wildcard Column Functions | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| Wildcard column functions | |
| ========================= |
heading level and capitalization