feat: Add CLP UDF docs #75
base: release-0.293-clp-connector
Changes from 6 commits
e750458
d02268d
fed7045
d7d03cd
3ddb9d8
4442f9a
a00bac5
054beb2
41aa302
a7925d0
0796cf6
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -320,6 +320,160 @@ Each JSON log maps to this unified ``ROW`` type, with absent fields represented | |||||
| ``status``, ``thread_num``, ``backtrace``) become fields within the ``ROW``, clearly reflecting the nested and varying | ||||||
| structures of the original JSON logs. | ||||||
|
|
||||||
| ************* | ||||||
| CLP Functions | ||||||
| ************* | ||||||
|
|
||||||
| Semi-structured logs can have many potential keys, which can lead to very wide Presto tables. To keep table metadata | ||||||
| concise and still preserve access to dynamic fields, the connector provides three sets of functions that are specific to | ||||||
| the CLP connector. These functions are not part of standard Presto SQL. | ||||||
|
|
||||||
| - JSON path functions (e.g., ``CLP_GET_STRING``) | ||||||
| - Wildcard column matching functions for use in filter predicates (e.g., ``CLP_WILDCARD_STRING_COLUMN``) | ||||||
| - A function for retrieving the entire row in JSON format (i.e., ``CLP_GET_JSON_STRING``) | ||||||
|
|
||||||
| For the first two sets of functions, there is **no performance overhead**. During query optimization, the connector | ||||||
| rewrites these functions into references to concrete schema-backed columns or valid symbols in KQL queries. This avoids | ||||||
| unnecessary parsing overhead and delivers performance comparable to querying standard columns. | ||||||
|
|
||||||
| Path-based Functions | ||||||
| ==================== | ||||||
|
|
||||||
| .. function:: CLP_GET_STRING(varchar) -> varchar | ||||||
|
|
||||||
| Returns the string value at the given JSON path, where the column type is one of: ``ClpString``, ``VarString``, or | ||||||
| ``DateString``. Returns a Presto ``VARCHAR``. | ||||||
|
|
||||||
| .. function:: CLP_GET_BIGINT(varchar) -> bigint | ||||||
|
|
||||||
| Returns the integer value at the given JSON path, where the column type is ``Integer``. Returns a Presto ``BIGINT``. | ||||||
|
|
||||||
| .. function:: CLP_GET_DOUBLE(varchar) -> double | ||||||
|
|
||||||
| Returns the double value at the given JSON path, where the column type is ``Float``. Returns a Presto ``DOUBLE``. | ||||||
|
|
||||||
| .. function:: CLP_GET_BOOL(varchar) -> boolean | ||||||
|
|
||||||
| Returns the boolean value at the given JSON path, where the column type is ``Boolean``. Returns a Presto ``BOOLEAN``. | ||||||
|
|
||||||
| .. function:: CLP_GET_STRING_ARRAY(varchar) -> array(varchar) | ||||||
|
|
||||||
| Returns the array value at the given JSON path, where the column type is ``UnstructuredArray``, converting each | ||||||
| element into a string. Returns a Presto ``ARRAY(VARCHAR)``. | ||||||
|
|
||||||
| .. note:: | ||||||
|
|
||||||
| - JSON paths must be **constant string literals**; variables are not supported. | ||||||
| - Wildcards (e.g., ``msg.*.ts``) are **not supported**. | ||||||
|
Suggested change
clarity
Author
I believe this is supported in CLP-S KQL, which also uses dot notation, but it isn’t supported here.
||||||
| - If a path is invalid or missing, the function returns ``NULL`` rather than raising an error. | ||||||
|
||||||
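The notes above can be illustrated with a short sketch. It reuses the table and path names from this section's examples (``clp.default.table_1``, ``msg.author``, ``msg.timestamp``) and assumes that ``IS NOT NULL`` behaves on the rewritten column as it does on an ordinary column; that assumption is not confirmed by this section.

```sql
-- Paths are constant string literals (quoted), not bare identifiers.
-- Records where msg.author is missing yield NULL instead of an error,
-- so they can be filtered out with an ordinary NULL check (assumed to work
-- the same way as on a regular column).
SELECT CLP_GET_STRING('msg.author')    AS author,
       CLP_GET_BIGINT('msg.timestamp') AS ts
FROM clp.default.table_1
WHERE CLP_GET_STRING('msg.author') IS NOT NULL;
```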
|
|
||||||
| Examples | ||||||
| -------- | ||||||
|
|
||||||
| .. code-block:: sql | ||||||
|
|
||||||
| SELECT CLP_GET_STRING('msg.author') AS author | ||||||
| FROM clp.default.table_1 | ||||||
| WHERE CLP_GET_BIGINT('msg.timestamp') > 1620000000; | ||||||
|
|
||||||
coderabbitai[bot] marked this conversation as resolved.
|
||||||
| SELECT CLP_GET_STRING_ARRAY('msg.tags') AS tags | ||||||
| FROM clp.default.table_2 | ||||||
| WHERE CLP_GET_BOOL('msg.is_active') = true; | ||||||
wraymo marked this conversation as resolved.
|
||||||
|
|
||||||
|
|
||||||
| Wildcard Column Functions | ||||||
| ========================= | ||||||
|
|
||||||
| These functions are used to apply filter predicates across all columns of a certain type. They are useful for searching | ||||||
| across unknown or dynamic schemas without specifying exact column names. Similar to the path-based functions, these | ||||||
| functions are rewritten during query optimization to a KQL query that matches the appropriate columns. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_STRING_COLUMN() -> varchar | ||||||
|
|
||||||
| Represents all columns whose CLP types are ``ClpString``, ``VarString``, or ``DateString``. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_INT_COLUMN() -> bigint | ||||||
|
|
||||||
| Represents all columns whose CLP type is ``Integer``. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_FLOAT_COLUMN() -> double | ||||||
|
|
||||||
| Represents all columns whose CLP type is ``Float``. | ||||||
|
|
||||||
| .. function:: CLP_WILDCARD_BOOL_COLUMN() -> boolean | ||||||
|
|
||||||
| Represents all columns whose CLP type is ``Boolean``. | ||||||
|
|
||||||
| .. note:: | ||||||
|
|
||||||
| - Wildcard functions must appear **only in filter conditions** (``WHERE`` clause). They cannot be selected and cannot | ||||||
| be passed as arguments to other functions. | ||||||
| - Supported operators include: | ||||||
|
|
||||||
| :: | ||||||
|
|
||||||
| = (EQUAL) | ||||||
| != (NOT_EQUAL) | ||||||
| < (LESS_THAN) | ||||||
| <= (LESS_THAN_OR_EQUAL) | ||||||
| > (GREATER_THAN) | ||||||
| >= (GREATER_THAN_OR_EQUAL) | ||||||
| LIKE | ||||||
| BETWEEN | ||||||
| IN | ||||||
|
|
||||||
| Use of other operators (e.g., arithmetic or function calls) with wildcard functions is not allowed and will result | ||||||
| in a query error. | ||||||
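As a hedged sketch of the operator list above, the following queries use ``LIKE``, ``BETWEEN``, and ``IN`` with wildcard functions. The table names come from this section's examples; the literal values and the pattern are illustrative only, and exact matching semantics depend on how the connector rewrites each predicate into KQL.

```sql
-- Any string column matching a LIKE pattern (pattern is illustrative)
SELECT *
FROM clp.default.table_1
WHERE CLP_WILDCARD_STRING_COLUMN() LIKE '%timeout%';

-- Any integer column within an inclusive range
SELECT *
FROM clp.default.table_1
WHERE CLP_WILDCARD_INT_COLUMN() BETWEEN 500 AND 599;

-- Any integer column equal to one of several values
SELECT *
FROM clp.default.table_2
WHERE CLP_WILDCARD_INT_COLUMN() IN (401, 403, 404);

-- Not allowed per the note above: arithmetic on a wildcard function
-- WHERE CLP_WILDCARD_INT_COLUMN() + 1 = 2
```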
wraymo marked this conversation as resolved.
|
||||||
|
|
||||||
| Examples | ||||||
| -------- | ||||||
|
|
||||||
| .. code-block:: sql | ||||||
|
|
||||||
| -- Matches if any string column equals "Beijing" | ||||||
| SELECT * | ||||||
| FROM clp.default.table_1 | ||||||
| WHERE CLP_WILDCARD_STRING_COLUMN() = 'Beijing'; | ||||||
|
|
||||||
| -- Matches if any integer column equals 1 | ||||||
| SELECT * | ||||||
| FROM clp.default.table_2 | ||||||
| WHERE CLP_WILDCARD_INT_COLUMN() = 1; | ||||||
|
|
||||||
| JSON String Function | ||||||
| ==================== | ||||||
|
|
||||||
| The ``CLP_GET_JSON_STRING`` function provides a convenient way to retrieve the entire log record—including both | ||||||
|
Fix RST backtick formatting errors. Lines 446 and 458 have mismatched backticks that will render incorrectly in reStructuredText:
- The ``CLP_GET_JSON_STRING``` function provides a convenient way to retrieve the entire log record—including both
+ The ``CLP_GET_JSON_STRING()`` function provides a convenient way to retrieve the entire log record—including both
- This function can only be used in the list of projected columns in a ``SELECT``` clause to retrieve the complete
+ This function can only be used in the list of projected columns in a ``SELECT`` clause to retrieve the complete
Also applies to: 458-458
||||||
| schema-backed and dynamic fields—as a single JSON string. This enables users to inspect, debug, or export complete | ||||||
| records in their raw JSON form. | ||||||
|
|
||||||
| Similar to the path-based and wildcard functions, this function is rewritten during query optimization to a special | ||||||
| internal column. During query execution, this column is serialized into a JSON string that represents the full record. | ||||||
|
It doesn't include metadata columns from splits, right? Do you think it's worth highlighting the distinction between JSON record and metadata columns here? I think it could be confusing to a reader thinking in terms of normal SQL records.
Author
It doesn't. Currently we don't support metadata projection; we can probably edit this later once we have that support.
||||||
|
|
||||||
| .. function:: CLP_GET_JSON_STRING() -> varchar | ||||||
|
|
||||||
| Returns the full log record as a JSON string, preserving all schema-backed and dynamic fields. | ||||||
|
|
||||||
| .. note:: | ||||||
|
|
||||||
| This function can only be used in the ``SELECT`` list to retrieve the complete JSON representation of each record. | ||||||
|
||||||
| It cannot be used within filter predicates (``WHERE`` clause) or as an argument to other functions. | ||||||
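A minimal sketch of the restriction above, using only functions and names that already appear in this section: the JSON string is projected in the ``SELECT`` list, while filtering is done with a path-based function. The column aliases are illustrative.

```sql
-- Allowed: CLP_GET_JSON_STRING() in the SELECT list, filtering via a path function
SELECT CLP_GET_STRING('msg.author') AS author,
       CLP_GET_JSON_STRING()        AS record
FROM clp.default.table_1
WHERE CLP_GET_BIGINT('msg.timestamp') > 1620000000
LIMIT 10;

-- Not allowed per the note above: use in a filter predicate
-- WHERE CLP_GET_JSON_STRING() LIKE '%error%'
```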
|
|
||||||
| Examples | ||||||
| -------- | ||||||
|
|
||||||
| .. code-block:: sql | ||||||
|
|
||||||
| -- Retrieve each record as a JSON string | ||||||
| SELECT CLP_GET_JSON_STRING() | ||||||
| FROM clp.default.table_1 | ||||||
| LIMIT 10; | ||||||
|
|
||||||
| -- Retrieve JSON along with selected fields | ||||||
| SELECT timestamp, CLP_GET_JSON_STRING() | ||||||
| FROM clp.default.table_1 | ||||||
| WHERE CLP_WILDCARD_STRING_COLUMN() = 'error'; | ||||||
|
|
||||||
| *********** | ||||||
| SQL support | ||||||
| *********** | ||||||
|
|
||||||