-
Notifications
You must be signed in to change notification settings - Fork 190
feat: add simple extension for JSON #887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| %YAML 1.2 | ||
| --- | ||
| urn: extension:io.substrait:json | ||
| types: | ||
| - name: json | ||
| structure: | ||
| content: string | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you envision extending this into something like variant?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a ton of experience with variants, only heard about them for the first time during the last sync. Would it be a better approach to wait to handle variants first?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can see that the actual systems will map it their own internal representations -- simply use this as a tag. So BTW, what's the difference between
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The way I understand it, SELECT to_string(data) FROM tableBut in this literal-less context, |
||
| description: >- | ||
| A JSON type representing arbitrary JSON values (objects, arrays, | ||
| strings, numbers, booleans, or null). | ||
|
|
||
| scalar_functions: | ||
| - name: "parse_json" | ||
| description: >- | ||
| Parse a JSON string into a JSON value. | ||
| impls: | ||
| - args: | ||
| - name: json_string | ||
| value: string | ||
| options: | ||
| on_error: | ||
| description: Controls behavior when input is not valid JSON | ||
benbellick marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| values: [ ERROR, "NULL" ] | ||
| return: u!json? | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Including a |
||
|
|
||
| - name: "to_string" | ||
| description: "Convert a JSON value to its string representation" | ||
| impls: | ||
| - args: | ||
| - name: json_value | ||
| value: u!json | ||
| return: string | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can this return null somehow? Also do we want to have some sort of formatting options like indentation and etc.?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are good points, though I am unsure of the answer. On the one hand, I can see a case for assuming that values saved as JSON are already valid, but on the other hand, this may not actually be the case. As for formatting options, that is definitely a good idea. I'm not sure what the right approach is to cover many general use cases without more research.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just checking whether you have thought about it. :smile Finding an intersection across popular implementations would be an interesting survey but I think we may end up with some sort of property bag that we define some standard options and the undefined are up to the implementations... Something along that line. 100% agree that no need to add at this point.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO it might make sense to not include this in a first pass. On one hand |
||
|
|
||
| - name: "json_extract" | ||
| description: >- | ||
| Extract a value from JSON using a JSONPath expression. | ||
yongchul marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| JSONPath expressions should follow RFC 9535 (https://datatracker.ietf.org/doc/html/rfc9535). | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh yay, there's a standard for this, though it appears to be in progress and not final. It does feel like the most neutral path format to pick as I imagine most systems have their own flavours of it, which will likely represent a compatibility challenge much like regex has been. |
||
| impls: | ||
| - args: | ||
| - name: json_value | ||
| value: u!json | ||
| - name: path | ||
| value: string | ||
| options: | ||
| on_invalid_path: | ||
| description: Controls behavior when the JSONPath expression is syntactically invalid | ||
| values: [ ERROR, "NULL", UNDEFINED ] | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Velox once encountered three different JSON libraries. One actually succeeded despite not checking validity because it only read the part of the JSON that mattered.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was that the validity of the JSON, or the validity of the JSON path? Though if I think on it, I can imagine both "working". Lazy evaluation of a valid path on malformed JSON that never hits the bad json, and lazy evaluation of an invalid path on JSON that short-circuits the bad parts of the path. |
||
| on_path_not_found: | ||
| description: Controls behavior when the path does not exist in the JSON document | ||
| values: [ ERROR, "NULL" ] | ||
| return: u!json? | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||
|
|
||
| - name: "is_json_valid" | ||
| description: >- | ||
| Returns true if the input string is valid JSON, false otherwise. | ||
| This function does not parse the JSON, only validates syntax. | ||
| impls: | ||
| - args: | ||
| - name: json_string | ||
| value: string | ||
| return: boolean | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few more functions I could imagine including, e.g. json_each, json_keys, etc. but I wanted to start with a small set of primitives.