-
Notifications
You must be signed in to change notification settings - Fork 28
feat: add initial extension support #395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments but still haven't gone through most of the PR. Just wanted to flush what I have so far, but I will review more later. Thanks!
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more comments, but excited to get this in! Sorry that this PR has been sitting so long 😅
Refactored ProtoContext trait (removed), by replacing with a concrete type. Some other small refactoring simplifications and code cleanup.
5fae289 to
d3e3627
Compare
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more comments. Had to run so I didn't get to entirely finish the review but will revisit later. Thanks
| /// Sub-second precision digits | ||
| precision: i32, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for these wonderful comments 🙏
Add TODO referencing Substrait type proto for unsupported type parameters not yet implemented.
Use a single type for non-recursive builtin types.
Also change get_simple_extension_urn to just return an option, and update some comments.
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I once again have to run before getting to fully finish everything. I actually feel quite confident in everything except for the types.rs file, which is the one file I just didn't get to finish going through. But everything is looking great!
Thank you for bearing with my (sometimes excessive) comments 😄
|
|
||
| #[test] | ||
| fn test_user_defined_and_parameters() { | ||
| let cases = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome tests, these inspire a lot of confidence in the parser.
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments, thanks!
| parsed_type.visit_references(&mut link); | ||
| let concrete = ConcreteType::try_from(parsed_type)?; | ||
|
|
||
| // Structure representation cannot be nullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessarily the case that calling parse on RawType is always for parsing structure? If it is, It isn't entirely obvious to me from looking at this code block alone. For example, consider the below excerpt from an extension file:
urn: extension:io.substrait:functions_boolean
scalar_functions:
-
name: or
description: ...
impls:
- args:
- value: boolean?
name: a
variadic:
min: 0
return: boolean?would we then later need to call this parse implementation on "boolean?"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I misunderstood. The structured representation of a type always means that it is non nullable. Marking this one as resolved. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, actually unresolving this one. I believe this is incorrect. My read of the schema is that: a string representation can be nullable, but not the object based representation (which is the case below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, I looked at this more carefully, and I think the check was in the wrong place, and I moved it.
My interpretation was that if a structure is specified, it cannot ever be nullable, whether specified by string or YAML - because the type itself is the non-nullable version, and the nullability should always go outside the type. As in, if I define a type count with structure i64, then count? is the same as i64?; but if I define count: i64?, then... what is count?? Is that a nullable nullable i64?
I may be over-thinking / over-interpreting this, but see the test / its comment in 02227c4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeating the schema definition here:
$defs:
type:
oneOf:
- type: string # any data type
- type: object # syntactic sugar for a non-nullable named structAh, my interpretation is different. Any function can return any type, so for example we can have a function return STRUCT?<f1:i32?, f2:i64> (a nullable struct). But someone introduced this "structured" representation in yaml files merely as syntactic surgar, so that you can do:
structure: # equivalent to struct<i32?: f2:i64>
f1: i32?
f2: i64But there just isn't syntax available ATM to say whether or not the above struct is nullable. So the convention is that if you optionally choose to use the structured representation, then you just can't have a nullable struct simply because it cannot be expressed. But if you wanted your function to return a nullable struct, that is perfectly valid. You should have to use the string-based representation.
That is my understanding at least 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some discussion, I think we have a good plan now:
- There is some ambiguity in the spec to clarify; this is now split out in docs: clarify nullable-ness of string representation of UDT structure substrait#920.
- In the meantime - I'll remove this check from the code, and put in a comment pointing to this issue; we should clarify the spec before adding enforcement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - see latest commit. I removed the relevant test, as it asserted that a nullable type should not be allowed, which it now is. I could have changed it to #[ignore] instead, but it didn't seem worth keeping around.
Replace Parse impls for proto versions with TryFrom, as they don't need a Context.
Applied some good suggested changes from code review.
Addressing more PR comments.
9853681 to
09b7939
Compare
Use IndexMap for YAML object parsing to keep simple extension struct fields in declared order and flag duplicates, aligning with Substrait expectations.
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now just a few more small comments, but I think after this it is good to go! Thanks for your patience 🙇
| parsed_type.visit_references(&mut link); | ||
| let concrete = ConcreteType::try_from(parsed_type)?; | ||
|
|
||
| // Structure representation cannot be nullable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, actually unresolving this one. I believe this is incorrect. My read of the schema is that: a string representation can be nullable, but not the object based representation (which is the case below).
Addressing PR feedback by tightening the API and docs.
…arse Disallow nullable structure representations when parsing SimpleExtensionsTypesItem so errors surface before conversion, and adjust tests to cover full item parsing.
# Files: # src/parse/text/simple_extensions/types.rs - reject nullable structure types; add test # Edit header/body then save/quit to commit
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!! 🎉 thanks for bearing with all of my reviews :)
| //! | ||
| //! This module provides registries for Substrait extensions: | ||
| //! - **Global Registry**: Immutable, reusable across plans, URI+name based lookup | ||
| //! - **Local Registry**: Per-plan, anchor-based, references Global Registry (TODO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (and probably opinionated): I wonder if it is worth dropping the line about Local Registry? I could imagine this being a different struct/file in the future. But I'm not bothered by just leaving it in.
Summary
Begins addressing #367 and #342 by adding support for parsing the types in YAML Simple Extension Files into Rustic types - with validity enforced. This includes a string text parser handling built-in types, compound types, named structs, custom types, and validated parameter constraints in the Simple Extension YAML files.
Scope
ExtensionFile,Registry,CustomType,ConcreteType) and enforces validation of those on creation / read.Key Changes
BuiltinType,CompoundType,ConcreteType,CustomTypewith Display/round‑trip support for alias and named‑struct structures.TryFrom<TypeParamDefsItem>,Parse<RawType>)ProtoContextfromContext, to distinguish between things needed for Protobuf parsing (ProtoContext)u!Name) and type variables; visits extension references for linkage bookkeeping.parsefeature includesserde_yaml;include!(extensions.in)is gated behindextensionsfeature.actions/checkoutto v4, updates Cargo dependency set, and bumps thesubstraitsubmodule.Compatibility Notes
ProtoContexton proto parsing that previously required onlyContext.extensions.innow compiled only withfeatures=["extensions"].Testing
features=["extensions"].