-
Notifications
You must be signed in to change notification settings - Fork 9
Enhance is_uri and is_uri_reference validation
#271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
protovalidate/internal/extra_func.py
Outdated
| """is_uri validates whether string is a valid URI.""" | ||
| valid = Uri(str(string)).uri() | ||
| return celtypes.BoolType(valid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guessing this is what was meant by the 'cel types in the function' comment here.
I'm open to structure this however. We could do something like the following (omitting code for brevity):
def validate_uri(string: celtypes.Value) -> celpy.Result:
if not isinstance(string, celtypes.StringType):
msg = "invalid argument, expected string"
raise celpy.CELEvalError(msg)
return celtypes.BoolType(_is_uri(string))
def _is_uri(string: str) -> bool:
"""is_uri validates whether string is a valid URI."""
return Uri(str(string)).uri()
...
def make_extra_funcs(locale: str) -> dict[str, celpy.CELFunction]:
...
return {
...,
"isUri": validate_uri,
"isUriRef": validate_uri_ref,
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like celpy.CELFunction is pretty flexible regarding arguments. I think this might work out without the extra layer. I'd go without the extra layer, and revisit if it causes problems.
protovalidate/internal/extra_func.py
Outdated
| return celtypes.BoolType(urlparse.unquote(url.query) != url.query) | ||
|
|
||
| return celtypes.BoolType(True) | ||
| """is_uri validates whether string is a valid URI.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a very helpful comment. Is there a reason not to use the same doc as the source implementation?
| """is_uri validates whether string is a valid URI.""" | |
| """Returns true if the string is an email address, for example "[email protected]". | |
| URI is defined in the internet standard RFC 3986. | |
| Zone Identifiers in IPv6 address literals are supported (RFC 6874). | |
| """ |
- Gives an example so it's immediately clear what this is about.
- References the specification.
- Explains noteworthy details (zone id).
It's helpful information for users and maintainers.
Same for all other functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually copying docs from the Go implementation (which are just as unhelpful), so we should prob update that too.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, should update.
The current version of is_ip raises an error if a version other than 4 or 6 is given. The reference implementation is documented with:
Version 0 means either 4 or 6. Passing a version other than 0, 4, or 6 always returns false.
We want to nail down the behavior in those edge cases. That's why the docs are important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yep. Good catch. I will update that in the forthcoming PR for porting is_ip logic.
| return celtypes.BoolType(False) | ||
| elif not all([url.scheme, url.netloc, url.path]): | ||
| return celtypes.BoolType(False) | ||
| """Validate whether string is a valid URI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to use the same doc as the source implementation?
/**
* Returns true if the string is a URI, for example "https://example.com/foo/bar?baz=quux#frag".
*
* URI is defined in the internet standard RFC 3986.
* Zone Identifiers in IPv6 address literals are supported (RFC 6874).
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this I was following that PEP proposal:
The docstring is a phrase ending in a period. It prescribes the function or method’s effect as a command (“Do this”, “Return that”), not as a description; e.g. don’t write “Returns the pathname …”.
I suppose we could change this to say 'Return true if the string...', but since we're documenting return types in the Returns: section, that doesn't make a lot of sense. I don't think the comments across implementations should be verbatim since each language is going to have its own guidelines and styles. As long as it expresses the exact same information. For example, the full description below this line includes all the RFC info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair to follow the PEP on phrasing 👍
| Args: | ||
| string (celTypes.Value): The string to validate. | ||
| # If the query string contains percent-encoding, then try to decode it. | ||
| # unquote will return the same string if it is improperly encoded. | ||
| if "%" in url.query: | ||
| return celtypes.BoolType(urlparse.unquote(url.query) != url.query) | ||
| Returns: | ||
| True if the string is a URI, for example "https://example.com/foo/bar?baz=quux#frag". False otherwise. | ||
| return celtypes.BoolType(True) | ||
| Raises: | ||
| celpy.CELEvalError: If string is not an instance of celtypes.StringType. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it typical in Python to document arguments, return value, and exceptions like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, from the PEP proposal you linked:
The docstring for a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable). Optional arguments should be indicated. It should be documented whether keyword arguments are part of the interface.
There are a few styles for doing this. This one uses the Google style guide, which is also used in Effective Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we're also using a similar style in validator.py.
We definitely want to follow best practices, but there's some leeway. Don't need to completely rewrite the docs for every port.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, fair enough. Will keep that in mind going forward.
This ports the validation logic from
protovalidate-gofor validating URIs and URI references. As a result, all conformance tests for 0.10.3 related to these validations are now passing.This uses a single underscore for protected class members and a double-underscore for private class methods to use name-mangling. There are varying recommendations on how to structure these (for example, Effective Python recommends public members, some recommendations suggest a single-underscore for class methods, etc.).
The current pattern in protovalidate-python seems to use a single underscore for module-level functions, which we could certainly adopt for classes. I'm open to whatever is the established pattern that we should adopt.