Skip to content

Conversation

@smaye81
Copy link
Contributor

@smaye81 smaye81 commented Mar 25, 2025

This ports the validation logic from protovalidate-go for validating URIs and URI references. As a result, all conformance tests for 0.10.3 related to these validations are now passing.

This uses a single underscore for protected class members and a double-underscore for private class methods to use name-mangling. There are varying recommendations on how to structure these (for example, Effective Python recommends public members, some recommendations suggest a single-underscore for class methods, etc.).

The current pattern in protovalidate-python seems to use a single underscore for module-level functions, which we could certainly adopt for classes. I'm open to whatever is the established pattern that we should adopt.

Comment on lines 164 to 166
"""is_uri validates whether string is a valid URI."""
valid = Uri(str(string)).uri()
return celtypes.BoolType(valid)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing this is what was meant by the 'cel types in the function' comment here.

I'm open to structure this however. We could do something like the following (omitting code for brevity):

def validate_uri(string: celtypes.Value) -> celpy.Result:
    if not isinstance(string, celtypes.StringType):
        msg = "invalid argument, expected string"
        raise celpy.CELEvalError(msg)
    return celtypes.BoolType(_is_uri(string))


def _is_uri(string: str) -> bool:
    """is_uri validates whether string is a valid URI."""
    return Uri(str(string)).uri()

...

def make_extra_funcs(locale: str) -> dict[str, celpy.CELFunction]:
    ...
    return {
         ...,
        "isUri": validate_uri,
        "isUriRef": validate_uri_ref,
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like celpy.CELFunction is pretty flexible regarding arguments. I think this might work out without the extra layer. I'd go without the extra layer, and revisit if it causes problems.

@smaye81 smaye81 requested review from rodaine and timostamm March 25, 2025 18:59
return celtypes.BoolType(urlparse.unquote(url.query) != url.query)

return celtypes.BoolType(True)
"""is_uri validates whether string is a valid URI."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a very helpful comment. Is there a reason not to use the same doc as the source implementation?

Suggested change
"""is_uri validates whether string is a valid URI."""
"""Returns true if the string is an email address, for example "[email protected]".
URI is defined in the internet standard RFC 3986.
Zone Identifiers in IPv6 address literals are supported (RFC 6874).
"""
  • Gives an example so it's immediately clear what this is about.
  • References the specification.
  • Explains noteworthy details (zone id).

It's helpful information for users and maintainers.

Same for all other functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually copying docs from the Go implementation (which are just as unhelpful), so we should prob update that too.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should update.

The current version of is_ip raises an error if a version other than 4 or 6 is given. The reference implementation is documented with:

Version 0 means either 4 or 6. Passing a version other than 0, 4, or 6 always returns false.

We want to nail down the behavior in those edge cases. That's why the docs are important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep. Good catch. I will update that in the forthcoming PR for porting is_ip logic.

@smaye81 smaye81 requested a review from timostamm March 26, 2025 14:55
return celtypes.BoolType(False)
elif not all([url.scheme, url.netloc, url.path]):
return celtypes.BoolType(False)
"""Validate whether string is a valid URI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use the same doc as the source implementation?

/**
 * Returns true if the string is a URI, for example "https://example.com/foo/bar?baz=quux#frag".
 *
 * URI is defined in the internet standard RFC 3986.
 * Zone Identifiers in IPv6 address literals are supported (RFC 6874).
 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this I was following that PEP proposal:

The docstring is a phrase ending in a period. It prescribes the function or method’s effect as a command (“Do this”, “Return that”), not as a description; e.g. don’t write “Returns the pathname …”.

I suppose we could change this to say 'Return true if the string...', but since we're documenting return types in the Returns: section, that doesn't make a lot of sense. I don't think the comments across implementations should be verbatim since each language is going to have its own guidelines and styles. As long as it expresses the exact same information. For example, the full description below this line includes all the RFC info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair to follow the PEP on phrasing 👍

Comment on lines +178 to +185
Args:
string (celTypes.Value): The string to validate.
# If the query string contains percent-encoding, then try to decode it.
# unquote will return the same string if it is improperly encoded.
if "%" in url.query:
return celtypes.BoolType(urlparse.unquote(url.query) != url.query)
Returns:
True if the string is a URI, for example "https://example.com/foo/bar?baz=quux#frag". False otherwise.
return celtypes.BoolType(True)
Raises:
celpy.CELEvalError: If string is not an instance of celtypes.StringType.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it typical in Python to document arguments, return value, and exceptions like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, from the PEP proposal you linked:

The docstring for a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable). Optional arguments should be indicated. It should be documented whether keyword arguments are part of the interface.

There are a few styles for doing this. This one uses the Google style guide, which is also used in Effective Python.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we're also using a similar style in validator.py.

We definitely want to follow best practices, but there's some leeway. Don't need to completely rewrite the docs for every port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fair enough. Will keep that in mind going forward.

@smaye81 smaye81 merged commit 12fa3e5 into sayers/validation_upgrades Mar 26, 2025
7 checks passed
@smaye81 smaye81 deleted the sayers/isuri branch March 26, 2025 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants