-
-
Notifications
You must be signed in to change notification settings - Fork 33.6k
gh-67041: Allow to distinguish between empty and not defined URI components #123305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-67041: Allow to distinguish between empty and not defined URI components #123305
Conversation
…I components Changes in the urllib.parse module: * Add option allow_none in urlparse(), urlsplit() and urldefrag(). If it is true, represent not defined components as None instead of an empty string. * Add option keep_empty in urlunparse() and urlunsplit(). If it is true, keep empty non-None components in the resulting string. * Add option keep_empty in the geturl() method of DefragResult, SplitResult, ParseResult and the corresponding bytes counterparts.
7032015 to
a60c9be
Compare
|
It is now ready to review. The status of Unfortunately, these objects now have The long term plan is to make |
|
I am sorry, I forget to copy the @orsenthil, @barneygale, could you please make a review? |
|
I think that Possible alternatives:
|
|
Maybe |
|
I think it should be To change the default in the future, do you plan on adding a warning first? |
On one hand, a warning will inform everyone about the change (it should be a FutureWarning). You will have to pass explicit True or False to silence it and get your behavior. This how we normally do. On other hand, the warning will unnecessary disturb those who are fine with any behavior. We will discuss this when the time come. |
Currently, missing and empty component are not distinguishable. This parameter will allow to distinguish them. This PR adds also the |
|
Ah your’re right, there is empty component and empty string.
|
Doc/library/urllib.parse.rst
Outdated
|
|
||
| If *keep_empty* is true, empty strings are kept in the result (for example, | ||
| a ``?`` for an empty query), only ``None`` components are omitted. | ||
| This allows to restore the URL that was parsed with option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| This allows to restore the URL that was parsed with option | |
| This allows rebuilding a URL that was parsed with option |
| or on combining URL components into a URL string. | ||
|
|
||
| .. function:: urlparse(urlstring, scheme='', allow_fragments=True) | ||
| .. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really sorry, that I am contributing so late; but "missing_as_none=False" is confusing and not intuitive at all to me.
Pretty sure, others who have not participated are going to feel the same.
The function signature and term is not giving a signal on what it is meant to be.
Are you open to new name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding @gpshead, as one of the active developers in this area, to get his opinion too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this name. Admittedly I have spent too much time in the past wrangling problems in this library, but the reason it still works for me despite that is that it is a common concept: do you represent the absence of a value distinctly from the base zero/empty version of that type or not? That is what None is for. and missing_as_none is at least explicit in name to indicate that some values may be None. I'm not going to call it pretty but it is "understandable enough" for me. I can't come up with anything that'd be meaningfully better rather than just alternately-understandable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts about missing_as_empty with the opposite semantic? In future, None will be returned for not defined components by default, and you will need to specify missing_as_none=False or missing_as_empty=True to restore the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that name should really be missing_as_empty_string at least (we can’t say just “empty”), meaning missing parts returned as empty strings.
What about use_none=False which is short and doesn’t try to be self-explanatory, so people need to read the docs?
|
When you're done making the requested changes, leave the comment: |
gpshead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall I like this, comments are all about how to explain it in the docs.
| or on combining URL components into a URL string. | ||
|
|
||
| .. function:: urlparse(urlstring, scheme='', allow_fragments=True) | ||
| .. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this name. Admittedly I have spent too much time in the past wrangling problems in this library, but the reason it still works for me despite that is that it is a common concept: do you represent the absence of a value distinctly from the base zero/empty version of that type or not? That is what None is for. and missing_as_none is at least explicit in name to indicate that some values may be None. I'm not going to call it pretty but it is "understandable enough" for me. I can't come up with anything that'd be meaningfully better rather than just alternately-understandable.
| a ``?`` for an empty query), only ``None`` components are omitted. | ||
| This allows to restore the URL that was parsed with option | ||
| ``missing_as_none=True``. | ||
| By default, *keep_empty* is true if *parts* is the result of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is easy to miss this "footnote" about keep_empty not being a simple False default. I think the function signature above should be more clear that it has a non-trivial default value when not explicitly specified. something awkward with a fake descriptive name but indicative that people should read further for details keep_empty=_FALSE_UNLESS_PARTS_IS_A_URLSPLIT_RESULT or similar perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pydoc will output urlunparse(components, *, keep_empty=['not specified']). Is it fine, or you need more descriptive name?
| a ``?`` for an empty query), only ``None`` components are omitted. | ||
| This allows to restore the URL that was parsed with option | ||
| ``missing_as_none=True``. | ||
| By default, *keep_empty* is true if *parts* is the result of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above. make it obvious from the function signature that the default depends on the type of parts.
serhiy-storchaka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your attention.
I would like to take this opportunity to discuss a certain design issue. To make this change more user friendly, we need to keep the status of missing_as_none in the result of urlsplit() and others. So the user will not need to specify keep_empty=True each time they use missing_as_none=True, and change their code again in future. There are two ways to do this:
- Return different subclasses of
SplitResult(and of other 5 classes). This approach already was used to differentiate string/bytes results. This will add 12 new classes. - Add a (hidden) boolean attribute to instances of
SplitResult(and of other 5 classes).
In both cases there are some issues with copying, pickling, etc, which will be resolved in different ways. In distant future, after flipping the default behavior and deprecating and removing the current behavior (if we completely remove it), we may return to the current state of 6 named tuple classes.
What approach do you prefer?
| or on combining URL components into a URL string. | ||
|
|
||
| .. function:: urlparse(urlstring, scheme='', allow_fragments=True) | ||
| .. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts about missing_as_empty with the opposite semantic? In future, None will be returned for not defined components by default, and you will need to specify missing_as_none=False or missing_as_empty=True to restore the current behavior.
| a ``?`` for an empty query), only ``None`` components are omitted. | ||
| This allows to restore the URL that was parsed with option | ||
| ``missing_as_none=True``. | ||
| By default, *keep_empty* is true if *parts* is the result of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pydoc will output urlunparse(components, *, keep_empty=['not specified']). Is it fine, or you need more descriptive name?
Changes in the
urllib.parsemodule:allow_nonemissing_as_none inurlparse(),urlsplit()andurldefrag(). If it is true, represent not defined components as None instead of an empty string.urlunparse()andurlunsplit(). If it is true, keep empty non-None components in the resulting string. By default it is the same as the allow_none value for the result of theurlparse()andurlsplit()calls.Add option keep_empty in thegeturl()method ofDefragResult,SplitResult,ParseResultand the corresponding bytes counterparts.