gh-67041: Allow to distinguish between empty and not defined URI components #123305
base: main
Changes from 9 commits
a60c9be
a1dbfa6
b50b778
eaa9ce6
e5c31dd
78bdc13
5846bf2
b578c9d
7d59b7e
d025fa8
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -50,12 +50,16 @@ | |||||
| The URL parsing functions focus on splitting a URL string into its components, | ||||||
| or on combining URL components into a URL string. | ||||||
|
|
||||||
| .. function:: urlparse(urlstring, scheme='', allow_fragments=True) | ||||||
| .. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False) | ||||||
|
|
||||||
| Parse a URL into six components, returning a 6-item :term:`named tuple`. This | ||||||
| corresponds to the general structure of a URL: | ||||||
| ``scheme://netloc/path;parameters?query#fragment``. | ||||||
| Each tuple item is a string, possibly empty. The components are not broken up | ||||||
| Each tuple item is a string, possibly empty, or ``None`` if | ||||||
| *missing_as_none* is true. | ||||||
| Not defined components are represented as an empty string (by default) or | ||||||
| ``None`` if *missing_as_none* is true. | ||||||
| The components are not broken up | ||||||
| into smaller parts (for example, the network location is a single string), and % | ||||||
| escapes are not expanded. The delimiters as shown above are not part of the | ||||||
| result, except for a leading slash in the *path* component, which is retained if | ||||||
|
|
@@ -84,6 +88,12 @@ | |||||
| 80 | ||||||
| >>> o._replace(fragment="").geturl() | ||||||
| 'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params' | ||||||
| >>> urlparse("http://docs.python.org?") | ||||||
| ParseResult(scheme='http', netloc='docs.python.org', | ||||||
| path='', params='', query='', fragment='') | ||||||
| >>> urlparse("http://docs.python.org?", missing_as_none=True) | ||||||
| ParseResult(scheme='http', netloc='docs.python.org', | ||||||
| path='', params=None, query='', fragment=None) | ||||||
|
|
||||||
| Following the syntax specifications in :rfc:`1808`, urlparse recognizes | ||||||
| a netloc only if it is properly introduced by '//'. Otherwise the | ||||||
|
|
@@ -101,47 +111,53 @@ | |||||
| ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html', | ||||||
| params='', query='', fragment='') | ||||||
| >>> urlparse('help/Python.html') | ||||||
| ParseResult(scheme='', netloc='', path='help/Python.html', params='', | ||||||
| query='', fragment='') | ||||||
| ParseResult(scheme='', netloc='', path='help/Python.html', | ||||||
| params='', query='', fragment='') | ||||||
| >>> urlparse('help/Python.html', missing_as_none=True) | ||||||
| ParseResult(scheme=None, netloc=None, path='help/Python.html', | ||||||
| params=None, query=None, fragment=None) | ||||||
|
|
||||||
| The *scheme* argument gives the default addressing scheme, to be | ||||||
| used only if the URL does not specify one. It should be the same type | ||||||
| (text or bytes) as *urlstring*, except that the default value ``''`` is | ||||||
| (text or bytes) as *urlstring* or ``None``, except that ``''`` is | ||||||
| always allowed, and is automatically converted to ``b''`` if appropriate. | ||||||
|
|
||||||
| If the *allow_fragments* argument is false, fragment identifiers are not | ||||||
| recognized. Instead, they are parsed as part of the path, parameters | ||||||
| or query component, and :attr:`fragment` is set to the empty string in | ||||||
| the return value. | ||||||
| or query component, and :attr:`fragment` is set to ``None`` or the empty | ||||||
| string (depending on the value of *missing_as_none*) in the return value. | ||||||
|
|
||||||
| The return value is a :term:`named tuple`, which means that its items can | ||||||
| be accessed by index or as named attributes, which are: | ||||||
|
|
||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | Attribute | Index | Value | Value if not present | | ||||||
| +==================+=======+=========================+========================+ | ||||||
| | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`netloc` | 1 | Network location part | empty string | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`path` | 2 | Hierarchical path | empty string | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`params` | 3 | Parameters for last | empty string | | ||||||
| | | | path element | | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`query` | 4 | Query component | empty string | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`fragment` | 5 | Fragment identifier | empty string | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`username` | | User name | :const:`None` | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`password` | | Password | :const:`None` | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`hostname` | | Host name (lower case) | :const:`None` | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| | :attr:`port` | | Port number as integer, | :const:`None` | | ||||||
| | | | if present | | | ||||||
| +------------------+-------+-------------------------+------------------------+ | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | Attribute | Index | Value | Value if not present | | ||||||
| +==================+=======+=========================+===============================+ | ||||||
| | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter or | | ||||||
| | | | | empty string [1]_ | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`netloc` | 1 | Network location part | ``None`` or empty string [1]_ | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`path` | 2 | Hierarchical path | empty string | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`params` | 3 | Parameters for last | ``None`` or empty string [1]_ | | ||||||
| | | | path element | | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`query` | 4 | Query component | ``None`` or empty string [1]_ | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`fragment` | 5 | Fragment identifier | ``None`` or empty string [1]_ | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`username` | | User name | ``None`` | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`password` | | Password | ``None`` | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`hostname` | | Host name (lower case) | ``None`` | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
| | :attr:`port` | | Port number as integer, | ``None`` | | ||||||
| | | | if present | | | ||||||
| +------------------+-------+-------------------------+-------------------------------+ | ||||||
|
|
||||||
| .. [1] Depending on the value of the *missing_as_none* argument. | ||||||
| Reading the :attr:`port` attribute will raise a :exc:`ValueError` if | ||||||
| an invalid port is specified in the URL. See section | ||||||
|
|
@@ -187,12 +203,15 @@ | |||||
|
|
||||||
| .. versionchanged:: 3.6 | ||||||
| Out-of-range port numbers now raise :exc:`ValueError`, instead of | ||||||
| returning :const:`None`. | ||||||
| returning ``None``. | ||||||
|
|
||||||
| .. versionchanged:: 3.8 | ||||||
| Characters that affect netloc parsing under NFKC normalization will | ||||||
| now raise :exc:`ValueError`. | ||||||
|
|
||||||
| .. versionchanged:: next | ||||||
| Added the *missing_as_none* parameter. | ||||||
|
|
||||||
|
|
||||||
| .. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&') | ||||||
|
|
||||||
|
|
@@ -287,16 +306,27 @@ | |||||
| separator key, with ``&`` as the default separator. | ||||||
|
|
||||||
|
|
||||||
| .. function:: urlunparse(parts) | ||||||
| .. function:: urlunparse(parts, *, keep_empty=False) | ||||||
|
|
||||||
| Construct a URL from a tuple as returned by ``urlparse()``. The *parts* | ||||||
| argument can be any six-item iterable. This may result in a slightly | ||||||
| different, but equivalent URL, if the URL that was parsed originally had | ||||||
| unnecessary delimiters (for example, a ``?`` with an empty query; the RFC | ||||||
| states that these are equivalent). | ||||||
| argument can be any six-item iterable. | ||||||
|
|
||||||
| This may result in a slightly different, but equivalent URL, if the | ||||||
| URL that was parsed originally had unnecessary delimiters (for example, | ||||||
| a ``?`` with an empty query; the RFC states that these are equivalent). | ||||||
|
|
||||||
| If *keep_empty* is true, empty strings are kept in the result (for example, | ||||||
| a ``?`` for an empty query); only ``None`` components are omitted. | ||||||
| This allows to restore the URL that was parsed with option | ||||||
|
||||||
| This allows to restore the URL that was parsed with option | |
| This allows rebuilding a URL that was parsed with option |
It is easy to miss this "footnote" about keep_empty not being a simple False default. I think the function signature above should make it clearer that the default is non-trivial when not explicitly specified. Perhaps something awkward with a fake descriptive name that signals people should read further for details, such as keep_empty=_FALSE_UNLESS_PARTS_IS_A_URLSPLIT_RESULT or similar?
Pydoc will output urlunparse(components, *, keep_empty=['not specified']). Is that fine, or do you need a more descriptive name?
Same comment as above: make it obvious from the function signature that the default depends on the type of parts.
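The "default depends on the type of parts" idea discussed in these comments is usually implemented with a sentinel default. The following is only a minimal sketch of that pattern, not the PR's implementation: the names `_UNSPECIFIED`, `PartsWithNone`, and `resolve_keep_empty` are hypothetical and exist purely for illustration.

```python
# Hypothetical sentinel: a unique object meaning "the caller passed nothing".
# Its repr is what a tool like pydoc would show in the signature.
_UNSPECIFIED = ['not specified']


class PartsWithNone(tuple):
    """Hypothetical stand-in for a parse result that may contain None."""


def resolve_keep_empty(parts, *, keep_empty=_UNSPECIFIED):
    """Return the effective keep_empty flag for the given parts."""
    if keep_empty is _UNSPECIFIED:
        # No explicit value: derive the default from the type of parts.
        return isinstance(parts, PartsWithNone)
    # An explicit value always wins over the type-based default.
    return bool(keep_empty)


plain = ('http', 'docs.python.org', '', '', '', '')
marked = PartsWithNone(('http', 'docs.python.org', '', None, '', None))

print(resolve_keep_empty(plain))                     # type-based default
print(resolve_keep_empty(marked))                    # type-based default
print(resolve_keep_empty(marked, keep_empty=False))  # explicit override
```

The identity check (`is`) is what lets the function tell "not passed" apart from an explicit `False`, which is why such a default cannot be written as a plain `keep_empty=False` in the signature.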
I am really sorry, that I am contributing so late; but "missing_as_none=False" is confusing and not intuitive at all to me.
Pretty sure, others who have not participated are going to feel the same.
The function signature and term is not giving a signal on what it is meant to be.
Are you open to a new name?
I am adding @gpshead, as one of the active developers in this area, to get his opinion too.
I'm fine with this name. Admittedly I have spent too much time in the past wrangling problems in this library, but the reason it still works for me despite that is that it is a common concept: do you represent the absence of a value distinctly from the base zero/empty version of that type or not? That is what None is for, and missing_as_none is at least explicit in name to indicate that some values may be None. I'm not going to call it pretty, but it is "understandable enough" for me. I can't come up with anything that'd be meaningfully better rather than just alternately-understandable.
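For context on the "absence versus empty" distinction, here is a minimal sketch that uses only the already released `urllib.parse` API, without the `missing_as_none` and `keep_empty` parameters proposed in this PR. It shows that a URL with an empty query and a URL with no query currently parse to the same result, so the trailing `?` cannot be restored.

```python
from urllib.parse import urlparse, urlunparse

with_empty_query = urlparse("http://docs.python.org?")
without_query = urlparse("http://docs.python.org")

# Both results carry query='', so they compare equal as tuples.
same_result = with_empty_query == without_query

# Rebuilding omits the "?" because an empty query is treated as absent.
rebuilt = urlunparse(with_empty_query)

print(same_result)
print(rebuilt)
```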
What are your thoughts about missing_as_empty with the opposite semantic? In future, None will be returned for not defined components by default, and you will need to specify missing_as_none=False or missing_as_empty=True to restore the current behavior.
I think that name should really be missing_as_empty_string at least (we can’t say just “empty”), meaning missing parts returned as empty strings.
What about use_none=False, which is short and doesn’t try to be self-explanatory, so people need to read the docs?