214 changes: 136 additions & 78 deletions Doc/library/urllib.parse.rst
@@ -50,12 +50,16 @@
The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)
.. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
Member

I am really sorry that I am contributing so late, but "missing_as_none=False" is confusing and not at all intuitive to me.

I am pretty sure that others who have not participated are going to feel the same.

Neither the function signature nor the term signals what the parameter is meant to do.

Are you open to a new name?

Member

I am adding @gpshead, as one of the active developers in this area, to get his opinion too.

Member

I'm fine with this name. Admittedly I have spent too much time in the past wrangling problems in this library, but the reason it still works for me despite that is that it is a common concept: do you represent the absence of a value distinctly from the base zero/empty version of that type or not? That is what None is for, and missing_as_none is at least explicit in name to indicate that some values may be None. I'm not going to call it pretty but it is "understandable enough" for me. I can't come up with anything that'd be meaningfully better rather than just alternately-understandable.

Member Author

What are your thoughts about missing_as_empty with the opposite semantics? In the future, None will be returned for undefined components by default, and you will need to specify missing_as_none=False or missing_as_empty=True to restore the current behavior.

Member

I think that name should really be missing_as_empty_string at least (we can’t say just “empty”), meaning missing parts returned as empty strings.

What about use_none=False which is short and doesn’t try to be self-explanatory, so people need to read the docs?


Parse a URL into six components, returning a 6-item :term:`named tuple`. This
corresponds to the general structure of a URL:
``scheme://netloc/path;parameters?query#fragment``.
Each tuple item is a string, possibly empty. The components are not broken up
Each tuple item is a string, possibly empty, or ``None`` if
*missing_as_none* is true.
Undefined components are represented as an empty string (by default) or as
``None`` if *missing_as_none* is true.
The components are not broken up
into smaller parts (for example, the network location is a single string), and %
escapes are not expanded. The delimiters as shown above are not part of the
result, except for a leading slash in the *path* component, which is retained if
@@ -84,6 +88,12 @@
80
>>> o._replace(fragment="").geturl()
'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
>>> urlparse("http://docs.python.org?")
ParseResult(scheme='http', netloc='docs.python.org',
path='', params='', query='', fragment='')
>>> urlparse("http://docs.python.org?", missing_as_none=True)
ParseResult(scheme='http', netloc='docs.python.org',
path='', params=None, query='', fragment=None)

Following the syntax specifications in :rfc:`1808`, urlparse recognizes
a netloc only if it is properly introduced by '//'. Otherwise the
@@ -101,47 +111,53 @@
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
params='', query='', fragment='')
>>> urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='',
query='', fragment='')
ParseResult(scheme='', netloc='', path='help/Python.html',
params='', query='', fragment='')
>>> urlparse('help/Python.html', missing_as_none=True)
ParseResult(scheme=None, netloc=None, path='help/Python.html',
params=None, query=None, fragment=None)

The *scheme* argument gives the default addressing scheme, to be
used only if the URL does not specify one. It should be the same type
(text or bytes) as *urlstring*, except that the default value ``''`` is
(text or bytes) as *urlstring* or ``None``, except that ``''`` is
always allowed, and is automatically converted to ``b''`` if appropriate.
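
As a minimal sketch of how the *scheme* default is applied (the URLs are illustrative, and only behavior that predates this change is used), the default is consulted only when the URL string carries no scheme of its own:

```python
from urllib.parse import urlparse

# No scheme in the string (it starts with '//'), so the default is used.
relative = urlparse("//www.cwi.nl:80/%7Eguido/Python.html", scheme="http")

# The string has its own scheme, so the *scheme* argument is ignored.
absolute = urlparse("https://www.cwi.nl/", scheme="http")

print(relative.scheme, relative.netloc)  # http www.cwi.nl:80
print(absolute.scheme)                   # https
```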

If the *allow_fragments* argument is false, fragment identifiers are not
recognized. Instead, they are parsed as part of the path, parameters
or query component, and :attr:`fragment` is set to the empty string in
the return value.
or query component, and :attr:`fragment` is set to ``None`` or the empty
string (depending on the value of *missing_as_none*) in the return value.
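
The effect of *allow_fragments* can be sketched as follows. The URL is illustrative, and the example relies only on the default behavior, in which an unrecognized fragment is reported as an empty string:

```python
from urllib.parse import urlparse

url = "http://example.com/path#frag"

# Default: the fragment is split off into its own component.
default = urlparse(url)

# allow_fragments=False: '#frag' stays attached to the preceding component.
no_fragments = urlparse(url, allow_fragments=False)

print(default.path, default.fragment)                  # /path frag
print(no_fragments.path, repr(no_fragments.fragment))  # /path#frag ''
```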

The return value is a :term:`named tuple`, which means that its items can
be accessed by index or as named attributes, which are:

+------------------+-------+-------------------------+------------------------+
| Attribute | Index | Value | Value if not present |
+==================+=======+=========================+========================+
| :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter |
+------------------+-------+-------------------------+------------------------+
| :attr:`netloc` | 1 | Network location part | empty string |
+------------------+-------+-------------------------+------------------------+
| :attr:`path` | 2 | Hierarchical path | empty string |
+------------------+-------+-------------------------+------------------------+
| :attr:`params` | 3 | Parameters for last | empty string |
| | | path element | |
+------------------+-------+-------------------------+------------------------+
| :attr:`query` | 4 | Query component | empty string |
+------------------+-------+-------------------------+------------------------+
| :attr:`fragment` | 5 | Fragment identifier | empty string |
+------------------+-------+-------------------------+------------------------+
| :attr:`username` | | User name | :const:`None` |
+------------------+-------+-------------------------+------------------------+
| :attr:`password` | | Password | :const:`None` |
+------------------+-------+-------------------------+------------------------+
| :attr:`hostname` | | Host name (lower case) | :const:`None` |
+------------------+-------+-------------------------+------------------------+
| :attr:`port` | | Port number as integer, | :const:`None` |
| | | if present | |
+------------------+-------+-------------------------+------------------------+
+------------------+-------+-------------------------+-------------------------------+
| Attribute | Index | Value | Value if not present |
+==================+=======+=========================+===============================+
| :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter or |
| | | | empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`netloc` | 1 | Network location part | ``None`` or empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`path` | 2 | Hierarchical path | empty string |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`params` | 3 | Parameters for last | ``None`` or empty string [1]_ |
| | | path element | |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`query` | 4 | Query component | ``None`` or empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`fragment` | 5 | Fragment identifier | ``None`` or empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`username` | | User name | ``None`` |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`password` | | Password | ``None`` |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`hostname` | | Host name (lower case) | ``None`` |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`port` | | Port number as integer, | ``None`` |
| | | if present | |
+------------------+-------+-------------------------+-------------------------------+

.. [1] Depending on the value of the *missing_as_none* argument.
Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
an invalid port is specified in the URL. See section
@@ -187,12 +203,15 @@

.. versionchanged:: 3.6
Out-of-range port numbers now raise :exc:`ValueError`, instead of
returning :const:`None`.
returning ``None``.

.. versionchanged:: 3.8
Characters that affect netloc parsing under NFKC normalization will
now raise :exc:`ValueError`.

.. versionchanged:: next
Added the *missing_as_none* parameter.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

@@ -287,16 +306,27 @@
separator key, with ``&`` as the default separator.


.. function:: urlunparse(parts)
.. function:: urlunparse(parts, *, keep_empty=False)

Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
argument can be any six-item iterable. This may result in a slightly
different, but equivalent URL, if the URL that was parsed originally had
unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
states that these are equivalent).
argument can be any six-item iterable.

This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ``?`` with an empty query; the RFC states that these are equivalent).
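
A short sketch of the default round trip, using illustrative URLs and no *keep_empty* argument: an unnecessary ``?`` is dropped when the URL is rebuilt, and any six-item iterable is accepted as *parts*.

```python
from urllib.parse import urlparse, urlunparse

# The trailing '?' introduces an empty query, which is not restored.
original = "http://docs.python.org?"
rebuilt = urlunparse(urlparse(original))

# *parts* does not have to be a ParseResult; a plain six-item list works too.
from_list = urlunparse(
    ["http", "docs.python.org", "/3/", "", "highlight=params", ""]
)

print(rebuilt)    # http://docs.python.org
print(from_list)  # http://docs.python.org/3/?highlight=params
```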

If *keep_empty* is true, empty strings are kept in the result (for example,
a ``?`` for an empty query); only ``None`` components are omitted.
This allows to restore the URL that was parsed with option
Member

Suggested change
This allows to restore the URL that was parsed with option
This allows rebuilding a URL that was parsed with option

``missing_as_none=True``.
By default, *keep_empty* is true if *parts* is the result of the
Member

It is easy to miss this "footnote" about keep_empty not being a simple False default. I think the function signature above should make it clearer that it has a non-trivial default value when not explicitly specified. Something awkward with a fake descriptive name, but indicative that people should read further for details: keep_empty=_FALSE_UNLESS_PARTS_IS_A_URLSPLIT_RESULT or similar, perhaps?

Member Author

Pydoc will output urlunparse(components, *, keep_empty=['not specified']). Is that fine, or do you need a more descriptive name?

:func:`urlparse` call with ``missing_as_none=True``.

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)
.. versionchanged:: next
Added the *keep_empty* parameter.


.. function:: urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)

This is similar to :func:`urlparse`, but does not split the params from the URL.
This should generally be used instead of :func:`urlparse` if the more recent URL
@@ -310,28 +340,31 @@
The return value is a :term:`named tuple`, its items can be accessed by index
or as named attributes:

+------------------+-------+-------------------------+----------------------+
| Attribute | Index | Value | Value if not present |
+==================+=======+=========================+======================+
| :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter |
+------------------+-------+-------------------------+----------------------+
| :attr:`netloc` | 1 | Network location part | empty string |
+------------------+-------+-------------------------+----------------------+
| :attr:`path` | 2 | Hierarchical path | empty string |
+------------------+-------+-------------------------+----------------------+
| :attr:`query` | 3 | Query component | empty string |
+------------------+-------+-------------------------+----------------------+
| :attr:`fragment` | 4 | Fragment identifier | empty string |
+------------------+-------+-------------------------+----------------------+
| :attr:`username` | | User name | :const:`None` |
+------------------+-------+-------------------------+----------------------+
| :attr:`password` | | Password | :const:`None` |
+------------------+-------+-------------------------+----------------------+
| :attr:`hostname` | | Host name (lower case) | :const:`None` |
+------------------+-------+-------------------------+----------------------+
| :attr:`port` | | Port number as integer, | :const:`None` |
| | | if present | |
+------------------+-------+-------------------------+----------------------+
+------------------+-------+-------------------------+-------------------------------+
| Attribute | Index | Value | Value if not present |
+==================+=======+=========================+===============================+
| :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter or |
| | | | empty string [1]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`netloc` | 1 | Network location part | ``None`` or empty string [2]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`path` | 2 | Hierarchical path | empty string |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`query` | 3 | Query component | ``None`` or empty string [2]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`fragment` | 4 | Fragment identifier | ``None`` or empty string [2]_ |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`username` | | User name | ``None`` |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`password` | | Password | ``None`` |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`hostname` | | Host name (lower case) | ``None`` |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`port` | | Port number as integer, | ``None`` |
| | | if present | |
+------------------+-------+-------------------------+-------------------------------+

.. [2] Depending on the value of the *missing_as_none* argument.
Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
an invalid port is specified in the URL. See section
@@ -356,7 +389,7 @@

.. versionchanged:: 3.6
Out-of-range port numbers now raise :exc:`ValueError`, instead of
returning :const:`None`.
returning ``None``.

.. versionchanged:: 3.8
Characters that affect netloc parsing under NFKC normalization will
@@ -368,15 +401,30 @@
.. versionchanged:: 3.12
Leading WHATWG C0 control and space characters are stripped from the URL.

.. versionchanged:: next
Added the *missing_as_none* parameter.

.. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser

.. function:: urlunsplit(parts)
.. function:: urlunsplit(parts, *, keep_empty=False)

Combine the elements of a tuple as returned by :func:`urlsplit` into a
complete URL as a string. The *parts* argument can be any five-item
iterable. This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example, a ?
with an empty query; the RFC states that these are equivalent).
iterable.

This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example,
a ``?`` with an empty query; the RFC states that these are equivalent).
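
As an illustrative sketch (the URLs are made up, and *keep_empty* is left at its default), :func:`urlunsplit` accepts a plain five-item tuple, and an empty fragment delimiter is not restored on a round trip:

```python
from urllib.parse import urlsplit, urlunsplit

# Any five-item iterable: scheme, netloc, path, query, fragment.
combined = urlunsplit(("https", "example.com", "/a", "q=1", "frag"))

# The trailing '#' introduces an empty fragment, which is dropped.
round_trip = urlunsplit(urlsplit("http://example.com/path#"))

print(combined)    # https://example.com/a?q=1#frag
print(round_trip)  # http://example.com/path
```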

If *keep_empty* is true, empty strings are kept in the result (for example,
a ``?`` for an empty query); only ``None`` components are omitted.
This allows rebuilding a URL that was parsed with option
``missing_as_none=True``.
By default, *keep_empty* is true if *parts* is the result of the
Member

Same comment as above. Make it obvious from the function signature that the default depends on the type of parts.

:func:`urlsplit` call with ``missing_as_none=True``.

.. versionchanged:: next
Added the *keep_empty* parameter.


.. function:: urljoin(base, url, allow_fragments=True)
@@ -422,30 +470,35 @@
Behavior updated to match the semantics defined in :rfc:`3986`.


.. function:: urldefrag(url)
.. function:: urldefrag(url, *, missing_as_none=False)

If *url* contains a fragment identifier, return a modified version of *url*
with no fragment identifier, and the fragment identifier as a separate
string. If there is no fragment identifier in *url*, return *url* unmodified
and an empty string.
and an empty string (by default) or ``None`` if *missing_as_none* is true.
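
A minimal sketch of both cases with the default arguments (the URLs are illustrative):

```python
from urllib.parse import urldefrag

with_fragment = urldefrag("http://example.com/page#section")
without_fragment = urldefrag("http://example.com/page")

# The result is a named tuple, so it can also be unpacked positionally.
url, fragment = with_fragment

print(url, fragment)                    # http://example.com/page section
print(repr(without_fragment.fragment))  # ''
```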

The return value is a :term:`named tuple`, its items can be accessed by index
or as named attributes:

+------------------+-------+-------------------------+----------------------+
| Attribute | Index | Value | Value if not present |
+==================+=======+=========================+======================+
| :attr:`url` | 0 | URL with no fragment | empty string |
+------------------+-------+-------------------------+----------------------+
| :attr:`fragment` | 1 | Fragment identifier | empty string |
+------------------+-------+-------------------------+----------------------+
+------------------+-------+-------------------------+-------------------------------+
| Attribute | Index | Value | Value if not present |
+==================+=======+=========================+===============================+
| :attr:`url` | 0 | URL with no fragment | empty string |
+------------------+-------+-------------------------+-------------------------------+
| :attr:`fragment` | 1 | Fragment identifier | ``None`` or empty string [3]_ |
+------------------+-------+-------------------------+-------------------------------+

.. [3] Depending on the value of the *missing_as_none* argument.
See section :ref:`urlparse-result-object` for more information on the result
object.

.. versionchanged:: 3.2
Result is a structured object rather than a simple 2-tuple.

.. versionchanged:: next
Added the *missing_as_none* parameter.

.. function:: unwrap(url)

Extract the url from a wrapped URL (that is, a string formatted as
@@ -465,8 +518,9 @@
purity.

Instead of raising an exception on unusual input, they may instead return some
component parts as empty strings. Or components may contain more than perhaps
they should.
component parts as empty strings or ``None`` (depending on the value of the
*missing_as_none* argument).
Or components may contain more than perhaps they should.

We recommend that users of these APIs where the values may be used anywhere
with security implications code defensively. Do some verification within your
@@ -542,7 +596,8 @@
Return the re-combined version of the original URL as a string. This may
differ from the original URL in that the scheme may be normalized to lower
case and empty components may be dropped. Specifically, empty parameters,
queries, and fragment identifiers will be removed.
queries, and fragment identifiers will be removed unless the URL was parsed
with ``missing_as_none=True``.

For :func:`urldefrag` results, only empty fragment identifiers will be removed.
For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
@@ -559,6 +614,9 @@
>>> r2 = urlsplit(r1.geturl())
>>> r2.geturl()
'http://www.Python.org/doc/'
>>> r3 = urlsplit(url, missing_as_none=True)
>>> r3.geturl()
'http://www.Python.org/doc/#'


The following classes provide the implementations of the structured parse