python · serhiy-storchaka · Aug 22, 2024 · Aug 31, 2024 · Nov 26, 2024 · Nov 27, 2024
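
A minimal sketch of the behavior that motivates the documentation change in the diff below. It relies only on the existing `urlsplit()` defaults and deliberately does not call the new *missing_as_none* or *keep_empty* parameters, on the assumption that they are not available in released Python versions. The variable names are illustrative only.

```python
from urllib.parse import urlsplit

# One URL has an explicit but empty query, the other has no query at all.
with_empty_query = urlsplit("http://docs.python.org?")
without_query = urlsplit("http://docs.python.org")

# By default both cases report the query as '', so the results compare equal.
same_result = with_empty_query == without_query

# Because the distinction is lost, geturl() cannot restore the trailing '?'.
rebuilt = with_empty_query.geturl()

print(same_result, rebuilt)
```

With the default settings the two inputs are indistinguishable after parsing. The diff below documents how `missing_as_none=True` keeps them apart.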
diff --git a/Doc/library/urllib.parse.rst b/Doc/library/urllib.parse.rst
@@ -50,12 +50,16 @@
 The URL parsing functions focus on splitting a URL string into its components,
 or on combining URL components into a URL string.
 
-.. function:: urlparse(urlstring, scheme='', allow_fragments=True)
+.. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
 
    Parse a URL into six components, returning a 6-item :term:`named tuple`.  This
    corresponds to the general structure of a URL:
    ``scheme://netloc/path;parameters?query#fragment``.
-   Each tuple item is a string, possibly empty. The components are not broken up
+   Each tuple item is a string, possibly empty, or ``None`` if
+   *missing_as_none* is true.
+   Undefined components are represented as an empty string (by default) or
+   ``None`` if *missing_as_none* is true.
+   The components are not broken up
    into smaller parts (for example, the network location is a single string), and %
    escapes are not expanded. The delimiters as shown above are not part of the
    result, except for a leading slash in the *path* component, which is retained if
@@ -84,6 +88,12 @@
       80
       >>> o._replace(fragment="").geturl()
       'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
+      >>> urlparse("http://docs.python.org?")
+      ParseResult(scheme='http', netloc='docs.python.org',
+                  path='', params='', query='', fragment='')
+      >>> urlparse("http://docs.python.org?", missing_as_none=True)
+      ParseResult(scheme='http', netloc='docs.python.org',
+                  path='', params=None, query='', fragment=None)
 
    Following the syntax specifications in :rfc:`1808`, urlparse recognizes
    a netloc only if it is properly introduced by '//'.  Otherwise the
@@ -101,47 +111,53 @@
       ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                   params='', query='', fragment='')
       >>> urlparse('help/Python.html')
-      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
-                  query='', fragment='')
+      ParseResult(scheme='', netloc='', path='help/Python.html',
+                  params='', query='', fragment='')
+      >>> urlparse('help/Python.html', missing_as_none=True)
+      ParseResult(scheme=None, netloc=None, path='help/Python.html',
+                  params=None, query=None, fragment=None)
 
    The *scheme* argument gives the default addressing scheme, to be
    used only if the URL does not specify one.  It should be the same type
-   (text or bytes) as *urlstring*, except that the default value ``''`` is
+   (text or bytes) as *urlstring* or ``None``, except that ``''`` is
    always allowed, and is automatically converted to ``b''`` if appropriate.
 
    If the *allow_fragments* argument is false, fragment identifiers are not
    recognized.  Instead, they are parsed as part of the path, parameters
-   or query component, and :attr:`fragment` is set to the empty string in
-   the return value.
+   or query component, and :attr:`fragment` is set to ``None`` or the empty
+   string (depending on the value of *missing_as_none*) in the return value.
 
    The return value is a :term:`named tuple`, which means that its items can
    be accessed by index or as named attributes, which are:
 
-   +------------------+-------+-------------------------+------------------------+
-   | Attribute        | Index | Value                   | Value if not present   |
-   +==================+=======+=========================+========================+
-   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter     |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`netloc`   | 1     | Network location part   | empty string           |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`path`     | 2     | Hierarchical path       | empty string           |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`params`   | 3     | Parameters for last     | empty string           |
-   |                  |       | path element            |                        |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`query`    | 4     | Query component         | empty string           |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`fragment` | 5     | Fragment identifier     | empty string           |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`username` |       | User name               | :const:`None`          |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`password` |       | Password                | :const:`None`          |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`          |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`port`     |       | Port number as integer, | :const:`None`          |
-   |                  |       | if present              |                        |
-   +------------------+-------+-------------------------+------------------------+
+   +------------------+-------+-------------------------+-------------------------------+
+   | Attribute        | Index | Value                   | Value if not present          |
+   +==================+=======+=========================+===============================+
+   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter or         |
+   |                  |       |                         | empty string [1]_             |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`netloc`   | 1     | Network location part   | ``None`` or empty string [1]_ |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`path`     | 2     | Hierarchical path       | empty string                  |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`params`   | 3     | Parameters for last     | ``None`` or empty string [1]_ |
+   |                  |       | path element            |                               |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`query`    | 4     | Query component         | ``None`` or empty string [1]_ |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`fragment` | 5     | Fragment identifier     | ``None`` or empty string [1]_ |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`username` |       | User name               | ``None``                      |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`password` |       | Password                | ``None``                      |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`hostname` |       | Host name (lower case)  | ``None``                      |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`port`     |       | Port number as integer, | ``None``                      |
+   |                  |       | if present              |                               |
+   +------------------+-------+-------------------------+-------------------------------+
+
+   .. [1] Depending on the value of the *missing_as_none* argument.
 
    Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
    an invalid port is specified in the URL.  See section
@@ -187,12 +203,15 @@
 
    .. versionchanged:: 3.6
       Out-of-range port numbers now raise :exc:`ValueError`, instead of
-      returning :const:`None`.
+      returning ``None``.
 
    .. versionchanged:: 3.8
       Characters that affect netloc parsing under NFKC normalization will
       now raise :exc:`ValueError`.
 
+   .. versionchanged:: next
+      Added the *missing_as_none* parameter.
+
 
 .. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
 
@@ -287,16 +306,27 @@
       separator key, with ``&`` as the default separator.
 
 
-.. function:: urlunparse(parts)
+.. function:: urlunparse(parts, *, keep_empty=False)
 
    Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
-   argument can be any six-item iterable. This may result in a slightly
-   different, but equivalent URL, if the URL that was parsed originally had
-   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
-   states that these are equivalent).
+   argument can be any six-item iterable.
+
+   This may result in a slightly different, but equivalent URL, if the
+   URL that was parsed originally had unnecessary delimiters (for example,
+   a ``?`` with an empty query; the RFC states that these are equivalent).
 
+   If *keep_empty* is true, empty strings are kept in the result (for example,
+   a ``?`` for an empty query); only ``None`` components are omitted.
+   This allows rebuilding a URL that was parsed with option
+   ``missing_as_none=True``.
+   By default, *keep_empty* is true if *parts* is the result of a
+   :func:`urlparse` call with ``missing_as_none=True``.
 
-.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)
+   .. versionchanged:: next
+      Added the *keep_empty* parameter.
+
+
+.. function:: urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
 
    This is similar to :func:`urlparse`, but does not split the params from the URL.
    This should generally be used instead of :func:`urlparse` if the more recent URL
@@ -310,28 +340,31 @@
    The return value is a :term:`named tuple`, its items can be accessed by index
    or as named attributes:
 
-   +------------------+-------+-------------------------+----------------------+
-   | Attribute        | Index | Value                   | Value if not present |
-   +==================+=======+=========================+======================+
-   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`netloc`   | 1     | Network location part   | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`query`    | 3     | Query component         | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`username` |       | User name               | :const:`None`        |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`password` |       | Password                | :const:`None`        |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
-   |                  |       | if present              |                      |
-   +------------------+-------+-------------------------+----------------------+
+   +------------------+-------+-------------------------+-------------------------------+
+   | Attribute        | Index | Value                   | Value if not present          |
+   +==================+=======+=========================+===============================+
+   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter or         |
+   |                  |       |                         | empty string [2]_             |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`netloc`   | 1     | Network location part   | ``None`` or empty string [2]_ |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`path`     | 2     | Hierarchical path       | empty string                  |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`query`    | 3     | Query component         | ``None`` or empty string [2]_ |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`fragment` | 4     | Fragment identifier     | ``None`` or empty string [2]_ |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`username` |       | User name               | ``None``                      |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`password` |       | Password                | ``None``                      |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`hostname` |       | Host name (lower case)  | ``None``                      |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`port`     |       | Port number as integer, | ``None``                      |
+   |                  |       | if present              |                               |
+   +------------------+-------+-------------------------+-------------------------------+
+
+   .. [2] Depending on the value of the *missing_as_none* argument.
 
    Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
    an invalid port is specified in the URL.  See section
@@ -356,7 +389,7 @@
 
    .. versionchanged:: 3.6
       Out-of-range port numbers now raise :exc:`ValueError`, instead of
-      returning :const:`None`.
+      returning ``None``.
 
    .. versionchanged:: 3.8
       Characters that affect netloc parsing under NFKC normalization will
@@ -368,15 +401,30 @@
    .. versionchanged:: 3.12
       Leading WHATWG C0 control and space characters are stripped from the URL.
 
+   .. versionchanged:: next
+      Added the *missing_as_none* parameter.
+
 .. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
 
-.. function:: urlunsplit(parts)
+.. function:: urlunsplit(parts, *, keep_empty=False)
 
    Combine the elements of a tuple as returned by :func:`urlsplit` into a
    complete URL as a string. The *parts* argument can be any five-item
-   iterable. This may result in a slightly different, but equivalent URL, if the
-   URL that was parsed originally had unnecessary delimiters (for example, a ?
-   with an empty query; the RFC states that these are equivalent).
+   iterable.
+
+   This may result in a slightly different, but equivalent URL, if the
+   URL that was parsed originally had unnecessary delimiters (for example,
+   a ``?`` with an empty query; the RFC states that these are equivalent).
+
+   If *keep_empty* is true, empty strings are kept in the result (for example,
+   a ``?`` for an empty query); only ``None`` components are omitted.
+   This allows rebuilding a URL that was parsed with option
+   ``missing_as_none=True``.
+   By default, *keep_empty* is true if *parts* is the result of a
+   :func:`urlsplit` call with ``missing_as_none=True``.
+
+   .. versionchanged:: next
+      Added the *keep_empty* parameter.
 
 
 .. function:: urljoin(base, url, allow_fragments=True)
@@ -422,30 +470,35 @@
       Behavior updated to match the semantics defined in :rfc:`3986`.
 
 
-.. function:: urldefrag(url)
+.. function:: urldefrag(url, *, missing_as_none=False)
 
    If *url* contains a fragment identifier, return a modified version of *url*
    with no fragment identifier, and the fragment identifier as a separate
    string.  If there is no fragment identifier in *url*, return *url* unmodified
-   and an empty string.
+   and an empty string (by default) or ``None`` if *missing_as_none* is true.
 
    The return value is a :term:`named tuple`, its items can be accessed by index
    or as named attributes:
 
-   +------------------+-------+-------------------------+----------------------+
-   | Attribute        | Index | Value                   | Value if not present |
-   +==================+=======+=========================+======================+
-   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
-   +------------------+-------+-------------------------+----------------------+
+   +------------------+-------+-------------------------+-------------------------------+
+   | Attribute        | Index | Value                   | Value if not present          |
+   +==================+=======+=========================+===============================+
+   | :attr:`url`      | 0     | URL with no fragment    | empty string                  |
+   +------------------+-------+-------------------------+-------------------------------+
+   | :attr:`fragment` | 1     | Fragment identifier     | ``None`` or empty string [3]_ |
+   +------------------+-------+-------------------------+-------------------------------+
+
+   .. [3] Depending on the value of the *missing_as_none* argument.
 
    See section :ref:`urlparse-result-object` for more information on the result
    object.
 
    .. versionchanged:: 3.2
       Result is a structured object rather than a simple 2-tuple.
 
+   .. versionchanged:: next
+      Added the *missing_as_none* parameter.
+
 .. function:: unwrap(url)
 
    Extract the url from a wrapped URL (that is, a string formatted as
@@ -465,8 +518,9 @@
 purity.
 
 Instead of raising an exception on unusual input, they may instead return some
-component parts as empty strings. Or components may contain more than perhaps
-they should.
+component parts as empty strings or ``None`` (depending on the value of the
+*missing_as_none* argument).
+Or components may contain more than perhaps they should.
 
 We recommend that users of these APIs where the values may be used anywhere
 with security implications code defensively. Do some verification within your
@@ -542,7 +596,8 @@
    Return the re-combined version of the original URL as a string. This may
    differ from the original URL in that the scheme may be normalized to lower
    case and empty components may be dropped. Specifically, empty parameters,
-   queries, and fragment identifiers will be removed.
+   queries, and fragment identifiers will be removed unless the URL was parsed
+   with ``missing_as_none=True``.
 
    For :func:`urldefrag` results, only empty fragment identifiers will be removed.
    For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
@@ -559,6 +614,9 @@
       >>> r2 = urlsplit(r1.geturl())
       >>> r2.geturl()
       'http://www.Python.org/doc/'
+      >>> r3 = urlsplit(url, missing_as_none=True)
+      >>> r3.geturl()
+      'http://www.Python.org/doc/#'
 
 
 The following classes provide the implementations of the structured parse