@serhiy-storchaka
Member

@serhiy-storchaka serhiy-storchaka commented Aug 25, 2024

Changes in the urllib.parse module:

  • Add option missing_as_none (originally named allow_none) in urlparse(), urlsplit() and urldefrag(). If it is true, represent undefined components as None instead of an empty string.
  • Add option keep_empty in urlunparse() and urlunsplit(). If it is true, keep empty non-None components in the resulting string. By default it is the same as the allow_none value for the result of the urlparse() and urlsplit() calls.
  • Add option keep_empty in the geturl() method of DefragResult, SplitResult, ParseResult and the corresponding bytes counterparts.
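
For readers who want to see the problem this option addresses, here is a minimal sketch. The first part uses only the existing `urlsplit()`; the second part is a hypothetical helper, `split_missing_as_none()`, built on the generic regular expression from RFC 3986, Appendix B. It only illustrates the missing-versus-empty distinction and is not the implementation in this PR.

```python
import re
from urllib.parse import urlsplit

# Existing behaviour: an empty query ("?") and a missing query look the same.
with_empty_query = urlsplit("http://example.com/p?")
without_query = urlsplit("http://example.com/p")
same_query = with_empty_query.query == without_query.query == ""

# Generic URI pattern adapted from RFC 3986, Appendix B.
# A group that does not participate in the match is reported as None.
_URI = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$",
    re.DOTALL,
)


def split_missing_as_none(url):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    Components absent from the URL are None; components that are present
    but empty are "".
    """
    return _URI.match(url).groups()
```

With this helper, `"http://example.com/p?"` yields an empty-string query while `"http://example.com/p"` yields `None`, which is the distinction the new option is meant to expose.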

…I components

Changes in the urllib.parse module:

* Add option allow_none in urlparse(), urlsplit() and urldefrag(). If
  it is true, represent not defined components as None instead of an
  empty string.
* Add option keep_empty in urlunparse() and urlunsplit(). If it is
  true, keep empty non-None components in the resulting string.
* Add option keep_empty in the geturl() method of DefragResult,
  SplitResult, ParseResult and the corresponding bytes counterparts.
@serhiy-storchaka serhiy-storchaka force-pushed the urllib-parse-allow-none branch from 7032015 to a60c9be Compare August 31, 2024 09:55
@serhiy-storchaka serhiy-storchaka marked this pull request as ready for review November 27, 2024 11:16
@serhiy-storchaka
Member Author

It is now ready to review. The status of allow_none is now saved in the DefragResult, SplitResult and ParseResult objects, so in most cases there is no need to pass the keep_empty argument. geturl() no longer needs the keep_empty parameter.

Unfortunately, these objects now have a __dict__ and are no longer immutable. This is because non-empty __slots__ is not compatible with tuple subclasses. This is a separate, complex issue. I'll try to find a solution to it, but it may be difficult.
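
A small sketch of the trade-off described above, using a toy named tuple rather than the real result classes. The class names `Frozen`, `Open` and `Slotted` are made up for illustration. Whether the last class definition fails depends on the interpreter, so the code records the outcome instead of assuming it.

```python
from collections import namedtuple

_Base = namedtuple("_Base", "scheme path")


class Frozen(_Base):
    # Empty __slots__: no per-instance __dict__, so no extra attributes.
    __slots__ = ()


class Open(_Base):
    # No __slots__: instances get a __dict__ and accept new attributes.
    pass


frozen = Frozen("http", "/")
try:
    frozen.extra = 1
    frozen_accepts_attrs = True
except AttributeError:
    frozen_accepts_attrs = False

opened = Open("http", "/")
opened.extra = 1  # stored in opened.__dict__

# Non-empty __slots__ on a tuple subclass: record whether the running
# interpreter accepts it rather than asserting either way.
try:
    class Slotted(_Base):
        __slots__ = ("_keep_empty",)
    nonempty_slots_ok = True
except TypeError:
    nonempty_slots_ok = False
```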

The long term plan is to make allow_none True by default, and later deprecate allow_none=False. keep_empty=False can still be useful.

@serhiy-storchaka serhiy-storchaka marked this pull request as draft November 28, 2024 09:14
@serhiy-storchaka serhiy-storchaka marked this pull request as ready for review December 5, 2024 11:10
@serhiy-storchaka
Member Author

I am sorry, I forgot to copy the _keep_empty attribute in the copying/encoding/decoding methods. Now the PR is ready for review.

@orsenthil, @barneygale, could you please make a review?

@barneygale barneygale self-requested a review December 5, 2024 14:03
@merwok
Member

merwok commented Nov 17, 2025

I think that allow_none is not the correct name: this is not about allowing None values in the input (like how allow_fragments allows fragments in the input), but about controlling the data returned.

Possible alternatives:

  • use_none (short)
  • missing_as_none
  • use_none_for_empty

@serhiy-storchaka
Member Author

Maybe missing_as_empty? In the future, the behavior will be changed to use None for missing components by default.

@merwok
Member

merwok commented Nov 17, 2025

I think it should be missing_as_empty_string (short for return missing values as empty strings), which is long!

To change the default in the future, do you plan on adding a warning first?
I could see many projects not paying attention to the new param and getting surprised (or silent bugs) a few years after.

@serhiy-storchaka
Member Author

> To change the default in the future, do you plan on adding a warning first?

On one hand, a warning will inform everyone about the change (it should be a FutureWarning). You will have to pass an explicit True or False to silence it and get the behavior you want. This is how we normally do it. On the other hand, the warning will unnecessarily disturb those who are fine with either behavior. We will discuss this when the time comes.
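
The warning scheme discussed here could look roughly like the sketch below. It is a toy, not the PR's code: `query_of()` and the `_UNSET` sentinel are hypothetical names, and the function extracts only the query, ignoring fragments, to keep the example short.

```python
import warnings

_UNSET = object()  # hypothetical sentinel meaning "caller did not choose"


def query_of(url, missing_as_none=_UNSET):
    """Toy parser: return the query part of *url* (fragments are ignored)."""
    if missing_as_none is _UNSET:
        # Warn only callers who rely on the default that is going to change.
        warnings.warn(
            "the default of missing_as_none will change; "
            "pass True or False explicitly",
            FutureWarning,
            stacklevel=2,
        )
        missing_as_none = False
    _, sep, query = url.partition("?")
    if not sep:
        # No "?" at all: the query is missing, not merely empty.
        return None if missing_as_none else ""
    return query
```

Callers who pass `missing_as_none=True` or `missing_as_none=False` never see the warning, which matches the "pass an explicit value to silence it" behavior described above.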

@serhiy-storchaka
Member Author

> I think it should be missing_as_empty_string (short for return missing values as empty strings), which is long!

Currently, missing and empty components are not distinguishable. This parameter will make it possible to distinguish them. This PR also adds the keep_empty parameter. So "empty" refers not only to a string, but to a component: missing_component_as_empty_component, or simply missing_as_empty. Will it work?

@merwok
Member

merwok commented Nov 17, 2025

Ah, you're right, there is empty component and empty string.

missing_as_none is good!


If *keep_empty* is true, empty strings are kept in the result (for example,
a ``?`` for an empty query), only ``None`` components are omitted.
This allows to restore the URL that was parsed with option
Member

Suggested change
This allows to restore the URL that was parsed with option
This allows rebuilding a URL that was parsed with option
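
To make the *keep_empty* wording concrete, here is a hedged sketch of recomposition along the lines of RFC 3986, section 5.3. The function `unsplit()` is hypothetical and takes a plain 5-tuple. The real `urlunsplit()` has additional scheme-specific rules that are not modeled here.

```python
def unsplit(parts, *, keep_empty=False):
    """Hypothetical recomposition of (scheme, authority, path, query, fragment)."""
    scheme, authority, path, query, fragment = parts

    def present(component):
        # With keep_empty, only None counts as missing; otherwise "" does too.
        return component is not None if keep_empty else bool(component)

    url = ""
    if present(scheme):
        url += scheme + ":"
    if present(authority):
        url += "//" + authority
    url += path or ""
    if present(query):
        url += "?" + query
    if present(fragment):
        url += "#" + fragment
    return url
```

With `keep_empty=True`, the tuple `("http", "example.com", "/p", "", None)` is rebuilt with a trailing `?`. With `keep_empty=False`, the empty query is dropped, as happens today.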

or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)
.. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False)
Member

I am really sorry that I am contributing so late, but "missing_as_none=False" is confusing and not intuitive at all to me.

I am pretty sure that others who have not participated are going to feel the same.

The function signature and the term do not signal what the parameter is meant to do.

Are you open to a new name?

Member

I am adding @gpshead, as one of the active developers in this area, to get his opinion too.

Member

I'm fine with this name. Admittedly I have spent too much time in the past wrangling problems in this library, but the reason it still works for me despite that is that it is a common concept: do you represent the absence of a value distinctly from the base zero/empty version of that type or not? That is what None is for, and missing_as_none is at least explicit in name to indicate that some values may be None. I'm not going to call it pretty, but it is "understandable enough" for me. I can't come up with anything that'd be meaningfully better rather than just alternately-understandable.

Member Author

What are your thoughts about missing_as_empty with the opposite semantics? In the future, None will be returned for undefined components by default, and you will need to specify missing_as_none=False or missing_as_empty=True to restore the current behavior.

Member

I think that name should really be missing_as_empty_string at least (we can’t say just “empty”), meaning missing parts returned as empty strings.

What about use_none=False which is short and doesn’t try to be self-explanatory, so people need to read the docs?

@bedevere-app

bedevere-app bot commented Nov 19, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Member

@gpshead gpshead left a comment

Overall I like this; my comments are all about how to explain it in the docs.

a ``?`` for an empty query), only ``None`` components are omitted.
This allows to restore the URL that was parsed with option
``missing_as_none=True``.
By default, *keep_empty* is true if *parts* is the result of the
Member

It is easy to miss this "footnote" about keep_empty not being a simple False default. I think the function signature above should make it clearer that the parameter has a non-trivial default value when not explicitly specified. Perhaps something awkward with a fake descriptive name that signals people should read further for details, such as keep_empty=_FALSE_UNLESS_PARTS_IS_A_URLSPLIT_RESULT or similar?

Member Author

Pydoc will output urlunparse(components, *, keep_empty=['not specified']). Is that fine, or do you need a more descriptive name?
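
A sketch of how a sentinel default like this can behave, assuming the sentinel is a list whose repr is `['not specified']` (inferred from the pydoc output quoted above, not confirmed against the PR's code). The names `_UNSPECIFIED`, `Parts` and `unparse()` are hypothetical, and the toy handles only a path and a query.

```python
import inspect
from collections import namedtuple

# Hypothetical sentinel; its repr is what shows up in the signature.
_UNSPECIFIED = ['not specified']


class Parts(namedtuple("Parts", "path query")):
    """Toy result type; no __slots__, so instances can carry a flag."""


def unparse(components, *, keep_empty=_UNSPECIFIED):
    if keep_empty is _UNSPECIFIED:
        # Default depends on the input: use the flag stored on the
        # result object if there is one, otherwise fall back to False.
        keep_empty = getattr(components, "_keep_empty", False)
    path, query = components
    if query is None or (query == "" and not keep_empty):
        return path
    return path + "?" + query


flagged = Parts("/p", "")
flagged._keep_empty = True  # as if produced with missing_as_none=True

signature_text = str(inspect.signature(unparse))
```

Here `unparse(flagged)` keeps the trailing `?`, while `unparse(("/p", ""))` drops it, and `signature_text` shows the sentinel's repr as the default.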

a ``?`` for an empty query), only ``None`` components are omitted.
This allows to restore the URL that was parsed with option
``missing_as_none=True``.
By default, *keep_empty* is true if *parts* is the result of the
Member

Same comment as above: make it obvious from the function signature that the default depends on the type of parts.

@gpshead gpshead requested a review from sethmlarson November 19, 2025 06:02
Member Author

@serhiy-storchaka serhiy-storchaka left a comment

Thank you for your attention.

I would like to take this opportunity to discuss a certain design issue. To make this change more user friendly, we need to keep the status of missing_as_none in the result of urlsplit() and the other functions. Then the user will not need to specify keep_empty=True each time they use missing_as_none=True, and change their code again in the future. There are two ways to do this:

  1. Return different subclasses of SplitResult (and of the other 5 classes). This approach was already used to differentiate string/bytes results. This will add 12 new classes.
  2. Add a (hidden) boolean attribute to instances of SplitResult (and of the other 5 classes).

In both cases there are some issues with copying, pickling, etc., which will be resolved in different ways. In the distant future, after flipping the default behavior and deprecating and removing the current behavior (if we completely remove it), we may return to the current state of 6 named tuple classes.

What approach do you prefer?
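
The two options can be compared with a toy two-field named tuple. All class names below are hypothetical and the real result classes have more fields and mixins. The sketch shows one of the copying issues mentioned: `_replace()` keeps the subclass in the first approach but silently drops a per-instance flag in the second.

```python
from collections import namedtuple

_Base = namedtuple("_Base", "path query")


# Approach 1: the flag lives on the class, one subclass per flag value.
class Split(_Base):
    __slots__ = ()
    _keep_empty = False


class SplitKeepEmpty(Split):
    __slots__ = ()
    _keep_empty = True


# Approach 2: the flag lives on the instance (requires a __dict__).
class SplitWithFlag(_Base):
    def __new__(cls, path, query, *, keep_empty=False):
        self = super().__new__(cls, path, query)
        self._keep_empty = keep_empty
        return self


by_class = SplitKeepEmpty("/p", "")
by_instance = SplitWithFlag("/p", "", keep_empty=True)

# _replace() builds the new tuple via the class, so the subclass (and its
# class-level flag) survives, but the instance attribute is not copied.
replaced_by_class = by_class._replace(path="/q")
replaced_by_instance = by_instance._replace(path="/q")
```

In the second approach, methods that create new instances would each need to copy the flag explicitly, which is the kind of omission mentioned earlier in this thread.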
