@Davda-James Davda-James commented Aug 31, 2025

Previously, urllib.parse.parse_qsl accepted illegal characters as well. With this change it raises ValueError for them, according to RFC 3986.

Proof (reproducing the issue):
[screenshot: Before_Fix]

Closes issue: #138284

Proof (after the fix):
[screenshot: After_fix]

I added a test for it as well.
Test for urlparse only:
[screenshot: test]

All tests:
[screenshot: all_tests]

  • Passes all tests

python-cla-bot bot commented Aug 31, 2025

All commit authors signed the Contributor License Agreement.


bedevere-app bot commented Aug 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@Davda-James Davda-James changed the title from "urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986" to "gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986" on Aug 31, 2025
_UNSAFE_URL_BYTES_TO_REMOVE = ['\t', '\r', '\n']

# Allowed valid characters in parse_qsl
_VALID_QUERY_CHARS = re.compile(r"^[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*$")
Member

This could be replaced with str.isascii, str.isdecimal, and a string containing the other allowed characters; that should be faster.
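
For illustration, a minimal sketch of a regex-free check along these lines. It uses a plain set lookup rather than the exact str.isascii / str.isdecimal combination mentioned above, and the names `_ALLOWED_QUERY_CHARS` and `is_valid_query_chars` are hypothetical; they are not part of urllib.parse or of this PR. The allowed characters are copied from the `_VALID_QUERY_CHARS` pattern quoted above.

```python
import string

# Hypothetical constant: the same characters that the _VALID_QUERY_CHARS
# regex quoted above allows (letters, digits, and the listed punctuation).
_ALLOWED_QUERY_CHARS = frozenset(
    string.ascii_letters + string.digits + "-._~!$&'()*+,;=:@/?%"
)


def is_valid_query_chars(query):
    """Return True if every character of `query` is in the allowed set.

    Hypothetical helper for illustration only; expects a str.
    """
    # issuperset() walks the string once and needs no compiled regex.
    return _ALLOWED_QUERY_CHARS.issuperset(query)


print(is_valid_query_chars("a=1&b=%5e"))  # True
print(is_valid_query_chars("a=^"))        # False
```

Whether this is actually faster than the compiled pattern would have to be measured; the sketch only shows the shape of the check.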

Author

Okay I will do it and add new commit.

@StanFromIreland
Member

Please add a NEWS entry. Note that this does break existing code.

@Davda-James
Author

@StanFromIreland I have applied your suggestion. Could you please review it again?
Thank you.

@shloktech shloktech left a comment

I am the reporter of the issue which is being solved here. The changes look good to me, and I think they solve the issue very well. Approving from my end.

Note: @Davda-James do get it reviewed by Stan.

Suggestion: Squash your commits below to have a single commit. It is a good practice to have :)

cc: @StanFromIreland

@StanFromIreland
Member

I have requested the expert for this module.

> Suggestion: Squash your commits below to have a single commit.

Please do not, it confuses gh making it difficult to review. They will be squashed when merged anyway.

@picnixz picnixz self-requested a review August 31, 2025 17:14
Member

@picnixz picnixz left a comment

Assuming that this change is expected, the following must be done:

  • The documentation of urllib.parse.parse_qsl must be updated accordingly.
  • Test coverage must be increased.

I'm not entirely sure that we necessarily need to consider this as a bug fix. The rationale is as follows:

> The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity.

I do not know whether we should consider this a pitfall or not.

raise ValueError("bad query field: %r" % (name_value,))
if strict_parsing:
    # Validate RFC3986 characters
    to_check = (name_value.decode() if isinstance(name_value, bytes) else name_value)
Member

Use _unquote as this handles the %-encoded values and takes care of the encoding parameter as well.

Author

if strict_parsing:
    # Validate RFC3986 characters
    to_check = _unquote(name_value)
    if isinstance(to_check, (bytes, bytearray)):
        to_check = to_check.decode(encoding, errors)
    if not _is_valid_rfc3986_query(to_check):

Is it okay to use it like this? We need to decode the result because _unquote returns bytes, while _is_valid_rfc3986_query accepts a string.

Davda-James and others added 2 commits August 31, 2025 23:43
@shloktech

Hello @Davda-James, any updates on the suggestions provided?
If you are occupied, can I take your changes further?

@Davda-James
Author

@shloktech I have made the changes according to the suggestions, but it was left up to @orsenthil to decide whether we need this at all. If he acknowledges it, I will then update the docs and test coverage.

@shloktech

Hello @orsenthil can you please help with review and approval?
Feel free to let us know if you have any questions.

CC: @StanFromIreland / @picnixz / @Davda-James

@shloktech

Hello @orsenthil,

Gentle Reminder 3:
Could you please review the comments shared above and help us with your feedback and approval for the proposed changes related to #138284?

This request has been pending review on your end for nearly 2 months, so it would be great if we could expedite the process.
Please let us know if you need any additional information or clarification.

CC: @StanFromIreland / @picnixz / @Davda-James
Tagging top contributors: @gvanrossum / @vstinner / @benjaminp

Thank you for your time and support!

@picnixz
Member

picnixz commented Oct 26, 2025

Please avoid tagging unrelated users. Usually, we ask for experts to chime in and we have a huge PR backlog.

> This request has been pending review on your end for nearly 2 months, so it would be great if we could expedite the process.

Some PRs sit for even longer and the words "expedite the process" won't make the process faster. I would also suggest not to use LLM-generated answers. Finally, as it was said, this PR is still not in a mergeable state as docs and tests are lacking (on purpose, so it's fine for now).


@serhiy-storchaka What do you think about this one? I personally don't consider this as a bug because we explicitly say:

> The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity.

I'm inclined to say "well, why not" since this check is only performed if strict_parsing is true, but OTOH it could open a can of worms where users would want more and more checks.
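
For context, a short sketch of what strict_parsing already does in urllib.parse.parse_qsl, independent of this PR: a field without "=" is skipped by default and rejected in strict mode. The query string is made up for illustration.

```python
from urllib.parse import parse_qsl

# Lenient mode (the default): the field "b" has no "=" and is dropped.
print(parse_qsl("a=1&b"))  # [('a', '1')]

# Strict mode: the same malformed field raises ValueError instead.
try:
    parse_qsl("a=1&b", strict_parsing=True)
except ValueError as exc:
    print("ValueError:", exc)
```

The character check proposed here would sit behind the same flag, so callers using the default lenient mode would see no change.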

@shloktech

Hello @picnixz,
Thank you for the updates and feedback.

> Please avoid tagging unrelated users. Usually, we ask for experts to chime in and we have a huge PR backlog.
> Some PRs sit for even longer and the words "expedite the process" won't make the process faster.

Sure, I will make sure to tag only relevant users. I agree that I have a bad habit of tagging the top contributors of every repo I contribute to; it is a simple hack I found to get things done quickly :) I will not do it for this repo.

> Finally, as it was said, this PR is still not in a mergeable state as docs and tests are lacking (on purpose, so it's fine for now).

I appreciate the clarification about its current non-mergeable state due to missing docs/tests. ++ @Davda-James for it.

As for the urlsplit() behavior, I agree with your point. Let's wait for @serhiy-storchaka's reply.

> I would also suggest not to use LLM-generated answers.

Yes, the previous message was enhanced using AI; I have made sure not to do that for this one :)

Happy Sunday!!

Member

@serhiy-storchaka serhiy-storchaka left a comment

I added a comment on the issue.

As for this concrete code, I have two comments:

  • Valid characters should be checked before calling _unquote(). %5e is valid even if ^ is not.
  • The code is not particularly efficient. Perhaps using regular expressions will be more efficient. You can also check the whole input instead of separate fields.
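
A minimal sketch of the two points above, under stated assumptions: the character class is copied from the `_VALID_QUERY_CHARS` pattern quoted earlier in the thread, and `strict_parse_qsl` is a hypothetical wrapper for illustration, not the code in this PR. It checks the whole raw string with one regular expression before any unquoting, so `%5e` passes while a literal `^` is rejected. It handles str input only.

```python
import re
from urllib.parse import parse_qsl

# Character class taken from the _VALID_QUERY_CHARS pattern quoted earlier.
_VALID_QUERY = re.compile(r"[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*")


def strict_parse_qsl(qs):
    """Hypothetical wrapper: validate the raw query, then parse it strictly."""
    # Validate the whole input once, before percent-decoding happens.
    if _VALID_QUERY.fullmatch(qs) is None:
        raise ValueError("invalid characters in query: %r" % (qs,))
    return parse_qsl(qs, strict_parsing=True)


print(strict_parse_qsl("a=%5e"))  # [('a', '^')]

try:
    strict_parse_qsl("a=^")
except ValueError as exc:
    print("ValueError:", exc)
```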
