gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 #138291

Davda-James · 2025-08-31T12:30:55Z

Previously, urllib.parse.parse_qsl silently accepted characters that RFC 3986 does not allow in a query.
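Here is a minimal reproduction of the reported behavior. It assumes a current CPython release without this PR's change, and the query string is an arbitrary example. Characters such as ^, <, > and " are not allowed unescaped in an RFC 3986 query, yet parse_qsl returns them unchanged, even with strict_parsing=True.

```python
from urllib.parse import parse_qsl

# Arbitrary example query: "^", "<", ">" and '"' would need %-escaping
# under RFC 3986, but parse_qsl passes them through unchanged.
query = 'a=^&b=<x>&c="q"'

print(parse_qsl(query))
# [('a', '^'), ('b', '<x>'), ('c', '"q"')]

# strict_parsing=True does not inspect characters, so the result is the same.
print(parse_qsl(query, strict_parsing=True))
# [('a', '^'), ('b', '<x>'), ('c', '"q"')]
```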

Proof (reproduction):

Closes issue: #138284

Proof (after the fix):

I added a test for it as well.
Running test_urlparse only:

All tests:

All tests pass.

Issue: urllib.parse.parse_qsl is accepting illegal characters #138284


python-cla-bot · 2025-08-31T12:30:59Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2025-08-31T12:31:00Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.


StanFromIreland · 2025-08-31T12:44:34Z

Lib/urllib/parse.py

 _UNSAFE_URL_BYTES_TO_REMOVE = ['\t', '\r', '\n']

+# Allowed valid characters in parse_qsl
+_VALID_QUERY_CHARS = re.compile(r"^[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*$") 


This could be replaced with str.isascii, str.isdecimal, and a string containing the other characters; this should be faster.

Okay, I will do it and add a new commit.
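The sketch below compares the two approaches discussed above. It assumes the character class from the _VALID_QUERY_CHARS diff is the intended one. The names is_valid_query_regex, is_valid_query_str_methods and _OTHER_QUERY_CHARS are hypothetical. It uses isalnum() rather than isdecimal(), as the later commit 4934ff2 describes, because ASCII letters must be accepted too. One detail worth noting: with re.match and a ^...$ pattern, $ also matches just before a trailing newline, so "a=b\n" would pass. For that reason the regex variant here uses fullmatch() instead.

```python
import re

# Character class taken from the _VALID_QUERY_CHARS pattern in the diff above.
# fullmatch() is used because with match() and "^...$", "$" also matches
# just before a trailing "\n".
_VALID_QUERY_RE = re.compile(r"[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*")

# Hypothetical constant: the non-alphanumeric members of the same class.
_OTHER_QUERY_CHARS = frozenset("-._~!$&'()*+,;=:@/?%")


def is_valid_query_regex(s: str) -> bool:
    """Regex-based check (hypothetical name)."""
    return _VALID_QUERY_RE.fullmatch(s) is not None


def is_valid_query_str_methods(s: str) -> bool:
    """str-method-based check (hypothetical name).

    isascii() is required because isalnum() (and isdecimal()) also accept
    non-ASCII letters and digits such as "é" or "٣".
    """
    return all(c.isascii() and (c.isalnum() or c in _OTHER_QUERY_CHARS) for c in s)


# Sanity check: both variants agree on every code point below U+0800.
assert all(
    is_valid_query_regex(ch) == is_valid_query_str_methods(ch)
    for ch in map(chr, range(0x800))
)
```

Which variant is actually faster should be measured, for example with timeit, rather than assumed.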

StanFromIreland · 2025-08-31T12:45:13Z

Please add a NEWS entry, and this does break existing code.


Davda-James · 2025-08-31T13:41:02Z

@StanFromIreland I have added your suggestion. Could you please review it again?
Thank you.

shloktech

I am the reporter of the issue which is being solved here. The changes look good to me, and I think they solve the issue very well. Approving from my end.

Note: @Davda-James do get it reviewed by Stan.

Suggestion: Squash your commits below to have a single commit. It is a good practice to have :)

cc: @StanFromIreland

StanFromIreland · 2025-08-31T16:57:47Z

I have requested the expert for this module.

Suggestion: Squash your commits below to have a single commit.

Please do not, it confuses gh making it difficult to review. They will be squashed when merged anyway.

picnixz

Assuming that this change is expected, the following must be done:

The documentation of urllib.parse.parse_qsl must be updated accordingly.
Test coverage must be increased.

I'm not entirely sure that we necessarily need to consider this as a bug fix. The rationale is as follows:

The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity.

I do not know whether we should consider this a pitfall or not.

Misc/NEWS.d/next/Library/2025-08-31-13-00-22.gh-issue-138284.6MOp4k.rst

Lib/test/test_urlparse.py

Lib/urllib/parse.py

picnixz · 2025-08-31T17:39:18Z

Lib/urllib/parse.py

                raise ValueError("bad query field: %r" % (name_value,))
+            if strict_parsing:
+                # Validate RFC3986 characters
+                to_check = (name_value.decode() if isinstance(name_value, bytes) else name_value)


Use _unquote as this handles the %-encoded values and takes care of the encoding parameter as well.

if strict_parsing:
    # Validate RFC3986 characters
    to_check = _unquote(name_value)
    if isinstance(to_check, (bytes, bytearray)):
        to_check = to_check.decode(encoding, errors)
    if not _is_valid_rfc3986_query(to_check):

Is it fine to use it like this? We need to decode back, because _unquote returns bytes and _is_valid_rfc3986_query accepts a string.
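For reference on the bytes question above, here is a sketch that uses only public urllib.parse functions. unquote_field_to_str is a hypothetical helper, not the private _unquote. For str input, unquote_plus already returns str. For bytes input, unquote_to_bytes returns bytes that must be decoded before a str-based check. Note that the decoded result contains the unescaped characters, so %5E comes back as ^.

```python
from urllib.parse import unquote_plus, unquote_to_bytes


def unquote_field_to_str(field, encoding="utf-8", errors="replace"):
    """Hypothetical helper: unquote one query field and always return str."""
    if isinstance(field, (bytes, bytearray)):
        # unquote_to_bytes does not turn "+" into a space, so do that first,
        # then decode the resulting bytes with the caller's encoding.
        raw = unquote_to_bytes(bytes(field).replace(b"+", b" "))
        return raw.decode(encoding, errors)
    # For str input, unquote_plus handles "+" and decoding itself.
    return unquote_plus(field, encoding=encoding, errors=errors)
```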


shloktech · 2025-10-01T09:35:45Z

Hello @Davda-James any updates on the suggestion provided?
If you are occupied then can I take your changes further?

Davda-James · 2025-10-01T10:22:13Z

@shloktech I have made the changes according to the suggestions, but it was left up to @orsenthil to decide whether we need this or not. Once he acknowledges it, I will update the docs and test coverage.

shloktech · 2025-10-03T15:41:55Z

Hello @orsenthil can you please help with review and approval?
Feel free to let us know if you have any questions.

CC: @StanFromIreland / @picnixz / @Davda-James

shloktech · 2025-10-26T09:58:58Z

Hello @orsenthil,

Gentle Reminder 3:
Could you please review the comments shared above and help us with your feedback and approval for the proposed changes related to #138284?

This request has been pending review on your end for nearly 2 months, so it would be great if we could expedite the process.
Please let us know if you need any additional information or clarification.

CC: @StanFromIreland / @picnixz / @Davda-James
Tagging top contributors: @gvanrossum / @vstinner / @benjaminp

Thank you for your time and support!

picnixz · 2025-10-26T10:09:19Z

Please avoid tagging unrelated users. Usually, we ask for experts to chime in and we have a huge PR backlog.

This request has been pending review on your end for nearly 2 months, so it would be great if we could expedite the process.

Some PRs sit for even longer and the words "expedite the process" won't make the process faster. I would also suggest not to use LLM-generated answers. Finally, as it was said, this PR is still not in a mergeable state as docs and tests are lacking (on purpose, so it's fine for now).

@serhiy-storchaka What do you think about this one? I personally don't consider this as a bug because we explicitly say:

The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity.

I'm inclined to say "well, why not" since this check is only performed if strict is true but OTOH, it could open a can of worms where users would want more and more checks.
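As context for the strict-only point, this short sketch shows what strict_parsing already changes in an unpatched parse_qsl. A field without = is silently dropped by default. When strict_parsing is true, the same field raises the "bad query field" ValueError from the diff above.

```python
from urllib.parse import parse_qsl

# Default: the field "a" has no "=" and no value, so it is dropped.
print(parse_qsl("a&b=1"))
# [('b', '1')]

# strict_parsing=True turns the same input into an error.
try:
    parse_qsl("a&b=1", strict_parsing=True)
except ValueError as exc:
    print(exc)
# bad query field: 'a'
```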

shloktech · 2025-10-26T11:28:10Z

Hello @picnixz,
Thank you for the updates and feedback.

Please avoid tagging unrelated users. Usually, we ask for experts to chime in and we have a huge PR backlog.
Some PRs sit for even longer and the words "expedite the process" won't make the process faster.

Sure, I will make sure to tag only relevant users. I admit I have a bad habit of tagging the top contributors of every repo I contribute to; it is a simple hack I found to get things done quickly :) I will make sure not to do it in this repo.

Finally, as it was said, this PR is still not in a mergeable state as docs and tests are lacking (on purpose, so it's fine for now).
I appreciate the clarification about its current non-mergeable state due to missing docs/tests ++ @Davda-James for it.

As for the urlsplit() behavior, I agree with your point. Let's wait for @serhiy-storchaka's reply.

I would also suggest not to use LLM-generated answers.

Yes, the previous message was enhanced using AI; I have made sure not to do so for this one :)

Happy Sunday!!

serhiy-storchaka

I added a comment on the issue.

As for this concrete code, I have two comments:

Valid characters should be checked before calling _unquote(). %5e is valid even if ^ is not.
The code is not particularly efficient. Perhaps using regular expressions will be more efficient. You can also check the whole input instead of separate fields.
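The sketch below applies the two suggestions above. It validates the raw query once, before any unquoting, using one precompiled regular expression for str and one for bytes. strict_parse_qsl and the two patterns are hypothetical names, not part of urllib.parse, and the character class is taken from the earlier diff. Because the check runs on the escaped text, %5E passes even though ^ would not. A character-class check alone does not reject malformed escapes such as %zz.

```python
import re
from urllib.parse import parse_qsl

# Hypothetical patterns: the RFC 3986 query class from the earlier diff,
# compiled once for str input and once for bytes input.
_QUERY_STR_RE = re.compile(r"[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*")
_QUERY_BYTES_RE = re.compile(rb"[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*")


def strict_parse_qsl(qs, **kwargs):
    """Hypothetical wrapper: validate the whole raw query, then parse it.

    The check runs on the still-escaped input, before any unquoting, so
    "%5E" is accepted even though the "^" it decodes to is not.
    Malformed escapes such as "%zz" are NOT rejected by this check.
    """
    pattern = _QUERY_BYTES_RE if isinstance(qs, bytes) else _QUERY_STR_RE
    if pattern.fullmatch(qs) is None:
        raise ValueError("invalid character in query string: %r" % (qs,))
    return parse_qsl(qs, strict_parsing=True, **kwargs)
```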

urllib.parse.parse_qsl now raises ValueError if illegal characters is passed

79f25b9

bedevere-app bot added the awaiting review label Aug 31, 2025

Davda-James changed the title ~~urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986~~ gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 Aug 31, 2025

bedevere-app bot mentioned this pull request Aug 31, 2025

urllib.parse.parse_qsl is accepting illegal characters #138284

Open

fixed the linting

345e86b

StanFromIreland reviewed Aug 31, 2025


replaced regex with char.isascii() and char.isalnum() and manual check for performance

4934ff2

📜🤖 Added by blurb_it.

cf763db

shloktech approved these changes Aug 31, 2025


bedevere-app bot added awaiting core review and removed awaiting review labels Aug 31, 2025

StanFromIreland requested a review from orsenthil August 31, 2025 16:57

picnixz self-requested a review August 31, 2025 17:14

picnixz reviewed Aug 31, 2025


Davda-James and others added 2 commits August 31, 2025 23:43

Update Misc/NEWS.d/next/Library/2025-08-31-13-00-22.gh-issue-138284.6MOp4k.rst

5cea514

Updated it according to suggestion Co-authored-by: Bénédikt Tran <[email protected]>

changes made as per reviews

f42744b

serhiy-storchaka reviewed Oct 26, 2025



5 participants
