Skip to content

Conversation

@kairi003
Copy link

@kairi003 kairi003 commented Jan 7, 2024

MozillaCookieJar recognizes and writes the #HttpOnly_ prefix, based on curl's specifications. However, it is uncommon for cookies obtained via HTTP to be written with the HttpOnly prefix.

This issue stems from the case-sensitive nature of the HttpOnly attribute check and the fact that the value of the constant HTTPONLY_ATTR is "HTTPOnly" instead of the more commonly used "HttpOnly".

In accordance with RFC6265, this PR proposes the addition of a case_insensitive option to Cookie.has_nonstandard_attr. Additionally, it modifies the value of HTTPONLY_ATTR to the more widely recognized "HttpOnly".

This PR includes code changes, docs fixes, and additional tests.


📚 Documentation preview 📚: https://cpython-previews--113795.org.readthedocs.build/

@ghost
Copy link

ghost commented Jan 7, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

Cookie.has_nonstandard_attr method is case-sensitive, but HTTP is
case-insensitive for cookie attribute names. This option is useful
for MozillaCookieJar to check the HttpOnly flag when saving.
This commit updates the value of the HTTPONLY_ATTR constant from
"HTTPOnly" to the more commonly used "HttpOnly".
It makes HTTP communication more consistent.
Modified attribute checking in MozillaCookieJar.save to be
case-insensitive, aligning with HTTP standards. This change
resolves the issue where HttpOnly prefix was not correctly
appended due to case-sensitive checks.
@bedevere-app
Copy link

bedevere-app bot commented Mar 23, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Mar 23, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know whether it should considered a feature or a bug fix as we're aligning with the RFC now. I think it could be considered a bug fix but OTOH, we're changing the HTTPONLY_ATTR.

Now, be careful that the value is also used in _really_load:

if line.startswith(HTTPONLY_PREFIX):
    rest[HTTPONLY_ATTR] = ""
    line = line[len(HTTPONLY_PREFIX):]

In this case, shouldn't we do something as well with the case? because otherwise, files that were successfully parsed, wouldn't be now.

cc @vadmium as my favorite web RFC expert c:

Comment on lines 1960 to 1973
for key in ["foo1", "foo2", "foo3"]:
matches = [x for x in lines if key in x]
self.assertEqual(len(matches), 1,
"Incorrect number of matches for cookie with value %r" % key)
self.assertTrue(matches[0].startswith("#HttpOnly_"),
"Cookie with value %r is missing the HttpOnly prefix" % key)

# Check that the HttpOnly prefix is not added to the correct cookies
for key in ["foo4"]:
matches = [x for x in lines if key in x]
self.assertEqual(len(matches), 1,
"Incorrect number of matches for cookie with value %r" % key)
self.assertFalse(matches[0].startswith("#HttpOnly_"),
"Cookie with value %r has the HttpOnly prefix" % key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use self.subTest(key=key) and drop the assertion messages. It could be easier.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I did it in 7f72d4e

@kairi003
Copy link
Author

kairi003 commented Mar 23, 2025

@picnixz
Thank you for your review!

In this case, shouldn't we do something as well with the case? Because otherwise, files that were successfully parsed wouldn't be now.

You're right, HTTPONLY_ATTR is used elsewhere and risks breaking existing code.
It reflects the HttpOnly attribute in the Cookies._rest field when loading cookies from files.

Here's what I've found upon further review:

  • Cookie._rest is a field with an underscore prefix, which implies it is not expected to be accessed directly.
  • The only 3 methods that manipulate Cookie._rest are has/get/set_nonstandard_attr(), and in the standard library, they are only used for managing the HttpOnly attribute (which is the scope of this change).
  • Generally, HttpOnly is checked for its presence rather than its value, so the only method affected by this fix should be has_nonstandard_attr().
  • HttpOnly is an attribute meant for browser security and is not considered in requests made by Python.

Therefore, this change only affects users who directly reference Cookie._rest or call Cookie.get_nonstandard_attr('HTTPOnly'). This is not a common operation.

If full backward compatibility is required, we could keep HTTPONLY_ATTR = "HTTPOnly".
Since the logic has been modified to be case-insensitive, this should be sufficient for RFC compliance.
However, the RFC shows the case of this attribute as "HttpOnly", and it is more common on the web.

I am not a specialist in this area, so I will defer to the judgment of the reviewers and those who have more expertise.

Thank you :)

@kairi003
Copy link
Author

As a side note, when you make a typical request, Cookie._rest will be set to "HttpOnly" in most cases.

import requests
resp = requests.get('https://google.com')
cookie = list(resp.cookies)[0]
print(cookie._rest)
# -> {'HttpOnly': None, 'SameSite': 'lax'}

@picnixz
Copy link
Member

picnixz commented Mar 23, 2025

HttpOnly

That I know. I encountered this kind of issues when writing my own crawler so I'm not surprised. However, the best way to do make it smooth is to support both names, and emit a warning if HTTPOnly is encountered.

@kairi003
Copy link
Author

@picnixz
That is a very good idea.
Which do you think is better, actually having both values in _rest dict or automatically replacing HttpOnly when HTTPOnly is encountered?

Both values in `_rest` dict
HTTPONLY_ATTR = "HttpOnly"
HTTPONLY_ATTR_OLD = "HTTPOnly"

class MozillaCookieJar(FileCookieJar):
    def _really_load(self, f, filename, ignore_discard, ignore_expires):
        ...
        if line.startswith(HTTPONLY_PREFIX):
            rest[HTTPONLY_ATTR] = ""
            rest[HTTPONLY_ATTR_OLD] = ""
        ...

class Cookie:
    def has_nonstandard_attr(self, name, case_insensitive=False):
        if name == HTTPONLY_ATTR_OLD:
            warnings.warn(warning_message)
        ...
    
    def get_nonstandard_attr(self, name, default=None):
        if name == HTTPONLY_ATTR_OLD:
            warnings.warn(warning_message)
        ...
    
    def set_nonstandard_attr(self, name, value):
        if name == HTTPONLY_ATTR_OLD:
            warnings.warn(warning_message)
        ...
Automatically replacing
class MozillaCookieJar(FileCookieJar):
    def _really_load(self, f, filename, ignore_discard, ignore_expires):
        ...
        if line.startswith(HTTPONLY_PREFIX):
            rest[HTTPONLY_ATTR] = ""
        ...

class Cookie:
    def has_nonstandard_attr(self, name, case_insensitive=False):
        if name == HTTPONLY_ATTR_OLD:
            warnings.warn(warning_message)
            name = HTTPONLY_ATTR
        ...
    
    def get_nonstandard_attr(self, name, default=None):
        if name == HTTPONLY_ATTR_OLD:
            warnings.warn(warning_message)
            name = HTTPONLY_ATTR
        ...
    
    def set_nonstandard_attr(self, name, value):
        if name == HTTPONLY_ATTR_OLD:
            warnings.warn(warning_message)
            name = HTTPONLY_ATTR
        ...

@picnixz
Copy link
Member

picnixz commented Mar 23, 2025

I'm more worried with APIs overriding those methods so I think we should rather not change it automatically. We should either way document this change but I don't know which one would be the best (which is why I'd like to wait for another core dev). Personally, I would just replace HTTPOnly by HttpOnly to align with the RFC, but I don't know if this can break existing clients or apps in the wild. Agreed that currently it doesn't work well, but if you can find evidence that HTTPOnly is barely used, we could perhaps outright support it in _really_load for legacy reasons but without warning the end-user at all.

@vadmium
Copy link
Member

vadmium commented Apr 2, 2025

I think adding the new case_insensitive public parameter is not great for a bug fix. I presume there should be an “Added/changed in version X” notice for it in the documentation.

As a new feature, I would make it a keyword-only parameter and probably also add it to the get_nonstandard_attr method. And I wonder if case_insensitive=True should be the default or only behaviour. On one hand this would be a change in behaviour; on the other hand it might mitigate compatibility concerns with changing HTTPONLY_ADDR.

Regarding the letter case of HTTPONLY_ATTR, changing it seems unnecessary to fix the bug. I understand changing it would only affect the “nonstandard attr” name in Cookie objects read from a file. That seems to have minimal reward and minimal risk, but it does seem nicer to use the casing from RFC 6265. I would only make the change in the next Python version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants