- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
gh-132461: Fix crash in OrderedDict.setdefault when key has unstable hash #132462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a regression test and a NEWS entry as well.
| Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the  | 
Replaced assertion with a runtime check in OrderedDict_setdefault_impl to gracefully handle keys with changing hash values. If such a key is detected, a TypeError is raised instead of aborting the interpreter.
| Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the  | 
- Add test_unstable_hash_should_not_abort_issue132461 to ensure that OrderedDict handles keys with unstable hashes gracefully. - Simulates randomized hashes using a custom metaclass and verifies that an exception is raised rather than a crash occurring. - References pythongh-132461 to track and validate regression coverage.
| Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the  | 
| 
 Added. | 
| large_num = 2**64 | ||
|  | ||
| class WeirdBase(abc.ABCMeta): | ||
| def __hash__(self): | ||
| return randint(0, large_num) | ||
|  | ||
| class weird_bytes(bytes, metaclass=WeirdBase): | ||
| pass | ||
|  | ||
| obj = self.OrderedDict() | ||
| with self.assertRaises(Exception): | ||
| for x in range(100): | ||
| obj.setdefault(weird_bytes, None) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| large_num = 2**64 | |
| class WeirdBase(abc.ABCMeta): | |
| def __hash__(self): | |
| return randint(0, large_num) | |
| class weird_bytes(bytes, metaclass=WeirdBase): | |
| pass | |
| obj = self.OrderedDict() | |
| with self.assertRaises(Exception): | |
| for x in range(100): | |
| obj.setdefault(weird_bytes, None) | |
| class A: | |
| c = 0 | |
| def __hash__(self): | |
| self.c += 1 | |
| return self.c | |
| a = A() | |
| obj = self.OrderedDict() | |
| msg = "OrderedDict key appears to change its hash value over time" | |
| with self.assertRaisesRegex(RuntimeError, msg): | |
| for _ in range(100): | |
| obj.setdefault(a, None) | 
We can also remove abc dependency and randint import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of test
        with self.assertRaises(KeyError):
            for x in range(100):
                obj.setdefault(weird_bytes, None)
error
ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.CPythonOrderedDictTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
RuntimeError: OrderedDict key appears to change its hash value over time
fixed part of test
        with self.assertRaises(RuntimeError):
            for x in range(100):
                obj.setdefault(weird_bytes, None)
ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.CPythonOrderedDictSubclassTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: <class 'test.test_ordered_dict.OrderedDictTests.test_unstable_hash_should_not_abort_issue132461.<locals>.weird_bytes'>
======================================================================
ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.PurePythonOrderedDictSubclassTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/learning/cpython/Lib/collections/__init__.py", line 274, in setdefault
    return self[key]
           ~~~~^^^^^
KeyError: <class 'test.test_ordered_dict.OrderedDictTests.test_unstable_hash_should_not_abort_issue132461.<locals>.weird_bytes'>
======================================================================
ERROR: test_unstable_hash_should_not_abort_issue132461 (test.test_ordered_dict.PurePythonOrderedDictTests.test_unstable_hash_should_not_abort_issue132461)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/learning/cpython/Lib/test/test_ordered_dict.py", line 747, in test_unstable_hash_should_not_abort_issue132461
    obj.setdefault(weird_bytes, None)
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/learning/cpython/Lib/collections/__init__.py", line 274, in setdefault
    return self[key]
           ~~~~^^^^^
KeyError: <class 'test.test_ordered_dict.OrderedDictTests.test_unstable_hash_should_not_abort_issue132461.<locals>.weird_bytes'>
Very strange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, we should only catch RuntimeError and not arbitrary exceptions. The assertion failing may be an indication that something else is wrong.
| Please, add the NEWS entry mentioning the bugfix. | 
| Note that the Python implementation would not crash so the test should be only meant for the C implementation. | 
| 
 I believe that the behavior should be the same. Why only for the C interface? | 
| 
 How i can do it? Please, help. | 
| 
 The behavior is not the same. The pure python implementation does not use the same implementation and does not have the same construction (but the functionalities are identical; it's just that some bugs will only appear in the C version) 
 Using https://pypi.org/project/blurb/. See #132462 (comment) for instance. | 
| @dura0ok would you like to continue this PR? | 
| @emmatyping yes. | 
| Hi! While the added assertion and error check prevent crashes when keys have unstable hash values, the current solution still leaves us in a difficult position. To correctly and reliably detect that a key with the same equality (eq) but a different hash (hash) has already been inserted, we would need to store the original hash for each key in the dict. This introduces overhead — both in terms of memory and performance — just to protect against a relatively rare "misbehavior" (i.e., keys with unstable hashes). It feels like this adds non-trivial cost to all users just to catch a developer error. Is there a better trade-off here? Maybe this should just remain undefined behavior, or at most be a debug-only check? I’m unsure how to proceed — any suggestions would be appreciated. | 
|  | ||
| class WeirdBase(abc.ABCMeta): | ||
| def __hash__(self): | ||
| return randint(0, large_num) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to return a new one on each call, otherwise this test will sometimes be flaky, when random will produce the same number twice.
| I think we should close this issue with reason "Wont do". | 
Replaced assertion with a runtime check in OrderedDict_setdefault_impl to gracefully handle keys with changing hash values. If such a key is detected, a TypeError is raised instead of aborting the interpreter.
OrderedDict.setdefaultwith an invalid value #132461