gh-99151: Improve performance and error readability of unittest.TestCase.assertDictEqual #126923
…DictEqual The function previously used a simple difflib.ndiff on top of a pprint.pformat of each dict, which resulted in very bad performance on large dicts and unclear assertion error outputs in many cases. This change formats the diffs in a more readable manner by inspecting the differences between the dicts, truncating long keys and values, and justifying values in the various groups of lines.
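To make the description concrete, here is a minimal sketch of the general idea: compare the dicts key by key, truncate long reprs, and pad keys so values line up within each group. It is not the code from this PR. The names `describe_dict_diff` and `_short`, the `max_len` parameter, and the `-`/`+` line markers are all assumptions chosen for illustration.

```python
def _short(obj, max_len=40):
    """Hypothetical helper: repr() of obj, cut to at most max_len characters."""
    text = repr(obj)
    return text if len(text) <= max_len else text[: max_len - 3] + "..."


def describe_dict_diff(first, second, max_len=40):
    """Hypothetical helper: list only the entries that differ between two dicts."""
    removed = [k for k in first if k not in second]
    added = [k for k in second if k not in first]
    changed = [k for k in first if k in second and first[k] != second[k]]

    def group(marker, keys, source):
        # Pad the key reprs to a common width so the values line up.
        reprs = [_short(k, max_len) for k in keys]
        width = max((len(r) for r in reprs), default=0)
        return [
            f"{marker} {r.ljust(width)}: {_short(source[k], max_len)}"
            for k, r in zip(keys, reprs)
        ]

    lines = group("-", removed, first) + group("+", added, second)
    # For changed keys, show the old value directly above the new one.
    for old, new in zip(group("-", changed, first), group("+", changed, second)):
        lines += [old, new]
    return lines
```

Because the work is a handful of linear passes over the keys rather than a line-by-line `difflib.ndiff` over two large `pprint.pformat` strings, a sketch like this scales roughly with the number of keys. Identical entries are simply left out of the result.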
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Can you please provide benchmarks testing the old and new approaches?
Misc/NEWS.d/next/Library/2024-11-16-21-04-16.gh-issue-99151.74rlUp.rst
Co-authored-by: Kirill Podoprigora <[email protected]>
Comparing two dicts with 10000 keys and values of 20 random bytes each:

```python
import random
import unittest


class TestDictCompare(unittest.TestCase):
    def test_compare_dicts(self):
        # The two dicts are generated independently, so they almost surely
        # differ and the assertion has to render a diff before failing.
        first = self.generate_dict()
        second = self.generate_dict()
        self.assertDictEqual(first, second)

    def generate_dict(self):
        length = 10000
        d = {}
        for _ in range(length):
            d[random.randbytes(20)] = random.randbytes(20)
        return d
```

The previous implementation takes 15.3 seconds.
The new implementation takes 0.28 seconds.
For the test in the linked issue, I'm getting 6.6 seconds with the old vs. 0.22 seconds with the new.
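For anyone who wants to reproduce timings like these locally, a small harness along the following lines may help. It is only a sketch: `make_dict` and `time_failed_comparison` are hypothetical names, the seed and default size are arbitrary, and the numbers it produces will depend on the machine and on which Python build is being run.

```python
import random
import time
import unittest


def make_dict(rng, length):
    """Build a dict of `length` random 20-byte keys and values."""
    return {rng.randbytes(20): rng.randbytes(20) for _ in range(length)}


def time_failed_comparison(length=1000, seed=0):
    """Return the seconds assertDictEqual spends failing on two random dicts."""
    rng = random.Random(seed)  # seeded so repeated runs compare the same data
    first, second = make_dict(rng, length), make_dict(rng, length)
    case = unittest.TestCase()
    start = time.perf_counter()
    try:
        case.assertDictEqual(first, second)
    except AssertionError:
        pass  # the failure is expected; only the time to produce it matters
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (100, 1000):
        print(f"{n} keys: {time_failed_comparison(n):.3f} s")
```

Running the same script against a build with and without the change gives a like-for-like comparison. Starting with small sizes is advisable, since the reports above suggest the old code path slows down sharply as the dicts grow.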
An example of the new output:
(edited to reflect changes in fcfdd92)
Thanks! That's a pretty nice speedup! I'm just wondering, do we want to show the |
I'll remove the explanation line for identical keys and values. The old implementation doesn't omit anything, so if we omit values I think there should be some indication that there are more values than what is shown, rather than silently dropping them.
Now that the diffing is more complex, I think it needs more tests to cover the possible outputs :)
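As a rough illustration of what such tests could look like, the sketch below checks one property per kind of difference: the failure message should mention the key that differs. The class name, the cases, and the assertion on the message are assumptions for illustration. They are deliberately loose so they do not depend on the exact output format, which this thread does not pin down.

```python
import unittest


class DictDiffMessageTests(unittest.TestCase):
    """Hypothetical tests: each kind of difference should surface its key."""

    cases = [
        # (description, first, second, key expected in the message)
        ("changed value", {"a": 1, "b": 2}, {"a": 1, "b": 3}, "b"),
        ("key only in second", {"a": 1}, {"a": 1, "c": 4}, "c"),
        ("key only in first", {"a": 1, "d": 5}, {"a": 1}, "d"),
    ]

    def test_message_mentions_differing_key(self):
        for description, first, second, key in self.cases:
            with self.subTest(description):
                with self.assertRaises(AssertionError) as cm:
                    self.assertDictEqual(first, second)
                self.assertIn(repr(key), str(cm.exception))

    def test_equal_dicts_pass(self):
        # Insertion order should not matter for equality.
        self.assertDictEqual({"a": 1, "b": 2}, {"b": 2, "a": 1})
```

Stricter tests that pin the exact text of each group of lines would be needed to cover truncation and justification, but those depend on the final format chosen in the PR.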