gh-99151: Improve performance and error readability of unittest.TestCase.assertDictEqual #126923

merlinz01 · 2024-11-17T01:33:53Z

The function previously used a simple difflib.ndiff on top of a pprint.pformat of each dict, which resulted in very bad performance on large dicts and unclear assertion error outputs in many cases. This change formats the diffs in a more readable manner by inspecting the differences between the dicts, truncating long keys and values, and justifying values in the various groups of lines.

Issue: Calling unittest.assertDictEqual for medium-size dictionaries takes too long #99151
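
A minimal sketch of what "inspecting the differences between the dicts" could look like, assuming the keys are split into four groups before any formatting happens. The function name `classify_keys` and its return shape are hypothetical and are not taken from the PR's code.

```python
def classify_keys(first, second):
    """Split the keys of two dicts into the groups a key-wise diff would report."""
    # A sentinel distinguishes "key missing" from "key present with value None".
    missing = object()
    same, changed, only_first = [], [], []
    for key, value in first.items():
        other = second.get(key, missing)
        if other is missing:
            only_first.append(key)
        elif other == value:
            same.append(key)
        else:
            changed.append(key)
    only_second = [key for key in second if key not in first]
    return same, changed, only_first, only_second


first = {"one": 1, "two": 2, "four": 4, "six": 6}
second = {"one": 2, "three": 3, "four": 4, "six": "six"}
same, changed, only_first, only_second = classify_keys(first, second)
print(same, changed, only_first, only_second)
```

Each dict is walked once and every lookup is a hash lookup, so the work grows roughly linearly with the number of keys, independent of how long the pretty-printed text would be.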

ghost · 2024-11-17T01:33:56Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2024-11-17T01:33:58Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Eclips4

Can you please provide benchmarks testing the old and new approaches?

Misc/NEWS.d/next/Library/2024-11-16-21-04-16.gh-issue-99151.74rlUp.rst

Co-authored-by: Kirill Podoprigora <[email protected]>

merlinz01 · 2024-11-18T12:45:20Z

Comparing two dicts with 10000 keys and values of 20 random bytes each.

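import random
import unittest


class TestDictCompare(unittest.TestCase):
    def test_compare_dicts(self):
        # Two independently generated dicts, so the assertion is expected to fail.
        first = self.generate_dict()
        second = self.generate_dict()
        self.assertDictEqual(first, second)

    def generate_dict(self):
        # 10000 entries, each mapping 20 random bytes to 20 random bytes.
        length = 10000
        d = {}
        for _ in range(length):
            d[random.randbytes(20)] = random.randbytes(20)
        return d


if __name__ == "__main__":
    unittest.main()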

The previous implementation takes 15.3 seconds:

F
======================================================================
FAIL: test_compare_dicts (__main__.TestDictCompare.test_compare_dicts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/merlin/Projects/cpython/test_new_format.py", line 36, in test_compare_dicts
    self.assertDictEqual(
    ~~~~~~~~~~~~~~~~~~~~^
        first, second
        ^^^^^^^^^^^^^
    )
    ^
AssertionError: {b'\xbeA\r\xf3~L?\xa5\xf4\x97"\x94\x1f\x98T\xe[1244908 chars]xf0'} != {b'\x01\xfaz\x02r\xec\xe1\xdf\xba8\r\xdc\xd9\x[1247916 chars]x04'}
Diff is 12985674 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 1 test in 15.300s

FAILED (failures=1)

The new implementation takes 0.28 seconds:

F
======================================================================
FAIL: test_compare_dicts (__main__.TestDictCompare.test_compare_dicts)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/merlin/Projects/cpython/test_new_format.py", line 36, in test_compare_dicts
    self.assertDictEqual(
    ~~~~~~~~~~~~~~~~~~~~^
        first, second
        ^^^^^^^^^^^^^
    )
    ^
AssertionError: {b'_\x94|^\xb9e\x98\x86\xdeUd\xf3\x7f\xc5\x9[1246065 chars]x80'} != {b'\x16_W\xea}\xf1 \xaa`=\x06\xf2\xf4#\xe5\x[1247665 chars]xe7'}
Diff is 2197425 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 1 test in 0.277s

FAILED (failures=1)
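
For anyone who wants to reproduce the comparison outside the test runner, here is a rough timing sketch. `ndiff_of_pformat` follows the PR description of the old behaviour (`difflib.ndiff` over `pprint.pformat` of each dict). `keywise_diff` is only a hypothetical stand-in for a key-wise comparison and is not the PR's implementation. The dict size is kept small so the script finishes quickly; the sizes used in the benchmark above are much larger.

```python
import difflib
import pprint
import random
import time


def make_dict(length, seed):
    # Seeded generator so repeated runs compare the same data.
    rng = random.Random(seed)
    return {rng.randbytes(8): rng.randbytes(8) for _ in range(length)}


def ndiff_of_pformat(first, second):
    """Line-diff the pretty-printed dicts, as the PR description says the old code did."""
    return "\n".join(
        difflib.ndiff(
            pprint.pformat(first).splitlines(),
            pprint.pformat(second).splitlines(),
        )
    )


def keywise_diff(first, second):
    """Hypothetical key-wise comparison; an illustration, not the PR's code."""
    missing = object()
    lines = []
    for key, value in first.items():
        other = second.get(key, missing)
        if other is missing:
            lines.append(f"- {key!r}: {value!r}")
        elif other != value:
            lines.append(f"- {key!r}: {value!r}")
            lines.append(f"+ {key!r}: {other!r}")
    for key, value in second.items():
        if key not in first:
            lines.append(f"+ {key!r}: {value!r}")
    return "\n".join(lines)


def timed(func, *args):
    # Wall-clock time of a single call; enough for a coarse comparison.
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


first = make_dict(40, seed=1)
second = make_dict(40, seed=2)
old_seconds = timed(ndiff_of_pformat, first, second)
new_seconds = timed(keywise_diff, first, second)
print(f"ndiff over pformat: {old_seconds:.4f}s, key-wise: {new_seconds:.4f}s")
```

Raising the `length` argument gradually should show how quickly the line-based diff slows down compared with the key-wise pass; exact numbers will vary by machine and Python version.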

merlinz01 · 2024-11-18T17:38:57Z

For the test in the linked issue, I'm getting 6.6 seconds with the old vs. 0.22 seconds with the new.

merlinz01 · 2024-11-18T17:47:20Z

An example of the new output:

{
    'eighteighteight': 'eight',
    'four'           : 4,
Keys in both dicts with differing values:
  - 'oneoneoneone': 1,
  + 'oneoneoneone': 2,
  - 'six': 6,
  + 'six': 'six',
Keys in the first dict but not the second:
  - 'seven': 'averyveryveryveryveryveryveryveryv[44 chars]alue',
  - 'two'  : 2,
Keys in the second dict but not the first:
  + 'averyveryveryveryveryveryveryveryl[22 chars]gkey': {'oneoneoneone': 1, 'two': 2, 'four[133 chars]ght'},
  + 'five'                                            : 5,
  + 'three'                                           : 3,
} : Hey, you passed dicts that were not equal!

(edited to reflect changes in fcfdd92)
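
The truncation and alignment visible in that output can be sketched as follows. The helper names `shorten` and `format_group` and the prefix and suffix lengths are assumptions chosen to resemble the `[N chars]` markers shown above; the PR may use different helpers and thresholds.

```python
def shorten(text, prefix_len=34, suffix_len=4, placeholder_len=12):
    """Replace the middle of a long string with a '[N chars]' marker."""
    skip = len(text) - prefix_len - suffix_len
    # Only shorten when the marker is actually shorter than what it replaces.
    if skip > placeholder_len:
        return f"{text[:prefix_len]}[{skip} chars]{text[len(text) - suffix_len:]}"
    return text


def format_group(pairs, marker):
    """Render (key, value) pairs with the colons aligned within the group."""
    keys = [shorten(repr(key)) for key, _ in pairs]
    width = max(map(len, keys), default=0)
    return [
        f"  {marker} {key.ljust(width)}: {shorten(repr(value))},"
        for key, (_, value) in zip(keys, pairs)
    ]


lines = format_group([("seven", "x" * 100), ("two", 2)], "-")
print("\n".join(lines))
```

Padding is computed per group rather than for the whole diff, which is why one very long key only widens the lines in its own group.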

tomasr8 · 2024-11-18T18:41:01Z

Thanks! That's a pretty nice speedup! I'm just wondering, do we want to show the "Keys in both dicts with identical values" group? If they're identical, maybe we can just omit them from the diff output?

merlinz01 · 2024-11-18T19:01:27Z

I'll remove the explanation line for identical keys and values.

The old implementation doesn't omit anything, so if we omit values I think there should be some indication that there are more values than what is shown, rather than silently dropping them.
E.g.

{
(omitted 590 matching key-value pairs)
Keys in both dicts with differing values:
  ...
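
A small sketch of how such a summary line could be produced, assuming matching pairs are counted rather than printed. The function name `summarize_matches` is hypothetical and the wording simply follows the example above.

```python
def summarize_matches(first, second):
    """Return a one-line note counting pairs that are equal in both dicts."""
    matching = sum(
        1 for key, value in first.items()
        if key in second and second[key] == value
    )
    if matching == 0:
        # Nothing was omitted, so no note is needed.
        return None
    noun = "pair" if matching == 1 else "pairs"
    return f"(omitted {matching} matching key-value {noun})"


note = summarize_matches({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "c": 4})
print(note)
```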

tomasr8

Now that the diffing is more complex, I think it needs more tests to cover the possible outputs :)

Lib/unittest/case.py
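
As an illustration of the kind of coverage being asked for, here is a sketch of tests that check one case per output group. They only assert that the relevant key appears in the failure message, so they do not depend on the exact layout; the class and method names are hypothetical and these are not the tests added in the PR.

```python
import unittest


class AssertDictEqualMessageTest(unittest.TestCase):
    def failure_message(self, first, second):
        # Capture the AssertionError text produced by assertDictEqual.
        with self.assertRaises(self.failureException) as cm:
            self.assertDictEqual(first, second)
        return str(cm.exception)

    def test_changed_value_is_reported(self):
        message = self.failure_message({"six": 6}, {"six": "six"})
        self.assertIn("'six'", message)

    def test_key_only_in_first_is_reported(self):
        message = self.failure_message({"two": 2, "four": 4}, {"four": 4})
        self.assertIn("'two'", message)

    def test_key_only_in_second_is_reported(self):
        message = self.failure_message({"four": 4}, {"four": 4, "three": 3})
        self.assertIn("'three'", message)

    def test_equal_dicts_do_not_fail(self):
        self.assertDictEqual({"four": 4}, {"four": 4})
```

Saved to a file, these can be run with `python -m unittest`. Tests for the final format would additionally pin down the group headings, truncation markers, and alignment.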

bedevere-app bot added the awaiting review label Nov 17, 2024

merlinz01 changed the title ~~gh-23474: Improve performance and error readability of unittest.TestCase.assertDictEqual~~ gh-27434: Improve performance and error readability of unittest.TestCase.assertDictEqual Nov 17, 2024

Update tests for new output of unittest.TestCase.assertDictEquals

7fba9dc

merlinz01 changed the title ~~gh-27434: Improve performance and error readability of unittest.TestCase.assertDictEqual~~ gh-99151: Improve performance and error readability of unittest.TestCase.assertDictEqual Nov 17, 2024

bedevere-app bot mentioned this pull request Nov 17, 2024

Calling unittest.assertDictEqual for medium-size dictionaries takes too long #99151

Open

FFY00 and others added 3 commits November 16, 2024 21:25

pythonGH-126789: fix some sysconfig data on late site initializations

002d692

pythonGH-126920: fix Makefile overwriting sysconfig.get_config_vars

db4104d

Add NEWS.d entry for assertDictEqual changes

0c320e5

merlinz01 requested review from FFY00 and vsajip as code owners November 17, 2024 02:25

Merge branch 'main' into improve-unittest-assert-dict-equal

3277e61

Eclips4 reviewed Nov 17, 2024

Misc/NEWS.d/next/Library/2024-11-16-21-04-16.gh-issue-99151.74rlUp.rst Outdated

Update news entry

80d2a77

Co-authored-by: Kirill Podoprigora <[email protected]>

Improve performance and error readability of unittest.TestCase.assertDictEqual

fcfdd92

tomasr8 reviewed Nov 19, 2024

Lib/unittest/case.py Outdated

merlinz01 added 3 commits November 19, 2024 12:43

Remove type hints

88293fe

Add additional tests for new output format

50e413c

Merge branch 'main' into improve-unittest-assert-dict-equal

22ce97c
