Output of difflib.HtmlDiff.make_table on the same input is not constant between multiple runs #124521

@stefan6419846


Bug report

Bug description:

difflib.HtmlDiff.make_table (and make_file as well) produces different output for identical input across calls within the same process, and this is not documented anywhere.

Background: I am using this functionality and run some integration/unit tests for my own code which check the HTML output as well. Doing so, I discovered that the results differ depending on the execution order of the tests themselves.

Let's consider this simple example:

import shutil
from difflib import HtmlDiff

# Identical input for both calls.
fromlines = ['Line 1\n', 'Line 2\n']
tolines = ['Line 1\n', 'Line 3\n']

# Two fresh HtmlDiff instances, same arguments.
html1 = HtmlDiff().make_table(fromlines=fromlines, tolines=tolines)
html2 = HtmlDiff().make_table(fromlines=fromlines, tolines=tolines)

print(html1)
print('=' * shutil.get_terminal_size().columns)
print(html2)

# Fails: the ids and anchors embedded in the two tables differ.
assert html1 == html2

This will fail with an assertion error due to different HTML:

    <table class="diff" id="difflib_chg_to0__top"
           cellspacing="0" cellpadding="0" rules="groups" >
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
        
        <tbody>
            <tr><td class="diff_next" id="difflib_chg_to0__0"><a href="#difflib_chg_to0__0">f</a></td><td class="diff_header" id="from0_1">1</td><td nowrap="nowrap">Line&nbsp;1</td><td class="diff_next"><a href="#difflib_chg_to0__0">f</a></td><td class="diff_header" id="to0_1">1</td><td nowrap="nowrap">Line&nbsp;1</td></tr>
            <tr><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="from0_2">2</td><td nowrap="nowrap">Line&nbsp;<span class="diff_chg">2</span></td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_2">2</td><td nowrap="nowrap">Line&nbsp;<span class="diff_chg">3</span></td></tr>
        </tbody>
    </table>
================================================================================

    <table class="diff" id="difflib_chg_to1__top"
           cellspacing="0" cellpadding="0" rules="groups" >
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
        <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
        
        <tbody>
            <tr><td class="diff_next" id="difflib_chg_to1__0"><a href="#difflib_chg_to1__0">f</a></td><td class="diff_header" id="from1_1">1</td><td nowrap="nowrap">Line&nbsp;1</td><td class="diff_next"><a href="#difflib_chg_to1__0">f</a></td><td class="diff_header" id="to1_1">1</td><td nowrap="nowrap">Line&nbsp;1</td></tr>
            <tr><td class="diff_next"><a href="#difflib_chg_to1__top">t</a></td><td class="diff_header" id="from1_2">2</td><td nowrap="nowrap">Line&nbsp;<span class="diff_chg">2</span></td><td class="diff_next"><a href="#difflib_chg_to1__top">t</a></td><td class="diff_header" id="to1_2">2</td><td nowrap="nowrap">Line&nbsp;<span class="diff_chg">3</span></td></tr>
        </tbody>
    </table>
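
For tests that only care about the diff content, one possible stopgap is to normalize the counter out of the generated HTML before comparing. The following is a minimal sketch, not part of difflib: `normalize_diff_ids` and its regular expression are hypothetical names introduced here, and the pattern is derived only from the id formats visible in the output above (`difflib_chg_toN_`, `fromN_`, `toN_`). It would also rewrite diffed text that happens to contain such a token.

```python
import re
from difflib import HtmlDiff

# Hypothetical helper (not a difflib API): matches the per-table counter in
# ids and anchors such as "difflib_chg_to0__top", "from0_1" and "to0_1".
_PREFIX_RE = re.compile(r'\b(difflib_chg_to|from|to)\d+_')


def normalize_diff_ids(html: str) -> str:
    """Replace the numeric table counter with the fixed placeholder 'N'."""
    return _PREFIX_RE.sub(r'\g<1>N_', html)


fromlines = ['Line 1\n', 'Line 2\n']
tolines = ['Line 1\n', 'Line 3\n']

html1 = HtmlDiff().make_table(fromlines=fromlines, tolines=tolines)
html2 = HtmlDiff().make_table(fromlines=fromlines, tolines=tolines)

# The raw output differs, the normalized output does not.
assert html1 != html2
assert normalize_diff_ids(html1) == normalize_diff_ids(html2)
```

This keeps the tests independent of execution order without touching difflib internals, at the cost of no longer checking the exact ids.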

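The specific difference is the numeric index embedded in each table's ids and anchors (difflib_chg_to0__top, from0_1, to0_1 in the first table versus difflib_chg_to1__top, from1_1, to1_1 in the second). Digging through the code, this is because all instances share the same counter, _default_prefix, which is read and incremented through the class instead of self: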

cpython/Lib/difflib.py

Lines 1886 to 1895 in 78aeb38

    def _make_prefix(self):
        """Create unique anchor prefixes"""
        # Generate a unique anchor prefix so multiple tables
        # can exist on the same HTML page without conflicts.
        fromprefix = "from%d_" % HtmlDiff._default_prefix
        toprefix = "to%d_" % HtmlDiff._default_prefix
        HtmlDiff._default_prefix += 1
        # store prefixes so line format method has access
        self._prefix = [fromprefix,toprefix]

This code has not been touched since its introduction in 2004 (e064b41). I understand that some global state is needed so that multiple tables on one page get distinct anchors, but this behavior should be documented, and users should have a public interface to change or reset the counter.
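
Until such an interface exists, the counter can be reset by hand through the private class attribute. This is a sketch under stated assumptions rather than a recommended fix: `make_table_deterministic` is a hypothetical wrapper, it relies on the undocumented `HtmlDiff._default_prefix` attribute quoted above (which could change between Python versions), it is not thread-safe, and it gives up the anchor uniqueness the counter exists for, so tables produced this way should not be combined on one page.

```python
from difflib import HtmlDiff


def make_table_deterministic(fromlines, tolines, **kwargs):
    """Hypothetical wrapper: reset the shared counter, then build the table.

    Assumes the private class attribute HtmlDiff._default_prefix is the only
    state that varies between calls, as described in this report.
    """
    HtmlDiff._default_prefix = 0
    return HtmlDiff().make_table(fromlines, tolines, **kwargs)


fromlines = ['Line 1\n', 'Line 2\n']
tolines = ['Line 1\n', 'Line 3\n']

table_a = make_table_deterministic(fromlines, tolines)
table_b = make_table_deterministic(fromlines, tolines)

# Both tables now use the "from0_" / "to0_" prefixes.
assert table_a == table_b
```

In a test suite the same reset could live in a setup hook so that every test starts from counter value 0.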

CPython versions tested on:

3.9, 3.11, CPython main branch

Operating systems tested on:

Linux

Labels: type-feature (A feature request or enhancement)
