When using the following test HTML files as input...
$ cat old.html
<html>
<body>
some <div>text and more</div> text
</body>
</html>
$ cat new.html
<html>
<body>
some <div class='red'>text</div> and more <strong>text</strong>
</body>
</html>
$ graphtage old.html new.html
<html>
<body>
some <̟d̟i̟v̟ ̟c̟l̟a̟s̟s̟=̟"̟r̟e̟d̟"̟>̟t̟e̟x̟t̟<̟/̟d̟i̟v̟>̟
<̟s̟t̟r̟o̟n̟g̟>̟t̟e̟x̟t̟<̟/̟s̟t̟r̟o̟n̟g̟>̟
<̶d̶i̶v̶>̶t̶e̶x̶t̶ ̶a̶n̶d̶ ̶m̶o̶r̶e̶<̶/̶d̶i̶v̶>̶
</body>
</html>
..., as you can see, the text "and more" is missing from the diff generated by graphtage.
I've tried some other diff tools, and none of them processed these two files correctly either (many use the same core algorithm, I suppose). Is there some general issue with processing text that is not enclosed in its own tag? Here, "and more" sits between two elements and is enclosed only by the parent <body> tag.
I have also tried surrounding "and more" in a <p> tag in new.html, which resulted in this mess:
$ graphtage old.html new.html
<html>
<body>
some <̟d̟i̟v̟ ̟c̟l̟a̟s̟s̟=̟"̟r̟e̟d̟"̟>̟t̟e̟x̟t̟<̟/̟d̟i̟v̟>̟
<p̟d̶i̶v̶>t̶e̶x̶t̶ ̶and more</p̟d̶i̶v̶>
<̟s̟t̟r̟o̟n̟g̟>̟t̟e̟x̟t̟<̟/̟s̟t̟r̟o̟n̟g̟>̟
</body>
</html>
What's happening?
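To illustrate the question above about text that is not enclosed in its own tag, here is a minimal sketch of how such text appears in a parsed tree. It assumes Python's standard xml.etree.ElementTree, which may not be the parser graphtage uses internally, and the two inputs are collapsed onto one line each to keep whitespace out of the way. The helper name text_pieces is made up for this example.

```python
import xml.etree.ElementTree as ET

# The two test inputs from above, collapsed onto single lines (an assumption
# made only to keep whitespace-only text nodes out of the result).
OLD = "<html><body>some <div>text and more</div> text</body></html>"
NEW = ("<html><body>some <div class='red'>text</div>"
       " and more <strong>text</strong></body></html>")


def text_pieces(markup):
    """Return (tag, text, tail) for every element, in document order.

    In ElementTree, `text` is the text directly inside an element before its
    first child, and `tail` is the text that follows the element's closing tag.
    """
    root = ET.fromstring(markup)
    return [(el.tag, el.text, el.tail) for el in root.iter()]


old_pieces = text_pieces(OLD)
new_pieces = text_pieces(NEW)

for tag, text, tail in new_pieces:
    print(f"{tag}: text={text!r} tail={tail!r}")
# html: text=None tail=None
# body: text='some ' tail=None
# div: text='text' tail=' and more '
# strong: text='text' tail=None
```

In this representation "and more" is not a child of <body> in its own right. It is stored as the tail of the <div>, so a tool that matches elements by tag and inner text only could plausibly lose it. Whether that is what happens inside graphtage is a guess on my part.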