README.md: 3 additions & 241 deletions
@@ -9,7 +9,7 @@ The _control_ nodes represents the expected HTML tree, i.e. how the nodes are ex
 - `MissingDiff`/`MissingAttrDiff`: Represents a difference where a control node or control attribute was expected to exist, but was not found in the test nodes tree.
 - `UnexpectedDiff`/`UnexpectedAttrDiff`: Represents a difference where a test node or test attribute was unexpectedly found in the test nodes tree, but did not have a match in the control nodes tree.

-## Usage
+# Usage
 To find the differences between a control HTML fragment and a test HTML fragment, using the default options, the easiest way is to use the `DiffBuilder` class, like so:

 ```csharp
@@ -22,247 +22,9 @@ var diffs = DiffBuilder
     .Build();
 ```

-The `DiffBuilder` class handles the relatively complex task of setting up the `HtmlDifferenceEngine`.
+Read more about the available options on the [Diffing Options/Strategies](/docs/Strategies.md) page.

-Using the `UseDefaultOptions()` method is equivalent to setting the following options explicitly:
-
-```csharp
-var diffs = DiffBuilder
-    .Compare(controlHtml)
-    .WithTest(testHtml)
-    .IgnoreComments()
-    .Whitespace(WhitespaceOption.Normalize)
-    .IgnoreDiffAttributes()
-    .Build();
-```
-
-## Diffing options/strategies:
-The library comes with a number of options (internally referred to as strategies) for the following three main steps in the diffing process:
-
-1. Filtering out irrelevant nodes and attributes
-2. Matching up nodes and attributes for comparison
-3. Comparing matched up nodes and attributes
-
-The following section documents the built-in strategies that are currently available. A later section describes how to build your own strategies, to get very tight control of the diffing process.
-
-### Ignore comments
-Enabling this strategy will ignore all comment nodes during comparison. Activate by calling the `IgnoreComments()` method on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-    .Compare(controlHtml)
-    .WithTest(testHtml)
-    .IgnoreComments()
-    .Build();
-```
-
-_**NOTE**: Currently, the ignore comment strategy does NOT remove comments from CSS or JavaScript embedded in `<style>` or `<script>` tags._
-
-### Text (text nodes) strategies
-The built-in text strategies offer several ways to control how text (text nodes) is handled during the diffing process.
-
-#### Whitespace handling
-Whitespace can be a source of false positives when comparing two HTML fragments. Thus, the whitespace handling strategy offers different ways to deal with it during a comparison.
-
-- `Preserve` (default): Does not change or filter out any whitespace in the text nodes of the control and test HTML.
-- `RemoveWhitespaceNodes`: Filters out all text nodes that consist only of whitespace characters.
-- `Normalize`: _Trims_ all text nodes and replaces two or more whitespace characters with a single space character. This option implicitly includes the `RemoveWhitespaceNodes` option.
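
The three whitespace options listed above can be illustrated with a small, self-contained sketch. It is not the library's implementation: `ApplyWhitespace` is a hypothetical helper, the options are passed as plain strings instead of the `WhitespaceOption` enum, and text nodes are modelled as plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption for illustration, not part of the library):
// applies one of the documented whitespace options to a list of text node values.
static List<string> ApplyWhitespace(IEnumerable<string> textNodes, string option) => option switch
{
    // Preserve: leave every text node untouched.
    "Preserve" => textNodes.ToList(),
    // RemoveWhitespaceNodes: drop text nodes that consist only of whitespace.
    "RemoveWhitespaceNodes" => textNodes.Where(t => !string.IsNullOrWhiteSpace(t)).ToList(),
    // Normalize: drop whitespace-only nodes, trim the rest, and collapse
    // runs of two or more whitespace characters into a single space.
    "Normalize" => textNodes
        .Where(t => !string.IsNullOrWhiteSpace(t))
        .Select(t => Regex.Replace(t.Trim(), @"\s{2,}", " "))
        .ToList(),
    _ => throw new ArgumentException($"Unknown option: {option}", nameof(option)),
};

var nodes = new[] { "  Hello   world ", "\n   ", "foo" };
Console.WriteLine(string.Join("|", ApplyWhitespace(nodes, "Normalize"))); // prints "Hello world|foo"
```

With `"Preserve"` the same input comes back unchanged, and with `"RemoveWhitespaceNodes"` only the second node is dropped.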
-
-These options can be set either _globally_ for the entire comparison, or inline on _specific subtrees in the comparison_.
-
-To set a global default, call the method `Whitespace(WhitespaceOption)` on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-    .Compare(controlHtml)
-    .WithTest(testHtml)
-    .Whitespace(WhitespaceOption.Normalize)
-    .Build();
-```
-
-To configure/override whitespace rules on a specific subtree in the comparison, use the `diff:whitespace="WhitespaceOption"` attribute inline on a control element. That element and all text nodes below it will use that whitespace option, unless it is overridden on a child element. In the example below, all whitespace inside the `<h1>` element is preserved:
-**Special case for `<pre>` elements:** The content of `<pre>` elements will always be treated as if the `Preserve` option were set, even if the whitespace option is globally set to `RemoveWhitespaceNodes` or `Normalize`. To override this, add an inline `diff:whitespace` attribute to the `<pre>` tag, e.g.:
-**NOTE:** It is on the issues list to deal with whitespace properly inside `<style>` and `<script>` tags, e.g. inside strings.
-
-#### Perform case-_insensitive_ text comparison
-To compare the text in two text nodes to each other using a case-insensitive comparison, call the `IgnoreCase()` method on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-    .Compare(controlHtml)
-    .WithTest(testHtml)
-    .IgnoreCase()
-    .Build();
-```
-
-To configure/override ignore case rules on a specific subtree in the comparison, use the `diff:ignoreCase="true|false"` attribute inline on a control element. That element and all text nodes below it will use that ignore case setting, unless it is overridden on a child element. In the example below, ignore case is set active for all text inside the `<h1>` element:
-Note, as with all HTML5 boolean attributes, the `="true"` or `="false"` parts are optional.
-
-#### Use regular expression when comparing text
-By using the inline attribute `diff:regex` on the element containing the text node being compared, the comparer will consider the control text to be a regular expression, and will use that to test whether the test text node is as expected. This can be combined with the inline `diff:ignoreCase` attribute, to make the regular expression case-insensitive. E.g.:
-
-```html
-<header>
-    <h1 diff:regex diff:ignoreCase>Hello World \d{4}</h1>
-</header>
-```
-
-The above control text would use a case-insensitive regular expression to match against a test text string (e.g. "HELLO WORLD 2020").
-
-### Inline Ignore attribute
-If the inline `diff:ignore="true"` attribute is used on a control element (the `="true"` part is optional), all of its attributes and child nodes are ignored during comparison, including those of the test element that the control element is matched with.
-
-In this example, the `<h1>` tag, its attributes, and its children are considered the same as the element it is matched with:
-
-```html
-<header>
-    <h1 class="heading-1" diff:ignore>Hello world</h1>
-</header>
-```
-
-To only ignore a specific attribute during comparison, add the `:ignore` postfix to the attribute name in the control HTML. The control and test attribute are then considered the same. E.g. to ignore the `class` attribute, do:
-
-```html
-<header>
-    <h1 class:ignore="heading-1">Hello world</h1>
-</header>
-```
-
-Activate this strategy by calling the `EnableIgnoreAttribute()` method on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-    .Compare(controlHtml)
-    .WithTest(testHtml)
-    .EnableInlineIgnore()
-    .Build();
-```
-
-### Attribute Compare options
-The library supports various ways to perform attribute comparison.
-
-#### Basic name and value comparison
-The *"name and value comparison"* is the base comparison option. It tests whether both the names and the values of the control and test attributes are equal. E.g.:
-
-- `attr="foo"` is the same as `attr="foo"`
-- `attr="foo"` is NOT the same as `attr="bar"`
-- `foo="attr"` is NOT the same as `bar="attr"`
-
-This comparison mode is on by default.
-
-#### RegEx attribute value comparer
-It is possible to specify a regular expression in the control attribute's value, and add the `:regex` postfix to the *control* attribute's name, to have the comparison performed using a Regex match test. E.g.
-
-- `attr:regex="foo-\d{4}"` is the same as `attr="foo-2019"`
-
-#### Ignore case attribute value comparer
-To get the comparer to perform a case-insensitive comparison of the values of the control and test attribute, add the `:ignoreCase` postfix to the *control* attribute's name. E.g.
-
-- `attr:ignoreCase="FOO"` is the same as `attr="foo"`
-
-#### Combine ignore case and regex attribute value comparer
-To perform a case-insensitive regular expression match, combine `:ignoreCase` and `:regex` as a postfix to the *control* attribute's name. The order in which you combine them does not matter. E.g.
-
-- `attr:ignoreCase:regex="FOO-\d{4}"` is the same as `attr="foo-2019"`
-- `attr:regex:ignoreCase="FOO-\d{4}"` is the same as `attr="foo-2019"`
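
As a rough illustration of how the `:regex` and `:ignoreCase` postfixes described above could be interpreted, here is a minimal sketch. `AttributesMatch` is a hypothetical helper, not the library's comparer. It assumes that postfixes are separated from the attribute name by `:` and that an unanchored `Regex.IsMatch` is an acceptable stand-in for the library's match test.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption for illustration): compares a control attribute,
// whose name may carry :regex and/or :ignoreCase postfixes, with a test attribute.
static bool AttributesMatch(string controlName, string controlValue, string testName, string testValue)
{
    var parts = controlName.Split(':');
    var postfixes = parts.Skip(1).ToArray();
    // The order of the postfixes does not matter, so only check for their presence.
    var ignoreCase = postfixes.Contains("ignoreCase", StringComparer.OrdinalIgnoreCase);
    var useRegex = postfixes.Contains("regex", StringComparer.OrdinalIgnoreCase);

    // The attribute names (without postfixes) must always be equal.
    if (!string.Equals(parts[0], testName, StringComparison.Ordinal)) return false;

    if (useRegex)
    {
        // The control value is treated as the pattern, the test value as the input.
        var options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.IsMatch(testValue, controlValue, options);
    }

    var comparison = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
    return string.Equals(controlValue, testValue, comparison);
}

Console.WriteLine(AttributesMatch("attr:ignoreCase:regex", @"FOO-\d{4}", "attr", "foo-2019")); // True
Console.WriteLine(AttributesMatch("attr:regex:ignoreCase", @"FOO-\d{4}", "attr", "foo-2019")); // True
Console.WriteLine(AttributesMatch("attr", "foo", "attr", "bar"));                               // False
```

The sketch does not handle attribute names that legitimately contain a colon (for example namespaced attributes); a real implementation would need to.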
-
-#### Class attribute comparer
-The class attribute is special in HTML. It can contain a space-separated list of CSS classes, whose order does not matter. Therefore the library will ignore the order in which the CSS classes are specified in the class attribute of the control and test elements, and instead just ensure that both have the same CSS classes. E.g.
-
-- `class="foo bar"` is the same as `class="bar foo"`
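
The order-independent comparison described above amounts to comparing the two class lists as sets. The following is a minimal sketch under that assumption; `ClassesEqual` is a hypothetical name, and treating duplicate class names as irrelevant is a simplification that the library may or may not share.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (an assumption for illustration): two class attribute values
// are equal when they contain the same set of class names, regardless of order.
static bool ClassesEqual(string controlClass, string testClass)
{
    // Class names are separated by whitespace; empty entries are discarded.
    var separators = new[] { ' ', '\t', '\n', '\r', '\f' };
    var control = new HashSet<string>(controlClass.Split(separators, StringSplitOptions.RemoveEmptyEntries));
    var test = new HashSet<string>(testClass.Split(separators, StringSplitOptions.RemoveEmptyEntries));
    return control.SetEquals(test);
}

Console.WriteLine(ClassesEqual("foo bar", "bar foo")); // True
Console.WriteLine(ClassesEqual("foo bar", "foo"));     // False
```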
-
-To enable the special handling of the class attribute, call the `WithClassAttributeComparer()` method on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-    .Compare(controlHtml)
-    .WithTest(testHtml)
-    .WithClassAttributeComparer()
-    .Build();
-```
-
-#### Boolean attributes comparer
-Another special type of attribute is the [boolean attribute](https://www.w3.org/TR/html52/infrastructure.html#sec-boolean-attributes). To make comparing these more forgiving, the boolean attribute comparer will consider two boolean attributes equal, according to these rules:
-
-- In **strict** mode, a boolean attribute's value is considered truthy if the value is missing, empty, or is the name of the attribute.
-- In **loose** mode, a boolean attribute's value is considered truthy if the attribute is present on an element.
-
-For example, in **strict** mode, the following are considered equal:
-
-- `required` is the same as `required=""`
-- `required=""` is the same as `required="required"`
-- `required="required"` is the same as `required="required"`
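
The strict and loose rules above can be sketched as follows. This is an illustration, not the library's comparer: `BooleanAttributesEqual` and `IsTruthy` are hypothetical names, a value-less attribute such as `required` is modelled as the empty string (which is how HTML parsers report it), the name comparison is assumed to be case-insensitive, and two attributes are treated as equal only when both are truthy.

```csharp
using System;

// Hypothetical helper (an assumption for illustration): strict-mode truthiness.
// An empty value covers both `required` and `required=""`.
static bool IsTruthy(string name, string value) =>
    value.Length == 0 || string.Equals(value, name, StringComparison.OrdinalIgnoreCase);

// Both attributes are assumed to be present on their elements and to share the same name.
static bool BooleanAttributesEqual(string name, string controlValue, string testValue, bool strict)
{
    // Loose mode: presence alone makes both attributes truthy.
    if (!strict) return true;
    // Strict mode: both values must be missing, empty, or equal to the attribute name.
    return IsTruthy(name, controlValue) && IsTruthy(name, testValue);
}

Console.WriteLine(BooleanAttributesEqual("required", "", "required", strict: true));  // True
Console.WriteLine(BooleanAttributesEqual("required", "", "yes", strict: true));       // False
Console.WriteLine(BooleanAttributesEqual("required", "", "yes", strict: false));      // True
```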
-
-To enable the special handling of boolean attributes, call the `WithBooleanAttributeComparer(BooleanAttributeComparision.Strict)` or `WithBooleanAttributeComparer(BooleanAttributeComparision.Loose)` method on a `DiffBuilder` instance, e.g.:
-Any attribute that starts with `diff:` is automatically filtered out before matching/comparing happens. E.g. `diff:whitespace="..."` does not show up as a missing diff when added to a control element.
-
-To enable this option, use the `IgnoreDiffAttributes()` method on the `DiffBuilder` class, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-    .Compare(controlHtml)
-    .WithTest(testHtml)
-    .IgnoreDiffAttributes()
-    .Build();
-```
-
-## Difference engine details
-The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine whether the control nodes are the same as the test nodes.
-
-The `HtmlDifferenceEngine` class depends on three _strategies_, the `IFilterStrategy`, `IMatcherStrategy`, and `ICompareStrategy` types. These are used in the highlighted activities in the diagram. With those, we can control what nodes and attributes take part in the comparison (filter strategy), how control and test nodes and attributes are matched up for comparison (matching strategy), and finally, how nodes and attributes are determined to be the same or different (compare strategy).
-
-The engine starts with a call to `Compare(INodeList controlNodes, INodeList testNodes)` and calls itself recursively when nodes have child nodes.
-
-![img](docs/HtmlDifferenceEngineFlow.svg)
-
-The library comes with a number of different filters, matchers, and comparers that you can configure and mix and match with your own, to get the exact diffing experience you want. See the Usage section above for details.
-
-## Creating custom diffing strategies
-
-TODO!
-
-### Filters
-- The default starting decision is `true`.
-- If a filter receives a source that it does not have an opinion on, it should always return the current decision, whatever it may be.
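
The two filter rules above can be sketched as a simple chain in which each filter receives the source and the current decision. This is only an illustration of the rules: sources are modelled as plain strings, filters as `Func<string, bool, bool>` delegates, and `ShouldInclude` is a hypothetical name. None of this reflects the actual `IFilterStrategy` signature.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (an assumption for illustration): runs a source through a chain
// of filters. Each filter gets the current decision and returns the new one.
static bool ShouldInclude(string source, IEnumerable<Func<string, bool, bool>> filters)
{
    var decision = true; // the default starting decision
    foreach (var filter in filters)
    {
        decision = filter(source, decision);
    }
    return decision;
}

// A filter with an opinion on comments; anything else keeps the current decision.
Func<string, bool, bool> ignoreComments = (src, current) =>
    src.StartsWith("<!--", StringComparison.Ordinal) ? false : current;

// A filter with an opinion on diff: attributes; anything else keeps the current decision.
Func<string, bool, bool> ignoreDiffAttributes = (src, current) =>
    src.StartsWith("diff:", StringComparison.Ordinal) ? false : current;

var chain = new[] { ignoreComments, ignoreDiffAttributes };
Console.WriteLine(ShouldInclude("<!-- note -->", chain)); // False
Console.WriteLine(ShouldInclude("class", chain));         // True
```

Because each filter passes the current decision through when it has no opinion, the order of filters in the chain only matters for sources that more than one filter cares about.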
-
-## Acknowledgement
+## Acknowledgments
 Big thanks to [Florian Rappl](https://github.com/FlorianRappl) from the AngleSharp team for providing ideas, input and sample code for working with AngleSharp.

 Another shout-out goes to [XMLUnit](https://www.xmlunit.org). It is a great XML diffing library, and it has been a great inspiration for creating this library.
+The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine whether the control nodes are the same as the test nodes.
+
+The `HtmlDifferenceEngine` class depends on three _strategies_, the `IFilterStrategy`, `IMatcherStrategy`, and `ICompareStrategy` types. These are used in the highlighted activities in the diagram. With those, we can control what nodes and attributes take part in the comparison (filter strategy), how control and test nodes and attributes are matched up for comparison (matching strategy), and finally, how nodes and attributes are determined to be the same or different (compare strategy).
+
+The engine starts with a call to `Compare(INodeList controlNodes, INodeList testNodes)` and calls itself recursively when nodes have child nodes.
+
+![img](docs/HtmlDifferenceEngineFlow.svg)
+
+The library comes with a number of different filters, matchers, and comparers that you can configure and mix and match with your own, to get the exact diffing experience you want. See the [Options/Strategies page](Strategies.md) for details.