# AngleSharp Diffing - A diff/compare library for AngleSharp
2
-
[](https://github.com/egil/AngleSharp.Diffing/actions?workflow=CI)
3
-
4
2
This library makes it possible to compare an AngleSharp _control_ `INodeList` and a _test_ `INodeList` and get a list of `IDiff` differences between them.
5
3
6
4
The _control_ nodes represent the expected HTML tree, i.e. how the nodes are expected to look, and the _test_ nodes represent the nodes that should be compared to the _control_ nodes.
@@ -11,7 +9,7 @@ The _control_ nodes represents the expected HTML tree, i.e. how the nodes are ex
11
9
- `MissingDiff`/`MissingAttrDiff`: Represents a difference where a control node or control attribute was expected to exist, but was not found in the test nodes tree.
12
10
- `UnexpectedDiff`/`UnexpectedAttrDiff`: Represents a difference where a test node or test attribute was unexpectedly found in the test nodes tree, but did not have a match in the control nodes tree.
13
11
14
-
##Usage
12
+
# Usage
15
13
To find the differences between a control HTML fragment and a test HTML fragment, using the default options, the easiest way is to use the `DiffBuilder` class, like so:
16
14
17
15
```csharp
@@ -20,162 +18,18 @@ var testHtml = "<p>World, I say hello</p>";
20
18
var diffs = DiffBuilder
21
19
    .Compare(controlHtml)
22
20
    .WithTest(testHtml)
23
-
    .UseDefaultOptions()
24
-
    .Build();
25
-
```
26
-
27
-
The `DiffBuilder` class handles the relatively complex task of setting up the `HtmlDifferenceEngine`.
28
-
29
-
Using the `UseDefaultOptions()` method is equivalent to setting the following options explicitly:
30
-
31
-
```csharp
32
-
var diffs = DiffBuilder
33
-
    .Compare(controlHtml)
34
-
    .WithTest(testHtml)
35
-
    .RemoveComments()
36
-
    .Whitespace(WhitespaceOption.Normalize)
37
-
    .IgnoreDiffAttributes()
21
+
    .WithDefaultOptions()
38
22
    .Build();
39
-
```
40
-
41
-
See more about what each option does in the following sections.
42
-
43
-
## Diffing options/strategies:
44
-
The library comes with a bunch of options (internally referred to as strategies), for the following three main steps in the diffing process:
45
-
46
-
1. Filtering out irrelevant nodes and attributes
47
-
2. Matching up nodes and attributes for comparison
48
-
3. Comparing matched up nodes and attributes
49
-
50
-
The following section documents the built-in strategies that are currently available. A later section will describe how to build your own strategies, to get very tight control of the diffing process.
51
-
52
-
### Remove comments
53
-
Enabling this strategy will remove all comment nodes from the comparison. Activate by calling the `RemoveComments()` method on a `DiffBuilder` instance, e.g.:
54
-
55
-
```csharp
56
-
var diffs = DiffBuilder
57
-
    .Compare(controlHtml)
58
-
    .WithTest(testHtml)
59
-
    .RemoveComments()
60
-
    .Build();
61
-
```
62
-
63
-
_NOTE: Currently, the remove comment strategy does NOT remove comments from CSS or JavaScript embedded in `<style>` or `<script>` tags._
64
-
65
-
### Whitespace handling
66
-
Whitespace can be a source of false positives when comparing two HTML fragments. Thus, the whitespace handling strategy offers different ways to deal with it during a comparison.
67
-
68
-
- `Preserve`: Does not change or filter out any whitespace in the control and test HTML. This is the default, the same as not specifying any option.
69
-
- `RemoveWhitespaceNodes`: Filters out all text nodes that consist only of whitespace characters.
70
-
- `Normalize`: _Trims_ all text nodes and replaces two or more whitespace characters with a single space character.
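
The `RemoveWhitespaceNodes` and `Normalize` rules described above can be illustrated with a small, self-contained sketch. This is not the library's implementation: `WhitespaceSketch`, `Normalize`, and `IsWhitespaceOnly` are hypothetical names, and the sketch works on plain strings rather than AngleSharp text nodes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of AngleSharp.Diffing.
public static class WhitespaceSketch
{
    // Mirrors the description of `Normalize`: trim the text, then replace
    // every run of two or more whitespace characters with a single space.
    public static string Normalize(string text) =>
        Regex.Replace(text.Trim(), @"\s{2,}", " ");

    // Mirrors the description of `RemoveWhitespaceNodes`: a text node made up
    // only of whitespace would be filtered out. Note that this also returns
    // true for an empty string.
    public static bool IsWhitespaceOnly(string text) =>
        string.IsNullOrWhiteSpace(text);
}
```

Under these assumptions, `WhitespaceSketch.Normalize("  Hello   \n  world ")` yields `Hello world`, so a control and a test text node that differ only in indentation would compare as equal.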
71
-
72
-
These options can be set either _globally_ for the entire comparison, or on _specific subtrees in the comparison_.
73
-
74
-
To set a global default, call the method `Whitespace(WhitespaceOption)` on a `DiffBuilder` instance, e.g.:
75
-
76
-
```csharp
77
-
var diffs = DiffBuilder
78
-
    .Compare(controlHtml)
79
-
    .WithTest(testHtml)
80
-
    .Whitespace(WhitespaceOption.Normalize)
81
-
    .Build();
82
-
```
83
-
84
-
To configure/override whitespace rules on a specific subtree in the comparison, use the `diff:whitespace="WhitespaceOption"` attribute on a control node. That node and all nodes below it will use that whitespace option, unless it is overridden on a child node. In the example below, all whitespace inside the `<h1>` element is preserved:

**Special case for `<pre>`-tags:** The content of `<pre />` tags will always be treated as if the `Preserve` option is used, even if the whitespace strategy is globally set to `RemoveWhitespaceNodes` or `Normalize`. To override this, add a local `diff:whitespace` attribute to the tag, e.g.:

**Special case for `<style>`-tags:** It is on the TODO list to handle strings in CSS more intelligently: Even if the whitespace option is `Normalize`, whitespace inside quotes (`"` and `'` style quotes) is preserved as is. For example, the text inside the `content` style information in the following CSS will not be normalized: `p::after { content: " -.- "; }`.
99
-
100
-
**Special case for `<script>`-tags:** It is on the TODO list to deal with whitespace properly inside `<script>`-tags.
101
-
102
-
### Ignore attribute
103
-
If the `diff:ignore="true"` attribute is used on a control element (`="true"` is implicit/optional), all its attributes and child nodes are skipped/ignored during comparison, including those of the test element that the control element is matched with.
104
-
105
-
In this example, the `<h1>` tag, its attributes, and its children are considered the same as the element it is matched with:
106
-
107
-
```html
108
-
<header>
109
-
    <h1 class="heading-1" diff:ignore>Hello world</h1>
110
-
</header>
111
-
```
112
-
113
-
To ignore only a specific attribute during comparison, add the `:ignore` suffix to the attribute's name in the control HTML. The control and test attributes are then considered the same. E.g. to ignore the `class` attribute, do:
114
-
115
-
```html
116
-
<header>
117
-
    <h1 class:ignore="heading-1">Hello world</h1>
118
-
</header>
119
-
```
120
-
121
-
Activate this strategy by calling the `EnableIgnoreAttribute()` method on a `DiffBuilder` instance, e.g.:
122
-
123
-
```csharp
124
-
var diffs = DiffBuilder
125
-
    .Compare(controlHtml)
126
-
    .WithTest(testHtml)
127
-
    .EnableIgnoreAttribute()
128
-
    .Build();
129
-
```
130
-
131
-
### Matching options
132
-
133
-
#### One-to-one matcher (node, attr)
134
-
135
-
#### Forward-searching matcher (node)
136
-
137
-
#### CSS selector-cross tree matcher (node, attr)
138
-
139
-
### Compare options
140
-
141
-
#### Name/Type matcher (node, attr)
142
-
#### Content matcher (text, attr)
143
-
#### Content regex matcher (text, attr)
144
-
#### IgnoreCase content matcher (text, attr)
145
-
146
-
### Ignoring special `diff:` attributes
147
-
Any attribute that starts with `diff:` is automatically filtered out before matching/comparing happens. E.g. `diff:whitespace="..."` does not show up as a missing diff when added to a control element.
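
As a rough illustration of the filtering rule described above, the sketch below drops every attribute name that starts with `diff:` before any matching would happen. It works on plain attribute-name strings; `DiffAttributeFilterSketch` and `WithoutDiffAttributes` are hypothetical names, not the library's API, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of AngleSharp.Diffing.
public static class DiffAttributeFilterSketch
{
    // Keeps only the attribute names that do not start with "diff:".
    // Case-insensitive matching is an assumption made for this sketch.
    public static IEnumerable<string> WithoutDiffAttributes(IEnumerable<string> attributeNames) =>
        attributeNames.Where(name => !name.StartsWith("diff:", StringComparison.OrdinalIgnoreCase));
}
```

With this rule, a control element carrying `class` and `diff:whitespace` would take part in the comparison with `class` only.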
148
-
149
-
To enable this option, use the `IgnoreDiffAttributes()` method on the `DiffBuilder` class, e.g.:
150
-
151
-
```csharp
152
-
var diffs = DiffBuilder
153
-
    .Compare(controlHtml)
154
-
    .WithTest(testHtml)
155
-
    .IgnoreDiffAttributes()
156
-
    .Build();
157
-
```
158
-
159
-
## Difference engine details
160
-
The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine if the control nodes are the same as the test nodes.
161
-
162
-
The `HtmlDifferenceEngine` class depends on three _strategies_, the `IFilterStrategy`, `IMatcherStrategy`, and `ICompareStrategy` types. These are used in the highlighted activities in the diagram. With those, we can control what nodes and attributes take part in the comparison (filter strategy), how control and test nodes and attributes are matched up for comparison (matching strategy), and finally, how nodes and attributes are determined to be same or different (compare strategy).
163
-
164
-
It starts with a call to `Compare(INodeList controlNodes, INodeList testNodes)`, which calls itself recursively when nodes have child nodes.
165
-
166
-

167
-
168
-
The library comes with a bunch of different filters, matchers, and comparers that you can configure and mix and match with your own, to get the exact diffing experience you want. See the Usage section above for details.
169
-
170
-
## Creating custom diffing strategies
171
-
172
-
TODO!
25
+
Read more about the available options on the [Diffing Options/Strategies](/docs/Strategies.md) page.
173
26
174
-
### Filters
175
-
- default starting decision is `true`.
176
-
- if a filter receives a source that it does not have an opinion on, it should always return the current decision, whatever it may be.
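
The two filter rules above can be sketched as a simple chain. This is a minimal illustration, not the library's filter API: the `Filter` delegate, `ShouldInclude`, and `ExcludeComments` are hypothetical names, nodes are represented by their node name only, and a decision of `true` is assumed to mean "keep in the comparison".

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration only; not part of AngleSharp.Diffing.
public static class FilterChainSketch
{
    // A filter receives the source (here just a node name) and the current
    // decision, and returns the new decision.
    public delegate bool Filter(string nodeName, bool currentDecision);

    public static bool ShouldInclude(string nodeName, IEnumerable<Filter> filters)
    {
        var decision = true; // the default starting decision is `true`
        foreach (var filter in filters)
        {
            decision = filter(nodeName, decision);
        }
        return decision;
    }

    // Only has an opinion on comment nodes; for every other source it
    // returns the current decision unchanged.
    public static bool ExcludeComments(string nodeName, bool currentDecision) =>
        nodeName == "#comment" ? false : currentDecision;
}
```

Returning the current decision for sources a filter has no opinion on keeps filters composable: a later filter never silently re-includes a node that an earlier filter excluded.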
Big thanks to [Florian Rappl](https://github.com/FlorianRappl) from the AngleSharp team for providing ideas, input and sample code for working with AngleSharp.
180
34
181
35
Another shout-out goes to [XMLUnit](https://www.xmlunit.org). It is a great XML diffing library, and it has been a great inspiration for creating this library.
The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine if the control nodes are the same as the test nodes.
3
+
4
+
The `HtmlDifferenceEngine` class depends on three _strategies_, the `IFilterStrategy`, `IMatcherStrategy`, and `ICompareStrategy` types. These are used in the highlighted activities in the diagram. With those, we can control what nodes and attributes take part in the comparison (filter strategy), how control and test nodes and attributes are matched up for comparison (matching strategy), and finally, how nodes and attributes are determined to be same or different (compare strategy).
5
+
6
+
It starts with a call to `Compare(INodeList controlNodes, INodeList testNodes)`, which calls itself recursively when nodes have child nodes.
7
+
8
+

9
+
10
+
The library comes with a range of filters, matchers, and comparers that you can configure and mix and match with your own, to get the exact diffing experience you want. See the [Options/Strategies page](Strategies.md) for details.
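
To make the filter, match, and compare steps and the recursive `Compare` call more concrete, here is a heavily simplified sketch. It is not the `HtmlDifferenceEngine` implementation: `Node` and `EngineSketch` are hypothetical stand-ins, matching is assumed to be one-to-one by position, comparison looks at node names only, and diffs are reported as plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a DOM node; not an AngleSharp type.
public sealed class Node
{
    public Node(string name, params Node[] children)
    {
        Name = name;
        Children = children;
    }

    public string Name { get; }
    public IReadOnlyList<Node> Children { get; }
}

public static class EngineSketch
{
    public static List<string> Compare(
        IReadOnlyList<Node> controlNodes,
        IReadOnlyList<Node> testNodes,
        Func<Node, bool> filter,
        string path = "")
    {
        var diffs = new List<string>();

        // Step 1: filter out nodes that should not take part in the comparison.
        var control = controlNodes.Where(filter).ToList();
        var test = testNodes.Where(filter).ToList();

        // Step 2: match control and test nodes (assumed here: by position).
        var matched = Math.Min(control.Count, test.Count);
        for (var i = 0; i < matched; i++)
        {
            var nodePath = $"{path}/{control[i].Name}[{i}]";

            // Step 3: compare the matched nodes (assumed here: by name only),
            // then recurse into their child nodes.
            if (control[i].Name != test[i].Name)
                diffs.Add($"different: {nodePath}");
            else
                diffs.AddRange(Compare(control[i].Children, test[i].Children, filter, nodePath));
        }

        // Unmatched control nodes are missing; unmatched test nodes are unexpected.
        for (var i = matched; i < control.Count; i++)
            diffs.Add($"missing: {path}/{control[i].Name}[{i}]");
        for (var i = matched; i < test.Count; i++)
            diffs.Add($"unexpected: {path}/{test[i].Name}[{i}]");

        return diffs;
    }
}
```

Swapping the `filter` delegate, the positional matching, or the name comparison for other rules corresponds to plugging a different filter, matcher, or compare strategy into the engine.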