# AngleSharp Diffing - A diff/compare library for AngleSharp
This library makes it possible to compare an AngleSharp _control_ `INodeList` and a _test_ `INodeList` and get a list of `IDiff` differences between them.

The _control_ nodes represent the expected HTML tree, i.e. how the nodes are expected to look, and the _test_ nodes represent the nodes that should be compared to the _control_ nodes.

**Differences:** There are three types of `IDiff` differences that the library can return:

- `Diff`/`AttrDiff`: Represents a difference between a control and test node or a control and test attribute.
- `MissingDiff`/`MissingAttrDiff`: Represents a difference where a control node or control attribute was expected to exist, but was not found in the test nodes tree.
- `UnexpectedDiff`/`UnexpectedAttrDiff`: Represents a difference where a test node or test attribute was unexpectedly found in the test nodes tree, but did not have a match in the control nodes tree.

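
As a rough illustration of how the three kinds of differences relate to each other, the sketch below classifies attribute differences between two plain dictionaries. `DiffKind` and `AttrDiffSketch` are hypothetical names invented for this example. They are not part of the library, and the real engine works on AngleSharp nodes and attributes rather than dictionaries.

```csharp
using System.Collections.Generic;

// Hypothetical names for illustration only; these are NOT AngleSharp.Diffing types.
public enum DiffKind { Different, Missing, Unexpected }

public static class AttrDiffSketch
{
    // Classifies attribute differences between a control and a test element,
    // each modelled here as a simple name -> value dictionary.
    public static List<(string Name, DiffKind Kind)> Classify(
        IReadOnlyDictionary<string, string> control,
        IReadOnlyDictionary<string, string> test)
    {
        var result = new List<(string Name, DiffKind Kind)>();

        foreach (var pair in control)
        {
            if (!test.TryGetValue(pair.Key, out var testValue))
            {
                // Expected by the control, but not found in the test.
                result.Add((pair.Key, DiffKind.Missing));
            }
            else if (pair.Value != testValue)
            {
                // Found in both, but the values differ.
                result.Add((pair.Key, DiffKind.Different));
            }
        }

        foreach (var name in test.Keys)
        {
            if (!control.ContainsKey(name))
            {
                // Found in the test, but has no match in the control.
                result.Add((name, DiffKind.Unexpected));
            }
        }

        return result;
    }
}
```

With a control of `class="a" id="x"` and a test of `class="b" title="t"`, this sketch reports `class` as different, `id` as missing, and `title` as unexpected.
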
## Usage

To find the differences between a control HTML fragment and a test HTML fragment, using the default options, the easiest way is to use the `DiffBuilder` class, like so:

```csharp
var controlHtml = "<p>Hello World</p>";
var testHtml = "<p>World, I say hello</p>";
var diffs = DiffBuilder
    .Compare(controlHtml)
    .WithTest(testHtml)
    .UseDefaultOptions()
    .Build();
```

The `DiffBuilder` class handles the relatively complex task of setting up the `HtmlDifferenceEngine`.

Using the `UseDefaultOptions()` method is equivalent to setting the following options explicitly:

```csharp
var diffs = DiffBuilder
    .Compare(controlHtml)
    .WithTest(testHtml)
    .RemoveComments()
    .Whitespace(WhitespaceOption.Normalize)
    .IgnoreDiffAttributes()
    .Build();
```

## Diffing options/strategies:

The library comes with a number of options (internally referred to as strategies) for the following three main steps in the diffing process:

1. Filtering out irrelevant nodes and attributes
2. Matching up nodes and attributes for comparison
3. Comparing matched up nodes and attributes

The following sections document the built-in strategies that are currently available. A later section will describe how to build your own strategies, to get very tight control of the diffing process.

### Remove comments

Enabling this strategy will remove all comment nodes from the comparison. Activate by calling the `RemoveComments()` method on a `DiffBuilder` instance, e.g.:

```csharp
var diffs = DiffBuilder
    .Compare(controlHtml)
    .WithTest(testHtml)
    .RemoveComments()
    .Build();
```

_NOTE: Currently, the remove comment strategy does NOT remove comments from CSS or JavaScript embedded in `<style>` or `<script>` tags._

### Whitespace handling

Whitespace can be a source of false positives when comparing two HTML fragments. The whitespace handling strategy therefore offers different ways to deal with it during a comparison:

- `Preserve`: Does not change or filter out any whitespace in the control and test HTML. This is the default, and is the same as not specifying any option.
- `RemoveWhitespaceNodes`: Filters out all text nodes that consist only of whitespace characters.
- `Normalize`: _Trims_ all text nodes and replaces two or more whitespace characters with a single space character.
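
To make the last two options above concrete, here is a minimal sketch of what they mean for the text of a single node, written against plain strings. `WhitespaceSketch` and its methods are hypothetical names used only for illustration. This is not the library's implementation, which operates on AngleSharp text nodes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; NOT part of AngleSharp.Diffing.
public static class WhitespaceSketch
{
    // Matches a run of two or more consecutive whitespace characters.
    private static readonly Regex WhitespaceRun = new Regex(@"\s{2,}");

    // RemoveWhitespaceNodes: a text node with this kind of content would be filtered out.
    public static bool IsWhitespaceOnly(string text) => string.IsNullOrWhiteSpace(text);

    // Normalize: trim the text, then collapse each whitespace run into a single space.
    public static string Normalize(string text) => WhitespaceRun.Replace(text.Trim(), " ");
}
```

For example, `WhitespaceSketch.Normalize("  Hello \n\t World  ")` yields `"Hello World"`, while `WhitespaceSketch.IsWhitespaceOnly(" \n ")` is `true`.
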

These options can be set either _globally_ for the entire comparison, or on a _specific subtree in the comparison_.

To set a global default, call the method `Whitespace(WhitespaceOption)` on a `DiffBuilder` instance, e.g.:
```csharp
var diffs = DiffBuilder
    .Compare(controlHtml)
    .WithTest(testHtml)
    .Whitespace(WhitespaceOption.Normalize)
    .Build();
```

To configure or override the whitespace rules on a specific subtree in the comparison, add the `diff:whitespace="WhitespaceOption"` attribute to a control node. That node and all nodes below it will use the specified whitespace option, unless it is overridden on a child node. For example, `diff:whitespace="Preserve"` on an `<h1>` control element preserves all whitespace inside that `<h1>` element.

**Special case for `<pre>`-tags:** The content of `<pre>` tags is always treated as if the `Preserve` option were set, even if the whitespace strategy is globally set to `RemoveWhitespaceNodes` or `Normalize`. To override this, add a local `diff:whitespace` attribute to the tag, e.g. `<pre diff:whitespace="Normalize">`.

**Special case for `<style>`-tags:** Even if the whitespace option is `Normalize`, whitespace inside quotes (`"` and `'` style quotes) is preserved as is. For example, the text inside the `content` style information in the following CSS will not be normalized: `p::after { content: " -.- "; }`.

**Special case for `<script>`-tags:** It is on the issues list to deal with whitespace properly inside `<script>`-tags.

### Ignore attribute

If the `diff:ignore="true"` attribute is used on a control element (`="true"` is implicit/optional), its attributes and child nodes are skipped during comparison, including those of the test element that the control element is matched with.

In this example, the `<h1>` tag, its attributes, and its children are considered the same as the element it is matched with:

```html
<header>
    <h1 class="heading-1" diff:ignore>Hello world</h1>
</header>
```

To ignore only a specific attribute during comparison, add the `:ignore` suffix to the attribute's name in the control HTML. The control and test attribute are then considered the same. E.g. to ignore the `class` attribute, do:

```html
<header>
    <h1 class:ignore="heading-1">Hello world</h1>
</header>
```

Activate this strategy by calling the `EnableIgnoreAttribute()` method on a `DiffBuilder` instance, e.g.:

```csharp
var diffs = DiffBuilder
    .Compare(controlHtml)
    .WithTest(testHtml)
    .EnableIgnoreAttribute()
    .Build();
```

### Matching options

#### One-to-one matcher (node, attr)

#### Forward-searching matcher (node)

#### CSS selector-cross tree matcher (node, attr)

### Compare options

#### Name/Type matcher (node, attr)

#### Content matcher (text, attr)

#### Content regex matcher (text, attr)

#### IgnoreCase content matcher (text, attr)

### Ignoring special `diff:` attributes

With this option enabled, any attribute that starts with `diff:` is filtered out before matching and comparing happens. E.g. `diff:whitespace="..."` does not show up as a missing diff when it is added to a control element.

To enable this option, use the `IgnoreDiffAttributes()` method on the `DiffBuilder` class, e.g.:

```csharp
var diffs = DiffBuilder
    .Compare(controlHtml)
    .WithTest(testHtml)
    .IgnoreDiffAttributes()
    .Build();
```

## Difference engine details

The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine if the control nodes are the same as the test nodes.
## Creating custom diffing strategies

TODO!

### Filters

- The default starting decision is `true`.
- If a filter receives a source that it does not have an opinion on, it should always return the current decision unchanged, whatever it may be.
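
The two rules above could be sketched as follows. The `FilterSketch<TSource>` delegate, the `FilterPipelineSketch` class, and the reading of `true` as "keep the source in the comparison" are all assumptions made for illustration. They are not the library's actual filter API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical filter signature for illustration only; NOT the library's API.
// A filter receives the source and the current decision, and returns a new decision.
public delegate bool FilterSketch<TSource>(TSource source, bool currentDecision);

public static class FilterPipelineSketch
{
    // Runs every filter in order, starting from the default decision of true
    // (assumed here to mean "keep the source in the comparison").
    public static bool ShouldKeep<TSource>(TSource source, IEnumerable<FilterSketch<TSource>> filters)
    {
        var decision = true;
        foreach (var filter in filters)
        {
            decision = filter(source, decision);
        }
        return decision;
    }

    // Example filter over attribute names: it only has an opinion about names
    // that start with "diff:" and passes the current decision through otherwise.
    public static bool IgnoreDiffAttributes(string attributeName, bool currentDecision)
    {
        return attributeName.StartsWith("diff:", StringComparison.Ordinal)
            ? false
            : currentDecision;
    }
}
```

Passing the current decision through unchanged is what lets several independent filters be chained without one accidentally overriding the verdict of another.
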
## Acknowledgement
Big thanks to [Florian Rappl](https://github.com/FlorianRappl) from the AngleSharp team for providing ideas, input and sample code for working with AngleSharp.