
Commit c114bc1

committed
Added more docs. Ignore diff: attr filter added
1 parent 8c4949e commit c114bc1

File tree

6 files changed

+235
-15
lines changed


README.md

Lines changed: 143 additions & 12 deletions
@@ -1,31 +1,156 @@
11
# AngleSharp Diffing - A diff/compare library for AngleSharp
22
This library makes it possible to compare an AngleSharp _control_ `INodeList` and a _test_ `INodeList` and get a list of `IDiff` differences between them.
33

4-
The _control_ nodes represents the expected, i.e. how the nodes are expected to look, and the _test_ nodes represents the other nodes that should be compared to the _control_ nodes.
4+
The _control_ nodes represent the expected HTML tree, i.e. how the nodes are expected to look, and the _test_ nodes represent the nodes that should be compared to the _control_ nodes.
5+
6+
**Differences:** There are three types of `IDiff` differences that the library can return.
7+
8+
- `Diff`/`AttrDiff`: Represents a difference between a control and test node or a control and test attribute.
9+
- `MissingDiff`/`MissingAttrDiff`: Represents a difference where a control node or control attribute was expected to exist, but was not found in the test nodes tree.
10+
- `UnexpectedDiff`/`UnexpectedAttrDiff`: Represents a difference where a test node or test attribute was unexpectedly found in the test nodes tree, but did not have a match in the control nodes tree.
511

612
## Usage
13+
To find the differences between a control HTML fragment and a test HTML fragment, using the default options, the easiest way is to use the `DiffBuilder` class, like so:
14+
15+
```csharp
16+
var controlHtml = "<p>Hello World</p>";
17+
var testHtml = "<p>World, I say hello</p>";
18+
var diffs = DiffBuilder
19+
.Compare(controlHtml)
20+
.WithTest(testHtml)
21+
.UseDefaultOptions()
22+
.Build();
23+
```
24+
25+
The `DiffBuilder` class handles the relatively complex task of setting up the `HtmlDifferenceEngine`.
26+
27+
Using the `UseDefaultOptions()` method is equivalent to setting the following options explicitly:
728

829
```csharp
930
var diffs = DiffBuilder
1031
.Compare(controlHtml)
1132
.WithTest(testHtml)
33+
.RemoveComments()
34+
.Whitespace(WhitespaceOption.Normalize)
35+
.IgnoreDiffAttributes()
1236
.Build();
37+
```
38+
39+
## Diffing options/strategies:
40+
The library comes with a bunch of options (internally referred to as strategies), for the following three main steps in the diffing process:
41+
42+
1. Filtering out irrelevant nodes and attributes
43+
2. Matching up nodes and attributes for comparison
44+
3. Comparing matched up nodes and attributes
45+
46+
The following sections document the built-in strategies that are currently available. A later section will describe how to build your own strategies, to get very tight control of the diffing process.
1347

48+
### Remove comments
49+
Enabling this strategy will remove all comment nodes from the comparison. Activate by calling the `RemoveComments()` method on a `DiffBuilder` instance, e.g.:
50+
51+
```csharp
52+
var diffs = DiffBuilder
53+
.Compare(controlHtml)
54+
.WithTest(testHtml)
55+
.RemoveComments()
56+
.Build();
1457
```
1558

16-
#### Built-in filters:
17-
- **`RemoveCommentNodeFilter`**: remove all comment nodes.
18-
- ** whitespace only text nodes
59+
_NOTE: Currently, the remove comment strategy does NOT remove comments from CSS or JavaScript embedded in `<style>` or `<script>` tags._
60+
61+
### Whitespace handling
62+
Whitespace can be a source of false positives when comparing two HTML fragments. Thus, the whitespace handling strategy offers different ways to deal with it during a comparison.
63+
64+
- `Preserve`: Does not change or filter out any whitespace in control and test HTML. Default, same as not specifying any options.
65+
- `RemoveWhitespaceNodes`: Using this option filters out all text nodes that only consist of whitespace characters.
66+
- `Normalize`: Using this option will _trim_ all text nodes and replace two or more whitespace characters with a single space character.
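
The effect of the `Normalize` and `RemoveWhitespaceNodes` options on a single text node can be sketched roughly as below. This is a minimal illustration based only on the descriptions above, not the library's implementation: `WhitespaceSketch`, `Normalize` and `IsWhitespaceOnly` are hypothetical names. Taking "two or more whitespace characters" literally, a single tab or newline between words is left untouched here; whether the library behaves that way is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration only; not part of the library.
public static class WhitespaceSketch
{
    // Matches a run of two or more whitespace characters.
    private static readonly Regex WhitespaceRun = new Regex(@"\s{2,}", RegexOptions.Compiled);

    // Mirrors the description of Normalize: trim the text, then replace
    // each run of two or more whitespace characters with one space.
    public static string Normalize(string text)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        return WhitespaceRun.Replace(text.Trim(), " ");
    }

    // Mirrors the description of RemoveWhitespaceNodes: returns true when a
    // text node is empty or consists only of whitespace and would be dropped.
    public static bool IsWhitespaceOnly(string text)
        => string.IsNullOrWhiteSpace(text);
}
```

With this sketch, a control text of `"  Hello   \n world  "` and a test text of `"Hello world"` normalize to the same string, so they would no longer be reported as different.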
67+
68+
These options can be set either _globally_ for the entire comparison, or on _specific subtrees in the comparison_.
1969

20-
Matchers:
21-
- searching matcher, that will match nodes of the same type, and, optionally, element with the same element name.
22-
- css selector matcher nodes and for attributes
70+
To set a global default, call the method `Whitespace(WhitespaceOption)` on a `DiffBuilder` instance, e.g.:
2371

24-
Comparers:
25-
- **`DiffIgnoreAttributeComparer`**: allows you to specify an special attribute `diff:ignore="true"` (`="true"` optional) on control elements to ignore them, their attributes, and child nodes, during comparison. E.g. `<p diff:ignore>...</p>`.
26-
- ignore consecutive whitespace comparer inside textnodes (not in strings in script and style tags).
27-
- regex comparer
28-
- ignore case comparer (attr/text)
72+
```csharp
73+
var diffs = DiffBuilder
74+
.Compare(controlHtml)
75+
.WithTest(testHtml)
76+
.Whitespace(WhitespaceOption.Normalize)
77+
.Build();
78+
```
79+
80+
To configure/override whitespace rules on a specific subtree in the comparison, use the `diff:whitespace="WhitespaceOption"` attribute on a control node. That node and all nodes below it will use that whitespace option, unless it is overridden on a child node. In the example below, all whitespace inside the `<h1>` element is preserved:
81+
82+
```html
83+
<header>
84+
<h1 diff:whitespace="Preserve">Hello <em> woooorld</em></h1>
85+
</header>
86+
```
87+
88+
**Special case for `<pre>`-tags:** The content of `<pre />` tags will always be treated as the `Preserve` option, even if the whitespace strategy is globally set to `RemoveWhitespaceNodes` or `Normalize`. To override this, add a local `diff:whitespace` attribute to the tag, e.g.:
89+
90+
```html
91+
<pre diff:whitespace="RemoveWhitespaceNodes">...</pre>
92+
```
93+
94+
**Special case for `<style>`-tags:** Even if the whitespace option is `Normalize`, whitespace inside quotes (`"` and `'` style quotes) is preserved as is. For example, the text inside the `content` style information in the following CSS will not be normalized: `p::after { content: " -.- "; }`.
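
One way to picture the quote-preserving behaviour is a single pass over the style text that collapses whitespace only while outside a quoted string. The sketch below is an assumption about how such a rule could work, not the library's code: `CssWhitespaceSketch` and `NormalizeOutsideQuotes` are hypothetical names, every whitespace run outside quotes becomes one space, and backslash-escaped quotes and CSS comments are not handled.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the library.
public static class CssWhitespaceSketch
{
    public static string NormalizeOutsideQuotes(string css)
    {
        if (css is null) throw new ArgumentNullException(nameof(css));

        var result = new StringBuilder();
        var openQuote = '\0';      // '\0' means "not inside a quoted string"
        var pendingSpace = false;  // a whitespace run was seen outside quotes

        foreach (var ch in css.Trim())
        {
            if (openQuote != '\0')
            {
                // Inside quotes: copy every character verbatim.
                result.Append(ch);
                if (ch == openQuote) openQuote = '\0';
                continue;
            }

            if (char.IsWhiteSpace(ch))
            {
                // Outside quotes: remember the run, emit one space later.
                pendingSpace = true;
                continue;
            }

            if (pendingSpace)
            {
                result.Append(' ');
                pendingSpace = false;
            }

            if (ch == '"' || ch == '\'') openQuote = ch;
            result.Append(ch);
        }

        return result.ToString();
    }
}
```

Applied to `p::after  {  content: " -.- ";  }`, the whitespace around the braces collapses to single spaces while the quoted `" -.- "` keeps its inner spaces.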
95+
96+
**Special case for `<script>`-tags:** It is on the issues list to deal with whitespace properly inside `<script>`-tags.
97+
98+
### Ignore attribute
99+
If the `diff:ignore="true"` attribute is used on a control element (`="true"` is implicit/optional), all of its attributes and child nodes are skipped/ignored during comparison, including those of the test element that the control element is matched with.
100+
101+
In this example, the `<h1>` tag, its attributes, and its children are considered the same as the element it is matched with:
102+
103+
```html
104+
<header>
105+
<h1 class="heading-1" diff:ignore>Hello world</h1>
106+
</header>
107+
```
108+
109+
To only ignore a specific attribute during comparison, add the `:ignore` postfix to the attribute name in the control HTML. The control and test attributes are then considered the same. E.g. to ignore the `class` attribute, do:
110+
111+
```html
112+
<header>
113+
<h1 class:ignore="heading-1">Hello world</h1>
114+
</header>
115+
```
116+
117+
Activate this strategy by calling the `EnableIgnoreAttribute()` method on a `DiffBuilder` instance, e.g.:
118+
119+
```csharp
120+
var diffs = DiffBuilder
121+
.Compare(controlHtml)
122+
.WithTest(testHtml)
123+
.EnableIgnoreAttribute()
124+
.Build();
125+
```
126+
127+
### Matching options
128+
129+
#### One-to-one matcher (node, attr)
130+
131+
#### Forward-searching matcher (node)
132+
133+
#### CSS selector-cross tree matcher (node, attr)
134+
135+
### Compare options
136+
137+
#### Name/Type matcher (node, attr)
138+
#### Content matcher (text, attr)
139+
#### Content regex matcher (text, attr)
140+
#### IgnoreCase content matcher (text, attr)
141+
142+
### Ignoring special `diff:` attributes
143+
Any attribute that starts with `diff:` is automatically filtered out before matching/comparing happens. E.g. `diff:whitespace="..."` does not show up as a missing diff when added to a control element.
144+
145+
To enable this option, use the `IgnoreDiffAttributes()` method on the `DiffBuilder` class, e.g.:
146+
147+
```csharp
148+
var diffs = DiffBuilder
149+
.Compare(controlHtml)
150+
.WithTest(testHtml)
151+
.IgnoreDiffAttributes()
152+
.Build();
153+
```
29154

30155
## Difference engine details
31156
The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine if the control nodes are the same as the test nodes.
@@ -40,6 +165,12 @@ The library comes with a bunch of different filters, matchers, and conparers, th
40165

41166
## Creating custom diffing strategies
42167

168+
TODO!
169+
170+
### Filters
171+
- default starting decision is `true`.
172+
- if a filter receives a source that it does not have an opinion on, it should always return the current decision, whatever it may be.
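
The two rules above can be illustrated with a small filter chain. This is a sketch under assumptions, not the library's pipeline: `FilterChainSketch`, its `Filter<TSource>` delegate and `ShouldInclude` are hypothetical, and attribute names are plain strings here rather than `AttributeComparisonSource` values. The `IgnoreDiffPrefix` filter follows the same shape as the `IgnoreDiffAttributesFilter.Filter` method added in this commit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not part of the library.
public static class FilterChainSketch
{
    // A filter receives the source and the decision made so far.
    public delegate bool Filter<TSource>(TSource source, bool currentDecision);

    // Runs all filters in order, starting from the default decision `true`
    // (meaning "keep the source in the comparison").
    public static bool ShouldInclude<TSource>(TSource source, IEnumerable<Filter<TSource>> filters)
    {
        if (filters is null) throw new ArgumentNullException(nameof(filters));

        var decision = true;
        foreach (var filter in filters)
            decision = filter(source, decision);
        return decision;
    }

    // Has an opinion only about names starting with "diff:"; otherwise it
    // returns the current decision unchanged.
    public static bool IgnoreDiffPrefix(string attributeName, bool currentDecision)
    {
        if (!currentDecision) return currentDecision;
        if (attributeName.StartsWith("diff:", StringComparison.OrdinalIgnoreCase))
            return false;
        return currentDecision;
    }

    // A filter with no opinion at all simply passes the decision along.
    public static bool NoOpinion(string attributeName, bool currentDecision)
        => currentDecision;
}
```

Passing the current decision through means the order of filters does not let a neutral filter accidentally re-include a source that an earlier filter excluded.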
173+
43174
## Acknowledgement
44175
Big thanks to [Florian Rappl](https://github.com/FlorianRappl) from the AngleSharp team for providing ideas, input and sample code for working with AngleSharp.
45176

src/DiffBuilder.cs

Lines changed: 6 additions & 1 deletion
@@ -28,8 +28,13 @@ public static DiffBuilder Compare(string control)
2828
{
2929
return new DiffBuilder(control);
3030
}
31+
32+
public DiffBuilder WithFilter(FilterStrategy<ComparisonSource> filterStrategy)
33+
{
34+
return this;
35+
}
3136

32-
public DiffBuilder WithFilter(Func<ComparisonSource, bool> nodeFilter)
37+
public DiffBuilder WithFilter(FilterStrategy<AttributeComparisonSource> filterStrategy)
3338
{
3439
return this;
3540
}

src/DiffingStrategyPipeline.cs

Lines changed: 0 additions & 2 deletions
@@ -25,13 +25,11 @@ public class DiffingStrategyPipeline : IFilterStrategy, IMatcherStrategy, ICompa
2525

2626
public IEnumerable<Comparison> Match(DiffContext context, SourceCollection controlSources, SourceCollection testSources)
2727
=> Match(context, controlSources, testSources, _nodeMatchers);
28-
2928
public IEnumerable<AttributeComparison> Match(DiffContext context, SourceMap controlAttrSources, SourceMap testAttrSources)
3029
=> Match(context, controlAttrSources, testAttrSources, _attrsMatchers);
3130

3231
public CompareResult Compare(in Comparison comparison)
3332
=> Compare(comparison, _nodeComparers, CompareResult.DifferentAndBreak);
34-
3533
public CompareResult Compare(in AttributeComparison comparison)
3634
=> Compare(comparison, _attrComparers, CompareResult.Different);
3735

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
using System;
2+
using System.Collections.Generic;
3+
using System.Linq;
4+
using System.Text;
5+
using System.Threading.Tasks;
6+
using Egil.AngleSharp.Diffing.Core;
7+
8+
namespace Egil.AngleSharp.Diffing.Strategies
9+
{
10+
public static class IgnoreDiffAttributesFilter
11+
{
12+
private const string DiffAttributePrefix = "diff:";
13+
14+
public static bool Filter(in AttributeComparisonSource source, bool currentDecision)
15+
{
16+
if (!currentDecision) return currentDecision;
17+
18+
if (source.Attribute.Name.StartsWith(DiffAttributePrefix, StringComparison.OrdinalIgnoreCase))
19+
return false;
20+
21+
return currentDecision;
22+
}
23+
24+
public static DiffBuilder IgnoreDiffAttributes(this DiffBuilder builder)
25+
{
26+
if (builder is null) throw new ArgumentNullException(nameof(builder));
27+
builder.WithFilter(Filter);
28+
return builder;
29+
}
30+
}
31+
}
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
using System;
2+
using System.Collections.Generic;
3+
using System.Linq;
4+
using System.Text;
5+
using System.Threading.Tasks;
6+
using Xunit;
7+
8+
namespace Egil.AngleSharp.Diffing.Strategies
9+
{
10+
public class IgnoreAttributeTest
11+
{
12+
// When a control element with diff:ignore is not matched, it does not count as a missing diff
13+
// When a control attribute with :ignore postfix is not matched, it does not count as a missing attr diff
14+
}
15+
}
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
using System;
2+
using System.Collections.Generic;
3+
using System.Linq;
4+
using System.Text;
5+
using System.Threading.Tasks;
6+
using AngleSharp.Dom;
7+
using Egil.AngleSharp.Diffing.Core;
8+
using Shouldly;
9+
using Xunit;
10+
11+
namespace Egil.AngleSharp.Diffing.Strategies
12+
{
13+
public class IgnoreDiffAttributesFilterTest : DiffingTestBase
14+
{
15+
[Theory(DisplayName = "When an attribute starts with 'diff:' it is filtered out")]
16+
[InlineData(@"<p diff:whitespace=""Normalize"">", "diff:whitespace")]
17+
[InlineData(@"<p diff:ignore=""true"">", "diff:ignore")]
18+
public void Test1(string elementHtml, string diffAttrName)
19+
{
20+
var elmSource = ToComparisonSource(elementHtml);
21+
var attr = ((IElement)elmSource.Node).Attributes[diffAttrName];
22+
var source = new AttributeComparisonSource(attr, elmSource);
23+
24+
IgnoreDiffAttributesFilter.Filter(source, true).ShouldBeFalse();
25+
}
26+
27+
[Theory(DisplayName = "When an attribute does not start with 'diff:' its current decision is used")]
28+
[InlineData(@"<p lang=""csharp"">", "lang")]
29+
[InlineData(@"<p diff=""foo"">", "diff")]
30+
[InlineData(@"<p diffx=""foo"">", "diffx")]
31+
public void Test2(string elementHtml, string diffAttrName)
32+
{
33+
var elmSource = ToComparisonSource(elementHtml);
34+
var attr = ((IElement)elmSource.Node).Attributes[diffAttrName];
35+
var source = new AttributeComparisonSource(attr, elmSource);
36+
37+
IgnoreDiffAttributesFilter.Filter(source, true).ShouldBeTrue();
38+
}
39+
}
40+
}
