Commit afa1a27

Merge branch 'dev'
2 parents 2aa32ed + e2c88f2 commit afa1a27

93 files changed: +4191 -639 lines changed

AngleSharp.Diffing.sln

Lines changed: 8 additions & 1 deletion
@@ -9,11 +9,18 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution
 ProjectSection(SolutionItems) = preProject
 Directory.Build.props = Directory.Build.props
 LICENSE = LICENSE
-README.md = README.md
 EndProjectSection
 EndProject
 Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Egil.AngleSharp.DiffingTests", "tests\Egil.AngleSharp.DiffingTests.csproj", "{18203D98-66B4-4736-B79A-3B7D02EFA9E8}"
 EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "docs", "docs", "{091B9704-B116-4BAA-8393-401ADAF7A818}"
+ProjectSection(SolutionItems) = preProject
+docs\CustomStrategies.md = docs\CustomStrategies.md
+docs\DiffingEngineInternals.md = docs\DiffingEngineInternals.md
+README.md = README.md
+docs\Strategies.md = docs\Strategies.md
+EndProjectSection
+EndProject
 Global
 GlobalSection(SolutionConfigurationPlatforms) = preSolution
 Debug|Any CPU = Debug|Any CPU

AngleSharp.Diffing.v3.ncrunchsolution

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+<SolutionConfiguration>
+<Settings>
+<AllowParallelTestExecution>True</AllowParallelTestExecution>
+<SolutionConfigured>True</SolutionConfigured>
+</Settings>
+</SolutionConfiguration>

Directory.Build.props

Lines changed: 2 additions & 1 deletion
@@ -2,9 +2,10 @@
 <PropertyGroup>
 <LangVersion>8.0</LangVersion>
 <Nullable>enable</Nullable>
+<WarningsAsErrors>CS8600;CS8602;CS8603;CS8625</WarningsAsErrors>
 </PropertyGroup>
 <PropertyGroup>
 <AngleSharpVersion>0.13.0</AngleSharpVersion>
-<FxCopAnalyzersVersion>2.9.4</FxCopAnalyzersVersion>
+<FxCopAnalyzersVersion>2.9.6</FxCopAnalyzersVersion>
 </PropertyGroup>
 </Project>

README.md

Lines changed: 8 additions & 154 deletions
@@ -1,6 +1,4 @@
 # AngleSharp Diffing - A diff/compare library for AngleSharp
-[![Build and test status](https://github.com/egil/AngleSharp.Diffing/workflows/CI/badge.svg)](https://github.com/egil/AngleSharp.Diffing/actions?workflow=CI)
-
 This library makes it possible to compare a AngleSharp _control_ `INodeList` and a _test_ `INodeList` and get a list of `IDiff` differences between them.
 
 The _control_ nodes represents the expected HTML tree, i.e. how the nodes are expected to look, and the _test_ nodes represents the nodes that should be compared to the _control_ nodes.
@@ -11,7 +9,7 @@ The _control_ nodes represents the expected HTML tree, i.e. how the nodes are ex
 - `MissingDiff`/`MissingAttrDiff`: Represents a difference where a control node or control attribute was expected to exist, but was not found in the test nodes tree.
 - `UnexpectedDiff`/`UnexpectedAttrDiff`: Represents a difference where a test node or test attribute was unexpectedly found in the test nodes tree, but did not have a match in the control nodes tree.
 
-## Usage
+# Usage
 To find the differences between a control HTML fragment and a test HTML fragment, using the default options, the easiest way is to use the `DiffBuilder` class, like so:
 
 ```csharp
@@ -20,162 +18,18 @@ var testHtml = "<p>World, I say hello</p>";
 var diffs = DiffBuilder
 .Compare(controlHtml)
 .WithTest(testHtml)
-.UseDefaultOptions()
-.Build();
-```
-
-The `DiffBuilder` class handles the relative complex task of setting up the `HtmlDifferenceEngine`.
-
-Using the `UseDefaultOptions()` method is equivalent to setting the following options explicitly:
-
-```csharp
-var diffs = DiffBuilder
-.Compare(controlHtml)
-.WithTest(testHtml)
-.RemoveComments()
-.Whitespace(WhitespaceOption.Normalize)
-.IgnoreDiffAttributes()
+.WithDefaultOptions()
 .Build();
-```
-
-See more about what each option does in the following sections.
-
-## Diffing options/strategies:
-The library comes with a bunch of options (internally referred to as strategies), for the following three main steps in the diffing process:
-
-1. Filtering out irrelevant nodes and attributes
-2. Matching up nodes and attributes for comparison
-3. Comparing matched up nodes and attributes
-
-The following section documents the current built-in strategies that are available. A later second will describe how to built your own strategies, to get very tight control of the diffing process.
-
-### Remove comments
-Enabling this strategy will remove all comment nodes from the comparison. Activate by calling the `RemoveComments()` method on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-.Compare(controlHtml)
-.WithTest(testHtml)
-.RemoveComments()
-.Build();
-```
-
-_NOTE: Currently, the remove comment strategy does NOT remove comments from CSS or JavaScript embedded in `<style>` or `<script>` tags._
-
-### Whitespace handling
-Whitespace can be a source of false-positives when comparing two HTML fragments. Thus, the whitespace handling strategy offer different ways to deal with it during a comparison.
-
-- `Preserve`: Does not change or filter out any whitespace in control and test HTML. Default, same as not specifying any options.
-- `RemoveWhitespaceNodes`: Using this option filters out all text nodes that only consist of whitespace characters.
-- `Normalize`: Using this option will _trim_ all text nodes and replace two or more whitespace characters with a single space character.
-
-These options can be set either _globally_ for the entire comparison, or on a _specific subtrees in the comparison_.
-
-To set a global default, call the method `Whitespace(WhitespaceOption)` on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-.Compare(controlHtml)
-.WithTest(testHtml)
-.Whitespace(WhitespaceOption.Normalize)
-.Build();
-```
-
-To configure/override whitespace rules on a specific subtree in the comparison, use the `diff:whitespace="WhitespaceOption"` on a control node, and it and all nodes below it will use that whitespace option, unless it is overridden on a child node. In the example below, all whitespace inside the `<h1>` element is preserved:
-
-```html
-<header>
-<h1 diff:whitespace="Preserve">Hello <em> woooorld</em></h1>
-</header>
-```
-
-**Special case for `<pre>`-tags:** The content of `<pre />` tags will always be treated as the `Preserve` option, even if whitespace strategy is globally set to `RemoveWhitespaceNodes` or `Normalize`. To override this, add a local `diff:whitespace" attribute to the tag, e.g.:
-
-```html
-<pre diff:whitespace="RemoveWhitespaceNodes">...</pre>
 ```
 
-**Special case for `<style>`-tags:** It is on the TODO list to handle string in CSS more intelligently: Even if the whitespace option is `Normalize`, whitespace inside quotes (`"` and `'` style quotes) is preserved as is. For example, the text inside the `content` style information in the following CSS will not be normalized: `p::after { content: " -.- "; }`.
-
-**Special case for `<script>`-tags:** It is on the TODO list to deal with whitespace properly inside `<script>`-tags.
-
-### Ignore attribute
-If the `diff:ignore="true"` attribute is used on a control element (`="true"` implicit/optional), all their attributes and child nodes are skipped/ignored during comparison, including those of the test element, the control element is matched with.
-
-In this example, the `<h1>` tag, it's attribute and children are considered the same as the element it is matched with:
-
-```html
-<header>
-<h1 class="heading-1" diff:ignore>Hello world</h1>
-</header>
-```
-
-To only ignore a specific attribute during comparison, add the `:ignore` to the attribute in the control HTML. That will consider the control and test attribute the same. E.g. to ignore the `class` attribute, do:
-
-```html
-<header>
-<h1 class:ignore="heading-1">Hello world</h1>
-</header>
-```
-
-Activate this strategy by calling the `EnableIgnoreAttribute()` method on a `DiffBuilder` instance, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-.Compare(controlHtml)
-.WithTest(testHtml)
-.EnableIgnoreAttribute()
-.Build();
-```
-
-### Matching options
-
-#### One-to-one matcher (node, attr)
-
-#### Forward-searching matcher (node)
-
-#### CSS selector-cross tree matcher (node, attr)
-
-### Compare options
-
-#### Name/Type matcher (node, attr)
-#### Content matcher (text, attr)
-#### Content regex matcher (text, attr)
-#### IgnoreCase content matcher (text, attr)
-
-### Ignoring special `diff:` attributes
-Any attributes that starts with `diff:` are automatically filtered out before matching/comparing happens. E.g. `diff:whitespace="..."` does not show up as a missing diff when added to an control element.
-
-To enable this option, use the `IgnoreDiffAttributes()` method on the `DiffBuilder` class, e.g.:
-
-```csharp
-var diffs = DiffBuilder
-.Compare(controlHtml)
-.WithTest(testHtml)
-.IgnoreDiffAttributes()
-.Build();
-```
-
-## Difference engine details
-The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine if the control nodes is the same as the test nodes.
-
-The `HtmlDifferenceEngine` class depends on three _strategies_, the `IFilterStrategy`, `IMatcherStrategy`, and `ICompareStrategy` types. These are used in the highlighted activities in the diagram. With those, we can control what nodes and attributes take part in the comparison (filter strategy), how control and test nodes and attributes are matched up for comparison (matching strategy), and finally, how nodes and attributes are determined to be same or different (compare strategy).
-
-It starts with a call to the `Compare(INodeList controlNodes, INodeList testNodes)` and recursively calls itself when nodes have child nodes.
-
-![Activity diagram that shows the comparing processing in HtmlDifferenceEngine](docs/HtmlDifferenceEngineFlow.svg)
-
-The library comes with a bunch of different filters, matchers, and conparers, that you can configure and mix and match with your own, to get the exact diffing experience you want. See the Usage section above for details.
-
-## Creating custom diffing strategies
-
-TODO!
+Read more about the available options on the [Diffing Options/Strategies](/docs/Strategies.md) page.
 
-### Filters
-- default starting decision is `true`.
-- if a filter receives a source that it does not have an opinion on, it should always return the current decision, whatever it may be.
+# Documentation
+- [Diffing Options/Strategies](/docs/Strategies.md)
+- [Creating custom diffing options/strategies](/docs/CustomStrategies.md)
+- [Difference engine internals](/docs/DifferenceEngineInternals.md)
 
-## Acknowledgement
+## Acknowledgments
 Big thanks to [Florian Rappl](https://github.com/FlorianRappl) from the AngleSharp team for providing ideas, input and sample code for working with AngleSharp.
 
 Another shout-out goes to [XMLUnit](https://www.xmlunit.org). It is a great XML diffing library, and it has been a great inspiration for creating this library.

docs/CustomStrategies.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# Creating custom diffing strategies
+
+TODO!
+
+### Filters
+- default starting decision is `true`.
+- if a filter receives a source that it does not have an opinion on, it should always return the current decision, whatever it may be.

docs/DiffingEngineInternals.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# Difference engine internals
+The heart of the library is the `HtmlDifferenceEngine` class, which goes through the steps illustrated in the activity diagram below to determine if the control nodes is the same as the test nodes.
+
+The `HtmlDifferenceEngine` class depends on three _strategies_, the `IFilterStrategy`, `IMatcherStrategy`, and `ICompareStrategy` types. These are used in the highlighted activities in the diagram. With those, we can control what nodes and attributes take part in the comparison (filter strategy), how control and test nodes and attributes are matched up for comparison (matching strategy), and finally, how nodes and attributes are determined to be same or different (compare strategy).
+
+It starts with a call to the `Compare(INodeList controlNodes, INodeList testNodes)` and recursively calls itself when nodes have child nodes.
+
+![img](HtmlDifferenceEngineFlow.svg)
+
+The library comes with a bunch of different filters, matchers, and comparers, that you can configure and mix and match with your own, to get the exact diffing experience you want. See the [Options/Strategies page](Strategies.md) for details.
