Bug Report
Prerequisites
- Can you reproduce the problem in a MWE?
- Are you running the latest version of AngleSharp?
- Did you check the FAQs to see if that helps you?
- Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support)
- Did you perform a search in the issues?
For more information, see the CONTRIBUTING guide.
Description
With the latest AngleSharp.Css version, GetInnerText renders too many line breaks at the start and end of each paragraph.
Steps to Reproduce
using AngleSharp;
using AngleSharp.Css;
using AngleSharp.Dom;
using System.Diagnostics;
var content = "<div class=\"entry-content entry-content-single\" itemprop=\"description\"><p><em><strong>[By the studio that brought you &lt;Solo Leveling&gt;, &lt;Reaper of the Drifting Moon&gt;, and many more!]</strong></em></p>\n<p>He was the hound of the Baskerville family: Vikir.</p>\n<p>Yet his loyalty was rewarded by the blade of a guillotine dirtied by slander.</p>\n<p>“I will never live the life of a hound slaughtered after the rabbit is caught.”</p>\n<p>In place of death, an unexpected opportunity awaits him.</p>\n<p>Vikir’s eyes glowed red as he sharpened his canines in the dark.</p>\n<p>“Just you wait, Hugo. I will rip out your throat this time.”</p>\n<p>It’s time for the hound to exact bloody revenge on his owner.</p>\n</div>";
var context = BrowsingContext.New(Configuration.Default.WithCss());
var doc = await context.OpenAsync(req => req.Content(content));
var description = doc.QuerySelector("div[itemprop=\"description\"]")?.GetInnerText().Trim();
Console.WriteLine(description);
Console.ReadKey();
Expected behavior: Outputs
"[By the studio that brought you <Solo Leveling>, <Reaper of the Drifting Moon>, and many more!]\n\nHe was the hound of the Baskerville family: Vikir.\n\nYet his loyalty was rewarded by the blade of a guillotine dirtied by slander.\n\n“I will never live the life of a hound slaughtered after the rabbit is caught.”\n\nIn place of death, an unexpected opportunity awaits him.\n\nVikir’s eyes glowed red as he sharpened his canines in the dark.\n\n“Just you wait, Hugo. I will rip out your throat this time.”\n\nIt’s time for the hound to exact bloody revenge on his owner."
Running the following in a browser console produces the same output:
document.querySelector('div[itemprop=\"description\"]').innerText
Actual behavior: Outputs
"[By the studio that brought you <Solo Leveling>, <Reaper of the Drifting Moon>, and many more!]\n\n \n\nHe was the hound of the Baskerville family: Vikir.\n\n \n\nYet his loyalty was rewarded by the blade of a guillotine dirtied by slander.\n\n \n\n“I will never live the life of a hound slaughtered after the rabbit is caught.”\n\n \n\nIn place of death, an unexpected opportunity awaits him.\n\n \n\nVikir’s eyes glowed red as he sharpened his canines in the dark.\n\n \n\n“Just you wait, Hugo. I will rip out your throat this time.”\n\n \n\nIt’s time for the hound to exact bloody revenge on his owner."
Environment details: .NET 7
Possible Solution
Adjust the RequiredLineBreakCounts in ElementExtension.cs from 2 to 1 for the start and from 2 to 1 for the end.
Adding an IsEmpty check for text nodes fixed the issue. The cause was the whitespace-only "\n" text nodes between the elements; these nodes are not visible in the browser anyway, so they should not be rendered.
The current behaviour adds 2 line breaks before the element and 2 line breaks after it, but a paragraph should render 2 line breaks in total: 1 at the start and 1 at the end.
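To illustrate why the whitespace-only text nodes matter, here is a minimal, stand-alone sketch of the line-break collapsing step. It is not AngleSharp.Css code: the `Render` function, its `dropWhitespaceOnlyText` flag, and the item list are hypothetical and only model the idea. The sketch assumes the rule described for innerText in the HTML Standard, where a run of adjacent required line break counts collapses to its maximum, and it assumes the "\n" text node between two paragraphs ends up as a single space.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model: an item is either rendered text (string)
// or a required line break count (int).
static string Render(IEnumerable<object> items, bool dropWhitespaceOnlyText)
{
    // Drop empty strings; optionally drop whitespace-only strings as well.
    var kept = new List<object>();
    foreach (var item in items)
    {
        if (item is string s)
        {
            if (s.Length == 0) continue;
            if (dropWhitespaceOnlyText && string.IsNullOrWhiteSpace(s)) continue;
        }
        kept.Add(item);
    }

    // Remove required line break counts at the very start and end.
    while (kept.Count > 0 && kept[0] is int) kept.RemoveAt(0);
    while (kept.Count > 0 && kept[kept.Count - 1] is int) kept.RemoveAt(kept.Count - 1);

    // Collapse each run of adjacent counts to its maximum.
    var sb = new StringBuilder();
    var pending = 0;
    foreach (var item in kept)
    {
        if (item is int count)
        {
            pending = Math.Max(pending, count);
        }
        else if (item is string text)
        {
            sb.Append('\n', pending);
            pending = 0;
            sb.Append(text);
        }
    }
    return sb.ToString();
}

// Two paragraphs "A" and "B" with a whitespace-only text node between them.
var items = new object[] { 2, "A", 2, " ", 2, "B", 2 };

// The space separates the two counts, so they cannot collapse.
Console.WriteLine(Render(items, false).Replace("\n", "\\n")); // prints A\n\n \n\nB

// Without the whitespace-only node, the adjacent counts collapse to 2.
Console.WriteLine(Render(items, true).Replace("\n", "\\n"));  // prints A\n\nB
```

In this model the per-element counts stay at 2, and the extra breaks disappear once the whitespace-only node is skipped, which matches the observation that the IsEmpty check resolves the problem.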
