This repository was archived by the owner on Aug 14, 2021. It is now read-only.
+- Readability.php produces the same exact output as Readability.js
+- I'm happy :)
+
+### Fixed
+- Lots of bugs
+- Merged PR by DavidFricker to avoid exceptions while grabbing the document content
+
+### Added
+- substituteEntities flag, to avoid replacing special characters with HTML entities. There's nothing we can do about `&nbsp;`, that entity is replaced by libxml and there's no way to disable it.
+- Named data sets so it's easier to detect which test case is failing.
+
+### Removed
+
+- Couple of test cases that involved broken JS. There's nothing we can do about JS spilling onto the text.
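Note on the `substituteEntities` entry above: the library's internals are not part of this diff, so the following is only a minimal sketch of what the underlying libxml flag looks like through PHP's standard `DOMDocument` class. The helper name `loadWithoutEntitySubstitution` and the sample markup are assumptions made for illustration; only `DOMDocument`, its `substituteEntities` property, and the `libxml_*` functions are real PHP APIs. No output is claimed, because entity handling depends on the installed libxml version.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not taken from Readability.php): parse HTML with
// libxml's entity substitution switched off.
function loadWithoutEntitySubstitution(string $html): DOMDocument
{
    $dom = new DOMDocument('1.0', 'utf-8');

    // libxml-specific property: ask the parser not to expand entity references.
    $dom->substituteEntities = false;

    // Real-world markup is rarely valid, so collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

// Sample input (assumption): one named entity plus a non-breaking space.
$dom = loadWithoutEntitySubstitution(
    '<html><body><p>caf&eacute; au&nbsp;lait</p></body></html>'
);
echo $dom->saveHTML();
```

Comparing the printed markup with the flag set to `true` and to `false` is a quick way to see which entities the local libxml build still rewrites.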
README.md (2 additions & 1 deletion)
@@ -51,7 +51,8 @@ If the parsing process was unsuccessful the HTMLParser will return `false`
 - **weightClasses**: default value `true`, weight classes during the rating phase.
 - **removeReadabilityTags**: default value `true`, remove the data-readability tags inside the nodes that are added during the rating phase.
 - **fixRelativeURLs**: default value `false`, convert relative URLs to absolute. Like `/test` to `http://host/test`.
-- **originalURL**: default value `http://fakehost`, original URL from the article used to fix relative URLs.
+- **substituteEntities**: default value `false`, disables the `substituteEntities` flag of libxml. Will avoid substituting HTML entities. Like `&aacute;` to á.
+- **originalURL**: default value `http://fakehost`, original URL from the article used to fix relative URLs.
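To illustrate how the `fixRelativeURLs` and `originalURL` options in the hunk above relate, here is a simplified sketch of resolving a relative link against the article's original URL. The function `toAbsoluteUrl` is hypothetical and is not the library's implementation. It assumes `originalURL` contains a scheme and host, and it does not normalise `../` segments.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of Readability.php): turn a relative URI
// into an absolute one using the article's original URL as the base.
function toAbsoluteUrl(string $uri, string $originalURL = 'http://fakehost'): string
{
    // Already absolute (has a scheme such as http: or mailto:): leave as-is.
    if (parse_url($uri, PHP_URL_SCHEME) !== null) {
        return $uri;
    }

    $base = parse_url($originalURL);
    $prefix = $base['scheme'] . '://' . $base['host'];
    if (isset($base['port'])) {
        $prefix .= ':' . $base['port'];
    }

    // Protocol-relative ("//cdn.example/x.js"): only the scheme is missing.
    if (strncmp($uri, '//', 2) === 0) {
        return $base['scheme'] . ':' . $uri;
    }

    // Root-relative ("/test"): append to scheme and host.
    if (strncmp($uri, '/', 1) === 0) {
        return $prefix . $uri;
    }

    // Path-relative ("img.png"): resolve against the base path's directory.
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $prefix . $dir . $uri;
}

echo toAbsoluteUrl('/test', 'http://host'), PHP_EOL; // prints "http://host/test"
```

With the default `http://fakehost` base, relative links still become syntactically absolute, which is presumably why the README lists that value as the fallback.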
test/test-pages/clean-links/expected.html (1 addition & 1 deletion)
@@ -1,6 +1,6 @@
 <div><td>
 <h3 align="center ">Study Webtext</h3>
-<h2 align="center "><font color="Maroon " face="Lucida Handwriting ">"Bartleby the Scrivener: A Story of Wall-Street " </font>(1853) <br></br>
+<h2 align="center "><span color="Maroon " face="Lucida Handwriting ">"Bartleby the Scrivener: A Story of Wall-Street " </span>(1853) <br></br>
 Herman Melville</h2>
 <h2 align="center "><a href="http://www.vcu.edu/engweb/webtexts/bartleby.html " target="_blank "><img align="absmiddle " alt="To the story text without notes " height="38 " src="http://fakehost/test/hmhome.gif " width="38 "></img></a>