Hello,
First of all, congrats on the code!
On some pages, HTML markup leaks into the extracted text content. For example, with this article:
https://www.yahoo.com/entertainment/chris-evans-ignites-celeb-civil-085545038.html
Snippet of the extracted content:
Chris Evans causing people to choose sides in a way that hasn’t been seen since “Captain America: Civil War.”” data-reactid=”16″ type=”text”>There’s an issue that’s divided Twitter almost as much as anything in politics this week, with Chris Evans causing people to choose sides in a way that hasn’t been seen since “Captain America: Civil War.”
As you can see, this fragment of an HTML tag leaks into the content:
”” data-reactid=”16″ type=”text”>
This is my current code:
$readConf = new Configuration();
$readConf->setSummonCthulhu(true);
$readability = new Readability($readConf);
$readability->parse($html_string);
$return_me = $readability->getContent();
Any help is appreciated.
Regards,
Szabi.