Spanish web content not displayed correctly: '?' is shown instead of the correct character #189

@ElliotFer2000

Spanish words with accents are not displayed properly: every accented character is replaced with a "?" character.

Why is this happening? How can I tell the scraper that I'm dealing with Spanish-language content?

code:

$web = new \Spekulatius\PHPScraper\PHPScraper;

$web->go("https://www.marca.com");

return $web->outlineWithParagraphs;

I return the outline to the client in JSON format. The result I'm getting looks like this:

[
    {
        "tag": "h2",
        "content": "Joao F?lix: \"El Bar?a siempre ha sido mi primera opci?n\""
    }
]

I have already tried to solve the problem by putting this at the beginning of the script: setlocale(LC_ALL, 'es_AR')

F?lix and opci?n are not displayed properly in the response: they should be Félix and opción, but ? is shown instead of é and ó.

When I return the result of this call instead, the characters display correctly:

utf8_encode(file_get_contents("https://www.marca.com"))
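Since utf8_encode converts ISO-8859-1 bytes to UTF-8, the fact that it fixes the output suggests the page is not served as UTF-8. One way to check is to look at the charset declared in the response's Content-Type header. The helper below is only a sketch: charsetFromContentType is a hypothetical name, not part of PHPScraper, and the header values in the usage comments are made-up examples rather than what the site actually sends.

```php
<?php

/**
 * Extract the charset parameter from a Content-Type header value.
 * Hypothetical helper for illustration; not part of PHPScraper.
 * Returns the charset in upper case, or null if none is declared.
 */
function charsetFromContentType(string $contentType): ?string
{
    // Match charset=value, with optional quotes and surrounding spaces.
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m) === 1) {
        return strtoupper($m[1]);
    }

    return null;
}

// Usage with made-up header values:
//   charsetFromContentType('text/html; charset=iso-8859-15'); // 'ISO-8859-15'
//   charsetFromContentType('text/html');                      // null
```

The real header value can be read with get_headers($url, true), which returns the response headers as an array keyed by header name.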

I have also tried requesting the document with file_get_contents, encoding the result, and passing it to $web->setContent. Done this way, I get the expected output:

$web = new PHPScraper;
$rawPageContent = utf8_encode(file_get_contents("https://www.marca.com"));
$web->setContent("https://www.marca.com", $rawPageContent);
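A note on the workaround above: utf8_encode is deprecated as of PHP 8.2, so the same conversion can be written with mb_convert_encoding. The following is a minimal sketch, assuming the mbstring extension is available and that the page really is ISO-8859-1 (which is what utf8_encode assumes). The toUtf8 name is a hypothetical helper, not part of PHPScraper.

```php
<?php

/**
 * Convert raw page bytes to UTF-8.
 * Hypothetical helper for illustration; requires the mbstring extension.
 * The default source charset is an assumption that mirrors utf8_encode.
 */
function toUtf8(string $raw, string $sourceCharset = 'ISO-8859-1'): string
{
    // Leave content that is already valid UTF-8 untouched,
    // so it is not encoded a second time.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    return mb_convert_encoding($raw, 'UTF-8', $sourceCharset);
}

// Usage with the workaround from this issue (not executed here):
//   $web = new \Spekulatius\PHPScraper\PHPScraper;
//   $url = "https://www.marca.com";
//   $web->setContent($url, toUtf8(file_get_contents($url)));
```

The validity check matters because utf8_encode applied to content that is already UTF-8 would double-encode the accented characters.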
