Text output from /crawl and /scrape is returned as a single unformatted blob (hard to read / no paragraphs) in Hebrew website text #338

@Gilad123


Title
Text output from /crawl and /scrape is returned as a single unformatted blob (hard to read / no paragraphs)

Issue description
When using the Spider API from n8n (HTTP Request node) to extract content from a website, the text returned by both /crawl and /scrape endpoints is coming back as one long, “compressed” string without visible line breaks or paragraph separation, which makes it very hard to read or post‑process. This behavior is especially noticeable on sites where the page content is in Hebrew.

What I’m doing
Using POST https://api.spider.cloud/crawl to discover URLs and then POST https://api.spider.cloud/scrape to extract the page content.

Requests are sent as JSON via n8n’s HTTP Request node (v3).

Example /crawl body (simplified):

What I expected
Since return_format is set to "markdown" and readability is true, I expected the response body to include visible paragraph breaks, headings, and line breaks that reflect the page structure (e.g. \n\n between paragraphs, # for headings, etc.).

In other words, a reasonably formatted Markdown or plain text representation of the page, suitable for direct reading or passing to an LLM without extra heavy preprocessing.

What actually happens
The content (or equivalent text field) is returned as a single long string, with the text “glued together” and minimal or no visible line breaks.

Even when inspecting the raw JSON (outside of n8n’s UI), the text is effectively one blob, so it’s not just a visualization issue.
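
To make the "one blob" observation measurable, a small check like the following could be run against the raw response body. This is only a sketch: it assumes the response is a JSON array (or a single object) whose page text lives in a `content` field, and the sample bodies below are hypothetical, not real API output.

```python
import json


def newline_stats(raw_body: str, field: str = "content") -> list[dict]:
    """Report text length and line-break counts for each page in a raw JSON body.

    Assumption: the body is a JSON array of page objects (or one object),
    each carrying its text in `field`.
    """
    data = json.loads(raw_body)
    pages = data if isinstance(data, list) else [data]
    stats = []
    for page in pages:
        text = page.get(field) or ""
        stats.append({
            "chars": len(text),
            "newlines": text.count("\n"),
            "paragraph_breaks": text.count("\n\n"),
        })
    return stats


# Hypothetical sample bodies for illustration only.
blob = json.dumps([{"content": "שלום עולם זה טקסט ארוך ללא שבירת שורות"}])
formatted = json.dumps([{"content": "# כותרת\n\nפסקה ראשונה\n\nפסקה שנייה"}])

print(newline_stats(blob))
print(newline_stats(formatted))
```

If `newlines` is 0 for a long page, the line breaks are missing from the API output itself rather than being hidden by a viewer.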

Environment
Spider API via HTTPS

n8n version: 1.122.5 (Self‑Hosted)

HTTP Request node v3, Body Content Type = JSON, Specify Body = Using Fields Below

metadata and readability are sent as booleans, not strings
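
For reference, the difference between boolean and string flags in the serialized body can be sketched as below. The `url` value is a placeholder and the helper `stringly_typed_flags` is hypothetical; only `return_format`, `readability`, and `metadata` are taken from this report.

```python
import json

# Body roughly as described in this issue; "url" is a placeholder assumption.
payload = {
    "url": "https://example.com",
    "return_format": "markdown",
    "readability": True,   # real boolean, serialized as true
    "metadata": True,
}


def stringly_typed_flags(body: dict, flags=("readability", "metadata")) -> list[str]:
    """Return the flag names whose values were sent as strings instead of booleans."""
    return [name for name in flags if isinstance(body.get(name), str)]


serialized = json.dumps(payload)
print(serialized)
print(stringly_typed_flags(payload))
print(stringly_typed_flags({"readability": "true", "metadata": True}))
```

A body where a flag appears as `"true"` (quoted) rather than `true` would be reported by the helper, which is one way to rule out a typing problem on the n8n side.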

Any guidance or clarification on how to get better‑formatted text from the API would be greatly appreciated.
