Trying to parse text via website URL call #11809

Jawwad1 · 2026-02-18T19:10:59Z

Hi All,
I am using the URL and Split Text components to parse text from a website. I am using https://www.costco.com as the URL, but when I run the Split Text component attached to it, it throws the following error. Kindly assist.
```
Flow build failed
31s
Error building Component URL:
Error loading documents: No documents were successfully loaded from any URL
```

xXMrNidaXx · 2026-02-23T15:40:46Z

URL text parsing can be tricky! At RevolutionAI (https://revolutionai.io) we handle web scraping in workflows.

Langflow approach:

URL component:

- Set the URL in the input
- Configure a timeout
- Handle redirects

Custom Python:

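```python
import requests
from bs4 import BeautifulSoup

def parse_url(url: str) -> str:
    # Some sites reject the default python-requests User-Agent
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail on 403/429 instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")
    # Remove scripts and styles so only visible text remains
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```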

Tips:

- Add a User-Agent header
- Handle rate limits (HTTP 429)
- Cache results
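The three tips can be combined into one fetch helper. A minimal sketch using only the standard library (`urllib.request` instead of `requests`, so it runs without extra installs); the User-Agent string, the retry count, and the choice to retry only on HTTP 429 and 503 are assumptions, and `fetch_html` / `retry_delay` are hypothetical names:

```python
import functools
import time
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical identifier; some sites block the default Python User-Agent.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper/0.1)"


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after and retry_after.isdigit():
        # Server sent Retry-After in seconds: honor it, up to the cap.
        return min(float(retry_after), cap)
    # Otherwise exponential backoff: base, 2*base, 4*base, ... up to the cap.
    return min(base * 2 ** attempt, cap)


@functools.lru_cache(maxsize=128)  # repeated URLs are fetched once per process
def fetch_html(url: str, retries: int = 3, timeout: float = 15.0) -> str:
    """GET `url` with a User-Agent header, retrying on HTTP 429/503."""
    for attempt in range(retries + 1):
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                charset = response.headers.get_content_charset() or "utf-8"
                return response.read().decode(charset, errors="replace")
        except urllib.error.HTTPError as err:
            # Give up on other status codes, or once retries are exhausted.
            if err.code not in (429, 503) or attempt == retries:
                raise
            time.sleep(retry_delay(attempt, err.headers.get("Retry-After")))
    # Only reached when the loop never ran.
    raise ValueError("retries must be >= 0")
```

`lru_cache` keeps results in memory only for the life of the process, and a fetch that raises is not cached, so a blocked URL will be retried on the next call.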

What URL pattern are you targeting?

xXMrNidaXx · 2026-02-23T15:40:54Z

Parsing text from website URLs in Langflow:

Built-in approach:

- Use the URL component
- Connect it to a Split Text component
- Then connect that to your chain/agent
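The splitter step cuts the loaded text into overlapping chunks. To illustrate what the chunk size and chunk overlap settings mean, here is a plain character-window splitter; `split_text` is a hypothetical helper, and this is not Langflow's actual Split Text implementation:

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters overlapping by `chunk_overlap`."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks
```

For example, `split_text("abcdefghij", chunk_size=4, chunk_overlap=1)` returns `['abcd', 'defg', 'ghij']`, and an empty string yields no chunks at all.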

Configuration:

```
URL Loader:
  - URL: https://example.com/page
  - Extract: text (not HTML)
  - Follow links: false
```

Custom component approach:

```python
import requests
from bs4 import BeautifulSoup
from langflow.custom import CustomComponent

class WebTextExtractor(CustomComponent):
    def build(self, url: str) -> str:
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=15)
        response.raise_for_status()  # surface 403/429 instead of returning an error page's text
        soup = BeautifulSoup(response.text, "html.parser")
        # Remove scripts and styles
        for tag in soup(["script", "style"]):
            tag.decompose()
        return soup.get_text(separator="\n", strip=True)
```

Issues to watch:

- JavaScript-rendered content (needs a headless browser)
- Rate limiting from the target site
- Character encoding
- Large pages (memory)
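For the character-encoding point, one common approach is to prefer the charset declared in the `Content-Type` header, fall back to a `<meta charset>` tag, and default to UTF-8. A standard-library sketch; `pick_charset` and `decode_html` are hypothetical helper names:

```python
import re
from typing import Optional

# Matches <meta charset="..."> and <meta http-equiv=... content="...; charset=...">
_META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([\w-]+)""", re.IGNORECASE)


def pick_charset(raw: bytes, content_type: Optional[str]) -> str:
    """Charset from the Content-Type header, else from a <meta> tag, else UTF-8."""
    if content_type:
        match = re.search(r"charset=([\w-]+)", content_type, re.IGNORECASE)
        if match:
            return match.group(1).lower()
    match = _META_CHARSET.search(raw[:4096])  # <meta> normally sits near the top
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def decode_html(raw: bytes, content_type: Optional[str] = None) -> str:
    """Decode raw page bytes, replacing undecodable bytes instead of failing."""
    charset = pick_charset(raw, content_type)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # charset name Python does not recognize
        return raw.decode("utf-8", errors="replace")
```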

For JS-heavy sites:

```python
from playwright.sync_api import sync_playwright
# Use Playwright to render the page first, then parse the resulting HTML
```
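A slightly fuller sketch of that idea, assuming Playwright is installed (`pip install playwright`, then `playwright install chromium`): render the page in headless Chromium so its JavaScript runs, then pull the visible text out of the final HTML. `render_html` and `html_to_text` are hypothetical helper names, and the text extraction uses the standard-library `html.parser` rather than BeautifulSoup:

```python
from html.parser import HTMLParser
from typing import List


class _VisibleText(HTMLParser):
    """Collects text that is not inside <script>, <style> or <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Visible text of an HTML document, joined with single spaces."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def render_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load `url` in headless Chromium so its JavaScript runs; return the final HTML."""
    # Imported lazily so html_to_text() works even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


# Usage (needs network access and Chromium from `playwright install chromium`):
# text = html_to_text(render_html("https://www.costco.com"))
```

`wait_until="networkidle"` can time out on pages that keep polling in the background; `"load"` and `"domcontentloaded"` are the other usual choices. Sites with strong bot protection may still block a headless browser.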

We build web scraping pipelines at RevolutionAI. What type of site are you trying to parse?

xXMrNidaXx · 2026-02-23T15:41:30Z

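The "No documents were successfully loaded from any URL" error usually means the loader got nothing usable back from the site, most often because the website is blocking the scraper.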

Likely reasons Costco fails:

- Heavy JavaScript rendering
- Bot detection (e.g. Cloudflare-style challenges)
- Authentication required for some pages
- Dynamic content loading
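Before changing loaders, it can help to see which of these cases applies. A rough standard-library sketch that fetches the page once and classifies the response; the status-code rules and the 200-character threshold are heuristics assumed for illustration, and `diagnose` / `check_url` are hypothetical names:

```python
import re
import urllib.error
import urllib.request

MIN_TEXT_CHARS = 200  # assumed threshold for "the page has real static content"


def visible_text(html: str) -> str:
    """Crude tag stripper: drops <script>/<style> bodies, then all remaining tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return " ".join(text.split())


def diagnose(status: int, html: str) -> str:
    """Guess why a loader might get no usable document from this response."""
    if status in (401, 403):
        return f"blocked: HTTP {status} (bot detection or login required)"
    if status == 429:
        return "rate limited: HTTP 429"
    if status >= 400:
        return f"HTTP error {status}"
    text = visible_text(html)
    if len(text) >= MIN_TEXT_CHARS:
        return f"ok: {len(text)} characters of static text"
    if "<script" in html.lower():
        return "little static text: page is probably rendered by JavaScript"
    return f"little or no text ({len(text)} characters) in the response"


def check_url(url: str, timeout: float = 15.0) -> str:
    """Fetch `url` once with a browser-like User-Agent and diagnose the result."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return diagnose(response.status, body)
    except urllib.error.HTTPError as err:  # server answered with an error status
        return diagnose(err.code, "")
    except OSError as err:  # DNS failure, refused connection, timeout, ...
        return f"network error: {err}"
```

For example, `print(check_url("https://www.costco.com"))` shows whether the plain HTTP fetch is refused outright or returns a near-empty JavaScript shell.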

Solutions:

1. Use a different loader

In the URL component, switch to a Playwright- or Selenium-based loader if your Langflow version offers one.

These render JavaScript before extraction.

2. Add browser-like headers (these get past simple User-Agent checks, not dedicated bot protection)

```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml",
}
```

3. Use a web scraping API

- ScrapingBee
- Browserless
- Firecrawl

API Request component → https://api.firecrawl.dev/v1/scrape
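For the Firecrawl route, the request sent to that endpoint looks roughly like this standard-library sketch. It assumes Firecrawl's v1 `/scrape` endpoint accepts a JSON body with `url` and `formats` and returns the page under `data.markdown`; check Firecrawl's API reference for the current fields. `build_scrape_request` and `scrape_markdown` are hypothetical names, and you need your own API key:

```python
import json
import urllib.request

# Endpoint from the thread; body and response fields are assumptions based on Firecrawl's v1 API.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """POST request asking Firecrawl to scrape `target_url` and return Markdown."""
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_markdown(target_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request and return the scraped page as Markdown."""
    request = build_scrape_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"Firecrawl scrape failed: {payload}")
    return payload["data"]["markdown"]
```

The same method, URL, headers, and JSON body are what you would enter by hand in a Langflow HTTP request step.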

4. Try simpler test URLs first

- https://example.com (should work)
- https://en.wikipedia.org/wiki/Artificial_intelligence (usually works)
- https://docs.langflow.org (should work)

5. Check whether the site's robots.txt restricts scraping

https://www.costco.com/robots.txt
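Python's standard library can evaluate those rules for you. A sketch with `urllib.robotparser`; `robots_from_lines` and `can_scrape` are hypothetical helper names:

```python
from typing import Iterable
from urllib import robotparser


def robots_from_lines(lines: Iterable[str]) -> robotparser.RobotFileParser:
    """Build a parser from robots.txt content you already have (no network)."""
    parser = robotparser.RobotFileParser()
    parser.parse(lines)
    return parser


def can_scrape(page_url: str, robots_url: str, user_agent: str = "*") -> bool:
    """Download robots_url and ask whether user_agent may fetch page_url."""
    parser = robotparser.RobotFileParser(robots_url)
    # If the robots.txt request itself gets 401/403, the parser treats the
    # whole site as disallowed.
    parser.read()
    return parser.can_fetch(user_agent, page_url)
```

For example, `can_scrape("https://www.costco.com/", "https://www.costco.com/robots.txt")` answers the etiquette question; it does not tell you whether the site's bot protection will let the request through.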

Alternative approach:
For e-commerce sites, use their API if available instead of scraping.

We build web scraping workflows at Revolution AI — JavaScript-heavy sites need headless browser loaders.
