There are a few likely reasons for getting an empty result when crawling that site. WaterCrawl uses Scrapy with Playwright middleware, so it can render JavaScript-heavy pages. However, a crawl can still come back empty if the site uses strong anti-bot measures, loads content in a way Playwright can't capture, or hits parsing errors or network issues. The crawler is configured to ignore certain HTTP errors and passes content through pipelines that filter and process it, so if the page is blocked or doesn't match the expected patterns, you may get nothing back instead of an error. Checking the crawl logs for errors, or adjusting the Playwright/browser settings, should help pinpoint the cause. You can find more about the crawler's setup and error handling in its settings.
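Before changing crawler settings, it can help to check whether the page is reachable at all from the machine running the crawler. The sketch below uses only the Python standard library and is not part of WaterCrawl. It sends one plain (non-rendered) request, optionally through an HTTP proxy, and reports the HTTP status and body size. The `probe` name and the browser-like `User-Agent` header are illustrative assumptions. A timeout or connection error points to network-level blocking, while a 200 with a non-empty body suggests the problem lies in rendering or parsing.

```python
import urllib.error
import urllib.request


def probe(url, proxy=None, timeout=10.0):
    """Fetch `url` once and summarize the outcome as a dict.

    `proxy` is an optional HTTP proxy URL such as "http://host:port"
    (a placeholder; supply your own).
    """
    handlers = []
    if proxy:
        # An explicit ProxyHandler replaces the default one that reads
        # proxy settings from environment variables.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=timeout) as response:
            body = response.read()
            return {"ok": True, "status": response.status, "bytes": len(body), "error": None}
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 403 from an anti-bot layer).
        return {"ok": False, "status": exc.code, "bytes": 0, "error": str(exc)}
    except (urllib.error.URLError, OSError) as exc:
        # No usable response: DNS failure, refused or reset connection, or timeout.
        return {"ok": False, "status": None, "bytes": 0, "error": str(exc)}
```

For example, comparing `probe("https://www.sac.net.cn/tzgg/202509/t20250919_68417.html")` with the same call plus `proxy=...` shows whether the proxy changes the outcome.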
Hi @HaoGuo98, I’ve checked the URL, and it’s currently blocked outside of China. To access it, you’ll need to either use a Chinese HTTP proxy or run the application on servers hosted in China. If you’re on the WaterCrawl Cloud paid plan, you’ll automatically have access to our Chinese proxy. Here’s an example result generated using the Chinese proxy:
[Image: example crawl result obtained through the Chinese proxy]
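For a self-hosted setup, one way to follow the proxy advice above is to route the headless browser through a proxy located in China. The fragment below is a sketch that assumes the crawler is a Scrapy project using the scrapy-playwright plugin. That matches the Scrapy + Playwright setup mentioned earlier, but WaterCrawl's own configuration and proxy options may differ and should take precedence. The proxy URL and credentials are hypothetical placeholders.

```python
# settings.py fragment, assuming a Scrapy project that uses scrapy-playwright.
# The proxy address and credentials are hypothetical placeholders.
CN_PROXY = {
    "server": "http://cn-proxy.example.com:8080",
    "username": "user",
    "password": "secret",
}

# Send requests through scrapy-playwright's download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Options passed to Playwright's browser launch; "proxy" makes every page
# opened by this browser go through the given server.
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,
    "proxy": CN_PROXY,
}
```

Setting the proxy at browser launch keeps it in one place, so every rendered page uses it without per-request changes.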
Crawling this website (https://www.sac.net.cn/tzgg/202509/t20250919_68417.html) returns an empty result.