add error handling for crawl4ai markdown generator by RulinShao · Pull Request #33 · rlresearch/dr-tulu

RulinShao · 2025-12-24T10:35:32Z

No description provided.

lolipopshock · 2025-12-24T22:02:09Z

rl/open-instruct/crawl4ai_block_list.txt

Can you move the block list to agent/utils/crawl4ai_block_list.txt and soft-link it here in the rl repo (to keep it compatible with the original repo)?
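A minimal sketch of the requested layout, assuming the script is run against the repository root and that the canonical copy already lives under agent/utils/. The helper name `link_block_list` is hypothetical and not part of this PR; it replaces the path under rl/open-instruct/ with a relative symlink so the link still resolves after the repo is cloned elsewhere.

```python
import os
from pathlib import Path


def link_block_list(repo_root: Path) -> Path:
    """Point rl/open-instruct/crawl4ai_block_list.txt at the copy in agent/utils/.

    Hypothetical helper: the two paths come from the review comment; everything
    else is an assumption for illustration.
    """
    target = repo_root / "agent" / "utils" / "crawl4ai_block_list.txt"
    link = repo_root / "rl" / "open-instruct" / "crawl4ai_block_list.txt"

    if not target.is_file():
        raise FileNotFoundError(f"canonical block list not found: {target}")

    link.parent.mkdir(parents=True, exist_ok=True)

    # Remove a previous regular file or stale link at the link location.
    if link.is_symlink() or link.exists():
        link.unlink()

    # Use a relative target so the link does not depend on the checkout path.
    relative_target = os.path.relpath(target, start=link.parent)
    link.symlink_to(relative_target)
    return link
```

Git stores the symlink itself, so existing code that reads rl/open-instruct/crawl4ai_block_list.txt keeps working on platforms that support symlinks.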

lolipopshock · 2025-12-24T22:02:49Z

rl/open-instruct/crawl4ai_block_list.txt

+physicsworld.com
+wiley.com
+hindawi.com
+jhu.edu


Ok, randomly saw this -- there are a bunch of .edu domain names excluded here. I think this list was obtained for crawling pretraining data, but I'm not sure it is ideal to exclude them here.
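One way to review this is to list the .edu entries separately before deciding whether to keep them. The sketch below assumes the block list holds one domain per line, as in the diff excerpt above; `split_edu_domains` is a hypothetical helper, not code from this PR.

```python
def split_edu_domains(lines):
    """Split block-list entries into (.edu domains, everything else).

    Assumes one domain per line; blank lines are skipped.
    """
    edu, other = [], []
    for raw in lines:
        domain = raw.strip().lower()
        if not domain:
            continue
        if domain.endswith(".edu"):
            edu.append(domain)
        else:
            other.append(domain)
    return edu, other


# The four entries visible in the diff excerpt above.
edu, other = split_edu_domains(
    ["physicsworld.com", "wiley.com", "hindawi.com", "jhu.edu"]
)
print("edu entries:", edu)
print("other entries:", other)
```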

lolipopshock · 2025-12-24T22:07:51Z

agent/dr_agent/mcp_backend/apis/crawl4ai_docker_api.py

+            error_str = str(e)
+            # Check if it's a 500 error related to markdown_generator serialization
+            if "500" in error_str or "model_dump" in error_str:
+                print(f"[crawl4ai] markdown_generator caused error, retrying without it: {error_str[:100]}")


Curious whether removing the markdown generator actually avoids this issue? (Sometimes it's just because downloading the original webpage takes a bit longer, and retrying after a few seconds fixes it.) But maybe there are outliers?
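A rough sketch of how both strategies could be combined: retry the same request after a short delay first, and drop the markdown generator only once those retries are exhausted. This is synchronous for brevity and is not the PR's implementation. `crawl_with_fallback` and the `crawl(url, markdown_generator=...)` callable are hypothetical stand-ins for the real request code in crawl4ai_docker_api.py, and the retry count and delay are assumed values.

```python
import time


def crawl_with_fallback(crawl, url, markdown_generator,
                        retries=2, delay_s=2.0, sleep=time.sleep):
    """Retry with the same config, then fall back to no markdown generator.

    `crawl` is a hypothetical callable: crawl(url, markdown_generator=...) -> result.
    `sleep` is injectable so the delay can be skipped in tests.
    """
    for attempt in range(retries):
        try:
            return crawl(url, markdown_generator=markdown_generator)
        except Exception as e:
            print(f"[crawl4ai] attempt {attempt + 1} failed: {str(e)[:100]}")
            # Wait before the next same-config attempt; slow pages may succeed.
            if attempt < retries - 1:
                sleep(delay_s)

    # Same-config retries did not help, so try once without the generator.
    print("[crawl4ai] retrying without markdown_generator")
    return crawl(url, markdown_generator=None)
```

Logging which path succeeded would show whether the failures are transient (the delayed retry works) or really tied to the markdown generator (only the fallback works).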

add error handling for crawl4ai markdown generator

1ad13a6

RulinShao requested a review from lolipopshock December 24, 2025 10:35

add blocklist file

e42c904

lolipopshock reviewed Dec 24, 2025

RulinShao wants to merge 2 commits into main from fix-web-browsing-markdown-parser
