Feature/add fetch first approach #6683

QuinsZouls · 2025-08-04T19:58:40Z

Related GitHub Issue

Closes: #

Roo Code Task Context (Optional)

Description

This PR introduces several improvements to the URL content fetching mechanism, making it more robust and intelligent.
Key Changes:

Smarter Content Fetching:
- A new analyzeWebsite.ts module has been added. This module analyzes the initial HTML of a URL to determine if it's a static site or a single-page
  application (SPA) that requires JavaScript to render its content.
- Based on this analysis, the UrlContentFetcher.ts now decides whether to use a simple fetch or a full browser rendering with Puppeteer. This avoids
  unnecessary browser launching for static sites, improving performance and resource usage.
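A minimal sketch of the fetch-first decision, assuming a simple heuristic based on visible-text length and an empty SPA mount point. The names looksLikeSpa, getPageHtml, MIN_VISIBLE_TEXT_CHARS and the injected fetchHtml/renderWithBrowser functions are hypothetical. The PR's analyzeWebsite.ts uses its own scoring over parsed HTML, which this does not reproduce.

```typescript
// Hypothetical signatures: the real UrlContentFetcher.ts may differ.
type FetchHtml = (url: string) => Promise<string>
type RenderWithBrowser = (url: string) => Promise<string>

// Assumed threshold: pages with less visible text than this are treated as incomplete.
const MIN_VISIBLE_TEXT_CHARS = 200

// Hypothetical heuristic: an SPA shell usually has an empty mount point
// (e.g. <div id="root"></div>) and almost no server-rendered text.
function looksLikeSpa(html: string): boolean {
	// Drop script/style bodies, then strip remaining tags to approximate visible text.
	const visibleText = html
		.replace(/<script[\s\S]*?<\/script>/gi, "")
		.replace(/<style[\s\S]*?<\/style>/gi, "")
		.replace(/<[^>]+>/g, " ")
		.replace(/\s+/g, " ")
		.trim()
	const hasEmptyMountPoint = /<div\s+id=["'](root|app|__next)["']\s*>\s*<\/div>/i.test(html)
	return hasEmptyMountPoint || visibleText.length < MIN_VISIBLE_TEXT_CHARS
}

// Fetch first; launch the browser only when the static HTML looks incomplete.
async function getPageHtml(
	url: string,
	fetchHtml: FetchHtml,
	renderWithBrowser: RenderWithBrowser,
): Promise<{ html: string; usedBrowser: boolean }> {
	const html = await fetchHtml(url)
	if (!looksLikeSpa(html)) {
		return { html, usedBrowser: false }
	}
	return { html: await renderWithBrowser(url), usedBrowser: true }
}
```

Injecting the fetch and render functions keeps the decision logic testable without a network connection or a browser.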
Improved Fetching Robustness:
- The UrlContentFetcher.ts now includes a retry mechanism (MAX_FETCH_RETRIES) for fetching URL content, making it more resilient to transient network errors.
- A standardized USER_AGENT is now used for all fetch requests.
- Fallback logic for Puppeteer has been refined. If loading a page with networkidle2 fails, it will retry with domcontentloaded for better success rates on
  slow-loading pages.
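The networkidle2 to domcontentloaded fallback might look like the sketch below. NavigablePage is a hypothetical structural subset of Puppeteer's Page (whose goto accepts waitUntil and timeout options), so a real Page should satisfy it; URL_FETCH_TIMEOUT is an assumed value, and only URL_FETCH_FALLBACK_TIMEOUT comes from the PR diff.

```typescript
type WaitUntil = "networkidle2" | "domcontentloaded"

// Hypothetical structural subset of Puppeteer's Page.
interface NavigablePage {
	goto(url: string, options: { waitUntil: WaitUntil; timeout: number }): Promise<unknown>
}

const URL_FETCH_TIMEOUT = 30_000 // assumed primary timeout (not stated in the PR)
const URL_FETCH_FALLBACK_TIMEOUT = 20_000 // 20 seconds for fallback, as in the PR diff

// Try the stricter "network mostly idle" condition first. If it fails (typically a
// timeout on pages that keep polling), retry with the earlier DOMContentLoaded event.
// Returns which condition succeeded; errors from the fallback attempt propagate.
async function gotoWithFallback(page: NavigablePage, url: string): Promise<WaitUntil> {
	try {
		await page.goto(url, { waitUntil: "networkidle2", timeout: URL_FETCH_TIMEOUT })
		return "networkidle2"
	} catch {
		await page.goto(url, { waitUntil: "domcontentloaded", timeout: URL_FETCH_FALLBACK_TIMEOUT })
		return "domcontentloaded"
	}
}
```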

Impact:

These changes make URL content fetching more efficient and more reliable. By skipping Puppeteer for static sites, the service avoids launching a browser when a plain fetch is enough, so it will be faster and consume fewer resources.

Test Procedure

Pre-Submission Checklist

Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
Scope: My changes are focused on the linked issue (one major feature/fix per PR).
Self-Review: I have performed a thorough self-review of my code.
Testing: New and/or updated tests have been added to cover my changes (if applicable).
Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Documentation Updates

Additional Notes

Get in Touch


roomote

Thank you for your contribution! I've reviewed the changes and found several issues that need attention before this can be merged.

roomote · 2025-08-04T20:02:17Z

src/services/browser/UrlContentFetcher.ts

+				}
+			}
+		}
+		return ""


The return statement here is unreachable, since the loop either returns a value or throws an error. Consider removing this line.
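One way to structure the retry loop so that no trailing return "" is needed is to remember the last error and rethrow it after the loop. This matters for the compiler too: with an explicit Promise<string> return type and strict settings, TypeScript may report that the function lacks an ending return statement if the line is simply deleted, because it cannot prove that a counted for loop always returns or throws. In this sketch, FetchLike, FetchResponseLike, the placeholder USER_AGENT, and the linear backoff are assumptions, and MAX_FETCH_RETRIES is treated as the total number of attempts.

```typescript
// Minimal response shape so the sketch does not depend on DOM or Node fetch typings.
interface FetchResponseLike {
	ok: boolean
	status: number
	text(): Promise<string>
}
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<FetchResponseLike>

const MAX_FETCH_RETRIES = 3 // from the PR; treated here as total attempts
const USER_AGENT = "ExampleAgent/1.0" // placeholder; the PR defines its own string

async function fetchUrlContent(url: string, fetchFn: FetchLike, retryDelayMs = 500): Promise<string> {
	let lastError: unknown
	for (let attempt = 1; attempt <= MAX_FETCH_RETRIES; attempt++) {
		try {
			const response = await fetchFn(url, { headers: { "User-Agent": USER_AGENT } })
			if (!response.ok) {
				throw new Error(`HTTP ${response.status} for ${url}`)
			}
			return await response.text()
		} catch (error) {
			lastError = error
			if (attempt < MAX_FETCH_RETRIES) {
				// Linear backoff between attempts.
				await new Promise((resolve) => setTimeout(resolve, retryDelayMs * attempt))
			}
		}
	}
	// Reached only after every attempt failed; throwing here (instead of
	// returning "") satisfies the compiler without masking the failure.
	throw lastError
}
```

A production version would probably skip retries for non-transient responses such as 404, since repeating them only adds latency.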

roomote · 2025-08-04T20:02:17Z

src/services/browser/UrlContentFetcher.ts

 const URL_FETCH_FALLBACK_TIMEOUT = 20_000 // 20 seconds for fallback
+const MAX_FETCH_RETRIES = 3 // Number of retries for transient errors
+const USER_AGENT =
+	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"


Is this Chrome version intentional? Chrome 58 is from 2017. Current stable is 120+. Consider updating to a more recent version for better compatibility.

roomote · 2025-08-04T20:02:17Z

src/services/browser/UrlContentFetcher.ts

-		}
-		/*
+		let content = await this.fetchUrlContent(url)
+		const analyzedContent = await analyzeWebsite(content)


analyzeWebsite could throw errors during HTML parsing. Should we wrap this call in a try-catch to handle parsing failures gracefully?
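A sketch of the guard suggested here, assuming the analyzer is passed in as a function and returns a hypothetical requiresJavaScript flag (the PR's result type, which includes score and details, may differ). On failure it falls back to browser rendering, which trades speed for correctness; returning the already-fetched HTML would be the other reasonable choice.

```typescript
// Hypothetical result shape; the fields of the PR's analyzeWebsite result may differ.
interface AnalysisResult {
	requiresJavaScript: boolean
}
type Analyzer = (html: string) => Promise<AnalysisResult>

// If analysis throws (e.g. malformed HTML trips the parser), choose the safer
// path and render with the browser instead of failing the whole request.
async function shouldUseBrowser(html: string, analyze: Analyzer): Promise<boolean> {
	try {
		const result = await analyze(html)
		return result.requiresJavaScript
	} catch (error) {
		console.warn("Website analysis failed, falling back to browser rendering:", error)
		return true
	}
}
```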

roomote · 2025-08-04T20:02:18Z

src/services/browser/analyzeWebsite.ts

+			score,
+			details,
+		}
+	} catch (error: any) {


Using the any type defeats TypeScript's type safety. Consider catching the error as unknown and narrowing it before building the "Failed to process HTML. Reason:" message.
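Catching the error as unknown forces a type check before .message is read. The sketch below is hypothetical: analyzeHtmlSafely and its return shape are illustrative, and only the "Failed to process HTML. Reason:" wording comes from the review.

```typescript
// Hypothetical wrapper: runs an analysis step and converts any failure into a message.
function analyzeHtmlSafely(
	html: string,
	analyze: (html: string) => number,
): { score: number } | { error: string } {
	try {
		return { score: analyze(html) }
	} catch (error: unknown) {
		// With `unknown`, properties like `.message` are only accessible after narrowing.
		const reason = error instanceof Error ? error.message : String(error)
		return { error: `Failed to process HTML. Reason: ${reason}` }
	}
}
```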

roomote · 2025-08-04T20:02:18Z

src/services/browser/analyzeWebsite.ts

+
+/**
+ * Analyzes a website's HTML to determine if it likely requires JavaScript to render meaningful content.
+ * Uses fetch API instead of axios.


The JSDoc mentions "Uses fetch API instead of axios", but this function doesn't use fetch; it only analyzes HTML. Could we update this comment to be more accurate?

roomote · 2025-08-04T20:02:18Z

src/services/browser/analyzeWebsite.ts

+
+		if ($("noscript").length > 0) {
+			details.hasNoScriptTag = true
+			score -= 50


These magic numbers make the scoring logic hard to understand. Would it help to extract them as named constants?
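Extracting the weights could look like this. Only the -50 penalty for a noscript tag appears in the PR excerpt; the other weights, the starting score, the threshold, and the assumption that a lower score means JavaScript is more likely required are all hypothetical. Separating detection (which flags are set) from scoring also makes the scoring rules easy to unit test.

```typescript
// Hypothetical weights; only NOSCRIPT_TAG (-50) is taken from the PR excerpt.
const SCORE_WEIGHTS = {
	NOSCRIPT_TAG: -50,
	EMPTY_MOUNT_POINT: -40,
	SUBSTANTIAL_TEXT: 30,
} as const
const STARTING_SCORE = 100 // assumed baseline
const JS_REQUIRED_THRESHOLD = 50 // assumed: scores below this need a browser

// Flags produced by a separate HTML-inspection step (hypothetical names).
interface ScoreDetails {
	hasNoScriptTag: boolean
	hasEmptyMountPoint: boolean
	hasSubstantialText: boolean
}

function scorePage(details: ScoreDetails): { score: number; requiresJavaScript: boolean } {
	let score = STARTING_SCORE
	if (details.hasNoScriptTag) score += SCORE_WEIGHTS.NOSCRIPT_TAG
	if (details.hasEmptyMountPoint) score += SCORE_WEIGHTS.EMPTY_MOUNT_POINT
	if (details.hasSubstantialText) score += SCORE_WEIGHTS.SUBSTANTIAL_TEXT
	return { score, requiresJavaScript: score < JS_REQUIRED_THRESHOLD }
}
```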

roomote · 2025-08-04T20:02:32Z

Additional Review Feedback

Critical Issues:

Missing GitHub Issue Link: This PR doesn't link to any GitHub issue. According to the contribution guidelines, every PR must be linked to an approved issue. Please update the "Closes: #" field in the PR description with the appropriate issue number.
Missing Test Coverage:
- The new analyzeWebsite.ts module has no test coverage
- The existing UrlContentFetcher.spec.ts tests haven't been updated to cover:
  - The new fetchUrlContent method
  - The fetch-first approach logic
  - The integration with analyzeWebsite
Please add comprehensive tests for these new features.
Test Failures: The test file expects the old macOS user agent string, but the implementation now uses a Windows user agent. This mismatch will cause test failures. The test on line 138 of UrlContentFetcher.spec.ts needs to be updated to match the new user agent string.
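To avoid this kind of mismatch in the future, the spec can compare against the same constant the fetcher uses instead of a copied string literal. A minimal sketch, assuming a hypothetical buildRequestHeaders helper and assertUsesSharedUserAgent check; the user agent string is the one from the PR diff.

```typescript
// In the real code this constant would be exported from one module and imported
// by both UrlContentFetcher.ts and UrlContentFetcher.spec.ts (assumed layout).
const USER_AGENT =
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"

// Hypothetical helper that builds the request headers the fetcher sends.
function buildRequestHeaders(): Record<string, string> {
	return { "User-Agent": USER_AGENT }
}

// Spec-side check: compare against the shared constant, not a hard-coded literal.
function assertUsesSharedUserAgent(headers: Record<string, string>): void {
	if (headers["User-Agent"] !== USER_AGENT) {
		throw new Error(`Unexpected User-Agent: ${headers["User-Agent"]}`)
	}
}

assertUsesSharedUserAgent(buildRequestHeaders())
```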

QuinsZouls added 2 commits August 4, 2025 13:50

feat: add website analysis function to determine JavaScript requirements

d8f0a50

feat: implement URL content fetching with retry logic and user agent configuration

29821b0

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Aug 4, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Aug 4, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Aug 4, 2025

roomote bot reviewed Aug 4, 2025


hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 4, 2025

daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Aug 6, 2025

hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 6, 2025

hannesrudolph closed this Sep 23, 2025

github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Sep 23, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 23, 2025
