Contributor

@QuinsZouls QuinsZouls commented Aug 4, 2025

Related GitHub Issue

Closes: #

Roo Code Task Context (Optional)

Description

This PR makes URL content fetching more robust and skips browser rendering when a plain HTTP fetch is sufficient.
Key Changes:

  • Smarter Content Fetching:

    • A new analyzeWebsite.ts module has been added. This module analyzes the initial HTML of a URL to determine if it's a static site or a single-page
      application (SPA) that requires JavaScript to render its content.
    • Based on this analysis, the UrlContentFetcher.ts now decides whether to use a simple fetch or a full browser rendering with Puppeteer. This avoids
      unnecessary browser launching for static sites, improving performance and resource usage.
  • Improved Fetching Robustness:

    • The UrlContentFetcher.ts now includes a retry mechanism (MAX_FETCH_RETRIES) for fetching URL content, making it more resilient to transient network errors.
    • A standardized USER_AGENT is now used for all fetch requests.
    • Fallback logic for Puppeteer has been refined. If loading a page with networkidle2 fails, it will retry with domcontentloaded for better success rates on
      slow-loading pages.
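
The Puppeteer fallback described above could look roughly like the following. This is a minimal sketch, not the PR's code: NavigablePage is a hypothetical interface that mirrors the part of Puppeteer's page.goto used here, navigateWithFallback is an illustrative name, and the 30-second primary timeout is an assumption (only the 20-second fallback value appears in the diff).

```typescript
// Hypothetical minimal page shape; Puppeteer's page.goto accepts these options.
type WaitUntil = "networkidle2" | "domcontentloaded"

interface NavigablePage {
	goto(url: string, options: { waitUntil: WaitUntil; timeout: number }): Promise<unknown>
}

// Assumed primary timeout; the fallback value matches the constant in the diff.
const URL_FETCH_TIMEOUT = 30_000
const URL_FETCH_FALLBACK_TIMEOUT = 20_000

// Tries the stricter networkidle2 condition first, then retries once with
// domcontentloaded. Returns which condition succeeded; if the fallback also
// fails, its error propagates to the caller.
async function navigateWithFallback(page: NavigablePage, url: string): Promise<WaitUntil> {
	try {
		await page.goto(url, { waitUntil: "networkidle2", timeout: URL_FETCH_TIMEOUT })
		return "networkidle2"
	} catch {
		await page.goto(url, { waitUntil: "domcontentloaded", timeout: URL_FETCH_FALLBACK_TIMEOUT })
		return "domcontentloaded"
	}
}
```

Returning the condition that succeeded is a design choice for this sketch only; it makes it easy to log how often the fallback path is taken.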

Impact:

These changes lead to a more efficient, reliable, and intelligent URL content fetching service. By avoiding Puppeteer for static sites, the system will be faster and consume fewer resources.
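
As a rough illustration of the fetch-first flow, the sketch below fetches the HTML once, analyzes it, and only falls back to browser rendering when the analysis says JavaScript is required. All names here (FetchDeps, getContent, requiresJavaScript) are hypothetical; the actual return shape of analyzeWebsite in this PR may differ.

```typescript
// Assumed analysis result; the real module also reports a score and details.
interface Analysis {
	requiresJavaScript: boolean
}

// Dependencies are injected so the decision logic can be exercised without
// network access or a browser.
interface FetchDeps {
	fetchHtml(url: string): Promise<string>
	analyze(html: string): Promise<Analysis>
	renderInBrowser(url: string): Promise<string>
}

// Returns the plain-fetch HTML for static pages and the browser-rendered
// HTML for pages that appear to need JavaScript.
async function getContent(url: string, deps: FetchDeps): Promise<string> {
	const html = await deps.fetchHtml(url)
	const analysis = await deps.analyze(html)
	return analysis.requiresJavaScript ? deps.renderInBrowser(url) : html
}
```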

Test Procedure

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Documentation Updates

Additional Notes

Get in Touch


@roomote roomote bot left a comment

Thank you for your contribution! I've reviewed the changes and found several issues that need attention before this can be merged.

}
}
}
return ""

The return statement here is unreachable since the loop either returns a value or throws an error. Consider removing this line.
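
One possible shape that needs no placeholder return, sketched under assumptions: the loop records the last error and the function throws after the loop, so TypeScript sees every path end in a return or a throw. fetchWithRetries and the fetchOnce callback are hypothetical names; MAX_FETCH_RETRIES matches the constant in the diff.

```typescript
const MAX_FETCH_RETRIES = 3

// Calls fetchOnce up to MAX_FETCH_RETRIES times and returns the first
// successful result. If every attempt fails, the last error is rethrown.
async function fetchWithRetries(fetchOnce: () => Promise<string>): Promise<string> {
	let lastError: unknown
	for (let attempt = 1; attempt <= MAX_FETCH_RETRIES; attempt++) {
		try {
			return await fetchOnce()
		} catch (error) {
			// Remember the failure and try again on the next iteration
			lastError = error
		}
	}
	// Reached only when all attempts failed, so no dummy return value is needed
	throw lastError
}
```

This sketch retries on any error; distinguishing transient from permanent failures would need an extra check that is not shown here.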

const URL_FETCH_FALLBACK_TIMEOUT = 20_000 // 20 seconds for fallback
const MAX_FETCH_RETRIES = 3 // Number of retries for transient errors
const USER_AGENT =
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"

Is this Chrome version intentional? Chrome 58 is from 2017. Current stable is 120+. Consider updating to a more recent version for better compatibility.

}
/*
let content = await this.fetchUrlContent(url)
const analyzedContent = await analyzeWebsite(content)

The analyzeWebsite call could throw errors during HTML parsing. Should we wrap it in a try-catch to handle parsing failures gracefully?
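
A minimal sketch of what graceful handling might look like, assuming that falling back to browser rendering is the safer default when analysis fails. needsBrowser and the requiresJavaScript field are hypothetical names, and the analyzer is passed in as a parameter rather than imported.

```typescript
type Analyzer = (html: string) => Promise<{ requiresJavaScript: boolean }>

// Returns true when the page should be rendered in a browser. A failure in
// the analyzer is treated as "unknown", which this sketch maps to true so
// that content is still retrieved through the slower path.
async function needsBrowser(html: string, analyze: Analyzer): Promise<boolean> {
	try {
		const analysis = await analyze(html)
		return analysis.requiresJavaScript
	} catch {
		// Assumption: prefer the browser path over failing the whole request
		return true
	}
}
```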

score,
details,
}
} catch (error: any) {

Using the any type defeats TypeScript's type safety. Consider:
Failed to process HTML. Reason:
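
One common way to avoid any in a catch clause is to type the caught value as unknown and narrow it before reading its message. The sketch below is illustrative only: describeError and processHtml are hypothetical names, and the parse callback stands in for the real HTML processing.

```typescript
// Narrows an unknown thrown value to a readable message.
function describeError(error: unknown): string {
	return error instanceof Error ? error.message : String(error)
}

// Runs the supplied parser and rewraps any failure with a descriptive message.
function processHtml(html: string, parse: (html: string) => number): number {
	try {
		return parse(html)
	} catch (error: unknown) {
		throw new Error(`Failed to process HTML. Reason: ${describeError(error)}`)
	}
}
```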


/**
* Analyzes a website's HTML to determine if it likely requires JavaScript to render meaningful content.
* Uses fetch API instead of axios.

The JSDoc mentions "Uses fetch API instead of axios" but this function doesn't use fetch - it just analyzes HTML. Could we update this comment to be more accurate?


if ($("noscript").length > 0) {
details.hasNoScriptTag = true
score -= 50

These magic numbers make the scoring logic hard to understand. Would it help to extract them as named constants?
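
A small sketch of the named-constant approach, using the noscript check as the example. NOSCRIPT_TAG_PENALTY, Details, and applyNoScriptSignal are hypothetical names; the value 50 is taken from the snippet under review, and the element count is passed in directly so the example does not depend on an HTML parser.

```typescript
// Amount subtracted from the score when the page contains a <noscript> tag
const NOSCRIPT_TAG_PENALTY = 50

interface Details {
	hasNoScriptTag: boolean
}

// Applies the noscript signal: records it in details and returns the adjusted score.
function applyNoScriptSignal(score: number, noscriptCount: number, details: Details): number {
	if (noscriptCount > 0) {
		details.hasNoScriptTag = true
		return score - NOSCRIPT_TAG_PENALTY
	}
	return score
}
```

Grouping all such weights in one place would also make it easier to document why each signal moves the score in the direction it does.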


roomote bot commented Aug 4, 2025

Additional Review Feedback

Critical Issues:

  1. Missing GitHub Issue Link: This PR doesn't link to any GitHub issue. According to the contribution guidelines, every PR must be linked to an approved issue. Please update the "Closes: #" field in the PR description with the appropriate issue number.

  2. Missing Test Coverage:

    • The new analyzeWebsite.ts module has no test coverage
    • The existing UrlContentFetcher.spec.ts tests haven't been updated to cover:
      • The new fetchUrlContent method
      • The fetch-first approach logic
      • The integration with analyzeWebsite

    Please add comprehensive tests for these new features.

  3. Test Failures: The test file expects the old macOS user agent string, but the implementation now uses a Windows user agent. This mismatch will cause test failures. The test on line 138 of UrlContentFetcher.spec.ts needs to be updated to match the new user agent string.

@hannesrudolph hannesrudolph added the Issue/PR - Triage label Aug 4, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Aug 6, 2025
@hannesrudolph hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage labels Aug 6, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Sep 23, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 23, 2025

2 participants