Skip to content

Conversation

@Robdel12
Copy link
Contributor

Summary

Improves the static-site SDK with better UX, smarter defaults, and debugging tools:

  • Exit early with helpful message when no TDD server or API token is found (instead of continuing with 456 "client not initialized" errors)
  • Smart concurrency defaults based on CPU cores (min 2, max 8, using half of available cores)
  • Interactive progress display with ETA in TTY mode, traditional logging in CI
  • Fix sitemap index handling - now follows child sitemaps correctly instead of treating them as pages
  • Add --dry-run option to preview discovered pages without capturing screenshots

Changes

Better Error Handling

  • Exits early with clear instructions when no TDD server or API token is available
  • Shows two options: start TDD server or set VIZZLY_TOKEN

Smart Concurrency

  • Auto-detects optimal concurrency from CPU cores
  • Formula: Math.max(2, Math.min(8, Math.floor(cores / 2)))
  • Can still be overridden via config or CLI

Progress Display

  • In TTY: Single updating progress line with percentage and ETA
  • In CI: Traditional per-task logging
  • Shows completion time at end

Sitemap Index Support

  • Properly handles sitemap index files (sitemap of sitemaps)
  • Follows child sitemap references to extract actual page URLs
  • Fixes issue where sitemap-0.xml was being captured as a page

Dry Run Option

vizzly static-site ./dist --dry-run

Output shows:

  • Pages grouped by source (sitemap vs HTML scan)
  • Task count (pages × viewports)
  • Viewport details
  • Concurrency setting

Test plan

  • Run tests: npm test (154 tests passing)
  • Test dry-run with real static site build
  • Verify interactive progress in terminal
  • Verify CI-style output with CI=true
  • Test sitemap index handling

- Exit early with helpful message when no TDD server or API token found
- Add smart default concurrency based on CPU cores (half cores, min 2, max 8)
- Add progress with ETA during screenshot capture
- Show total completion time at end of run

The --concurrency flag already existed; now the default is smarter.
Sitemap:
- Fix sitemap index files returning child sitemap URLs as pages
- Now follows sitemap index to parse actual child sitemaps
- Extracts real page URLs instead of sitemap-0.xml etc.

Progress display:
- Interactive terminals: single updating progress line with ETA
- Non-interactive (CI): traditional per-task logging
- Shows completion summary with total time and error count
- Add --dry-run flag to print discovered pages without capturing screenshots
- Shows pages grouped by source (sitemap vs HTML scan)
- Displays task count, viewports, and concurrency settings
- Useful for debugging page discovery and include/exclude patterns

Also includes tests for:
- Sitemap XML parsing (standard and index formats)
- Config schema validation and smart concurrency defaults
- Dry-run mode behavior with various options
@blacksmith-sh

This comment has been minimized.

@claude
Copy link

claude bot commented Dec 22, 2025

Pull Request Review

This PR significantly improves the static-site SDK's user experience with smart defaults, better error handling, and helpful debugging tools. Overall, the implementation is solid with excellent test coverage. Here's my detailed feedback:


Strengths

Code Quality

  • Clean, focused implementation - Each change addresses a specific UX pain point
  • Excellent test coverage - 784 additions include 546 lines of new tests (70% test code!)
  • Good separation of concerns - Dry-run logic exits early, progress display is modular
  • Strong error messages - Clear, actionable guidance when TDD server/token missing

UX Improvements

  • Smart concurrency defaults (clients/static-site/src/config-schema.js:14-17) - CPU-aware algorithm with sensible bounds (min 2, max 8) is well-designed
  • Interactive progress display (clients/static-site/src/tasks.js:203-212) - TTY detection with fallback to traditional logging shows good attention to different environments
  • Dry-run feature (clients/static-site/src/index.js:97-138) - Comprehensive preview helps users debug page discovery issues
  • Sitemap index support (clients/static-site/src/utils/sitemap.js:24-43) - Properly follows child sitemaps instead of treating them as pages

🔍 Issues & Concerns

1. Potential Race Condition in Progress Display (Medium Priority)

Location: clients/static-site/src/tasks.js:222-229

Issue: When running with high concurrency, multiple tasks may fail simultaneously. The writeProgress('') + process.stdout.write sequence could interleave with concurrent success messages, leading to garbled output.

Recommendation: Wrap the entire error logging block in a mutex/lock, or accumulate errors and print them after clearing the progress line once.

2. ETA Calculation May Be Inaccurate Initially (Low Priority)

Location: clients/static-site/src/tasks.js:195-200

Issue: Early in execution, avgTime can be highly variable. The first task might take 5s (cold browser), while subsequent ones take 500ms, leading to wildly inaccurate ETAs initially.

Recommendation: Consider using an exponential moving average or wait until N tasks complete before showing ETA.

3. Sitemap Child Resolution Has Security Implications (Medium Priority)

Location: clients/static-site/src/utils/sitemap.js:28-31

Issue: A malicious sitemap could contain URLs like https://evil.com/../../../etc/passwd. While join() normalizes paths, the .split('/').pop() approach is fragile.

Recommendation:

  • Validate that filename doesn't contain path traversal sequences
  • Check that the resolved childPath is actually within baseDir

4. Redundant Dynamic Imports (Low Priority)

Location: clients/static-site/src/utils/sitemap.js:17-18

Issue: Dynamic imports are already available at the top of the file. These redundant dynamic imports add overhead.

Recommendation: Import dirname and existsSync at the top of the file instead.

5. Silent Failures in Sitemap Parsing (Low Priority)

Location: clients/static-site/src/utils/sitemap.js:34-40

Issue: Silent failures make debugging difficult. Users won't know if a child sitemap failed to parse due to permissions, corrupted XML, or other issues.

Recommendation: Add optional debug logging when VIZZLY_LOG_LEVEL is set to debug.

6. getDefaultConcurrency() Performance (Low Priority)

Location: clients/static-site/src/config-schema.js:94,109

Issue: cpus().length is called every time getDefaultConcurrency() executes. While cheap, this could be cached.

Recommendation: Cache the result in a module-level constant.


💡 Suggestions

1. Consider Exponential Backoff for Failed Child Sitemaps

If a child sitemap is temporarily unavailable (network file system, etc.), might be worth retrying with backoff instead of silently skipping.

2. Add --verbose Option for Dry Run

The dry-run output is great, but for large sites with 1000+ pages, it could be overwhelming. Consider adding a verbose flag to show all pages vs just a summary.

3. Progress Bar Library for Better UX

Consider using a library like cli-progress or ora for more robust progress display handling (handles terminal resizing, concurrent writes, etc.).


📋 Test Coverage Analysis

Excellent work on testing! New test files provide comprehensive coverage:

  • config-schema.test.js - Tests CPU-based concurrency defaults thoroughly
  • dry-run.test.js - Covers all dry-run scenarios including edge cases
  • sitemap.test.js - Tests both standard and index sitemap formats

Suggestion: Add edge case tests for:

  • Sitemap index with missing child files
  • Sitemap with malformed/malicious URLs
  • Concurrent task failures in progress display

🔒 Security Assessment


🎯 Summary

Recommendation: Approve with minor changes

This is a high-quality PR that delivers real value. The UX improvements are well-thought-out and the test coverage is exemplary. Address the path traversal concern (Issue #3) before merging, and consider the race condition fix (Issue #1) for a follow-up PR.

Priority Fixes:

  1. 🔴 Add path validation to sitemap child resolution (security)
  2. 🟡 Fix potential race condition in error logging (UX)

Nice-to-haves:

  • Optimize getDefaultConcurrency() to cache CPU count
  • Add debug logging for skipped child sitemaps
  • Improve ETA accuracy with exponential moving average

Great work! The static-site SDK is significantly better with these changes.

- Fix race condition in progress display by queuing errors
- Improve ETA accuracy: wait for 5+ samples, use recent 20 for average
- Add path traversal protection for sitemap child resolution
- Move dynamic imports to top level in sitemap.js
- Cache CPU count for concurrency default
- Add security tests for sitemap path traversal
@Robdel12 Robdel12 merged commit 4ad26cb into main Dec 22, 2025
18 checks passed
@Robdel12 Robdel12 deleted the rd/static-site branch December 22, 2025 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants