feat: Hybrid Rollout Manager Implementation - Live at 10% (#186)
* feat: comprehensive rollout monitoring and documentation
Implements enterprise-grade performance monitoring and complete documentation
for the hybrid progressive capture rollout system.
## Performance Monitoring Infrastructure
### GitHub Actions Workflows (bdougie/jobs)
- rollout-health-monitor.yml: 15-minute health checks with auto-rollback
- rollout-metrics-collector.yml: hourly metrics aggregation and analysis
- rollout-emergency-rollback.yml: manual emergency response with validation
- rollout-performance-dashboard.yml: daily dashboard generation
### Monitoring Scripts
- scripts/rollout/health-checker.js: comprehensive health monitoring
- scripts/rollout/metrics-aggregator.js: performance and cost analysis
### Sentry Integration
- src/lib/progressive-capture/sentry-rollout-alerts.ts: specialized alert system
- Multi-level severity routing (critical/warning/info)
- Context enrichment for all rollout operations
## Complete Documentation Suite
### Implementation Phases
- docs/rollout/phase-1-infrastructure.md: database schema, rollout manager
- docs/rollout/phase-2-targeting.md: repository categorization, targeting
- docs/rollout/phase-3-monitoring.md: monitoring workflows, safety mechanisms
### Operational Documentation
- docs/rollout/console-commands.md: short production commands (r.s(), r.h())
- docs/rollout/emergency-procedures.md: incident response and recovery
## Key Features
- Real-time health monitoring with 15-minute checks
- Automated safety mechanisms with 5% error rate threshold
- Circuit breaker patterns with predictive capabilities
- Multi-channel alerting through Sentry
- Short console commands for production operations
- Comprehensive emergency response procedures
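
As a rough illustration of the 5% error rate threshold listed above, the check could look like the sketch below. The names `HealthWindow`, `errorRate`, and `shouldAutoRollback` are hypothetical, and the choice of a strict "greater than" comparison is an assumption; only the 5% figure comes from this commit.

```typescript
// Hypothetical sketch: only the 5% threshold is taken from the commit message.
const ERROR_RATE_THRESHOLD = 0.05;

interface HealthWindow {
  totalJobs: number;
  failedJobs: number;
}

// Fraction of failed jobs in the window; an empty window counts as healthy.
function errorRate(window: HealthWindow): number {
  return window.totalJobs === 0 ? 0 : window.failedJobs / window.totalJobs;
}

// Assumption: rollback triggers only when the rate strictly exceeds the threshold.
function shouldAutoRollback(
  window: HealthWindow,
  threshold: number = ERROR_RATE_THRESHOLD
): boolean {
  return errorRate(window) > threshold;
}
```

A 15-minute health check would call `shouldAutoRollback` with the job counts for its window and trigger the rollback workflow when it returns `true`.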
## Production Ready
The system provides enterprise-grade observability and safety mechanisms
for confident production deployment with immediate rollback capabilities.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: resolve TypeScript errors in Sentry rollout alerts
Fixes TypeScript compilation errors in sentry-rollout-alerts.ts:
- Add index signature to RolloutAlertContext interface for Sentry compatibility
- Cast rollout context to Record<string, unknown> when passing to Sentry methods
- Ensure proper type handling for partial context updates
All TypeScript errors resolved, build now passes successfully.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: resolve process.env issues for Vite browser compatibility
Fixes "process is not defined" runtime errors in progressive capture modules
by updating environment variable access patterns:
- Replace `process.env.VITE_*` with `import.meta.env?.VITE_* || process.env.VITE_*`
- Replace `process.env.NODE_ENV === 'development'` with `import.meta.env?.DEV`
- Apply fixes across 8 progressive capture modules
This follows the documented pattern in CLAUDE.md for handling environment
variables in Vite applications while maintaining Node.js compatibility.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* Implement Phase 5: Testing & Optimization suite
Add comprehensive testing and optimization tools for hybrid progressive capture:
**Testing Suite:**
- hybrid-system-test.js: Parallel testing on both Inngest and GitHub Actions
- edge-case-tester.js: 30+ edge cases and error scenario testing
- phase5-test-runner.js: Master coordinator for all Phase 5 activities
- data-gap-validator.js: Cross-system data consistency validation
**Optimization Tools:**
- inngest-optimizer.js: Optimize for recent data (concurrency, batch size, timeouts)
- github-actions-optimizer.js: Optimize for historical data (throughput, cost, parallelization)
**Monitoring & Analysis:**
- cost-analyzer.js: Cost tracking, savings validation, projections
**Key Features:**
- Validates 60-85% cost reduction target
- Ensures data consistency between systems
- Tests resilience to network failures, rate limits, timeouts
- Optimizes performance for real-time vs bulk processing
- Comprehensive reporting and recommendations
Phase 5 ensures production readiness with full test coverage and optimization.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* docs: update remaining features tracking
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* security: implement secure environment variable system
CRITICAL SECURITY FIXES:
- Remove VITE_INNGEST_EVENT_KEY exposure to browser
- Add runtime guards preventing server key access in browser
- Fix CommonJS import.meta.env issues in Netlify functions
- Separate client/server environment variable access
NEW SECURE SYSTEM:
- src/lib/env.ts: Context-aware environment access with guards
- Client keys: VITE_* prefix, browser-safe
- Server keys: No VITE_* prefix, server-only with protection
- Runtime validation and security warnings
MIGRATED FILES:
- src/lib/inngest/client.ts: Safe server key access
- src/lib/supabase.ts: Use clientEnv for public keys
- src/lib/github.ts: Secure token access patterns
- src/lib/inngest/*: Server context token handling
DOCUMENTATION:
- docs/security/environment-variables.md: Complete security guide
- .env.example: Clear client vs server variable separation
This prevents server secrets from being exposed in browser bundles
and resolves CommonJS compatibility issues in Netlify functions.
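
A minimal sketch of the kind of runtime guard described above, not the actual contents of src/lib/env.ts. The function name `readServerKey`, the explicit `isBrowser` flag, and the decision to throw (rather than warn) are assumptions made for illustration.

```typescript
// Illustrative only: names and behavior are assumptions, not the real env.ts API.
type ServerEnvSource = Record<string, string | undefined>;

// Reads a server-only key, refusing to do so when running in a browser context.
function readServerKey(
  key: string,
  source: ServerEnvSource,
  isBrowser: boolean
): string | undefined {
  if (isBrowser) {
    // Assumption: the guard throws; the real system may log a warning instead.
    throw new Error(`Refusing to read server-only key ${key} in a browser context`);
  }
  return source[key];
}
```

On the server, `source` would be `process.env` and `isBrowser` would be derived from something like `typeof window !== 'undefined'`.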
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: universal environment system for client/server contexts
RESOLVED ISSUES:
- ✅ Netlify Functions can access VITE_SUPABASE_URL and other env vars
- ✅ Eliminated CommonJS import.meta.env warnings in server functions
- ✅ Browser still has secure access to VITE_* prefixed variables
- ✅ Server functions use process.env exclusively (no import.meta)
TECHNICAL CHANGES:
- Universal getEnvVar() function with context detection
- Try-catch protection for import.meta.env access
- Single env export that works in both browser and server
- Fallback patterns: browser uses import.meta.env, server uses process.env
SECURITY MAINTAINED:
- Browser: Only VITE_* variables accessible via import.meta.env
- Server: Both VITE_* and server-only variables via process.env
- Runtime guards still prevent server secret access in browser
This fixes the "Missing environment variable: VITE_SUPABASE_URL" error
in Netlify Functions while maintaining security boundaries.
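
The fallback pattern above can be sketched as follows. This is not the getEnvVar() from this commit, whose signature is not shown here; `resolveEnvVar` and `EnvContext` are hypothetical, and the two env sources are passed in explicitly (standing in for `import.meta.env` and `process.env`) so the sketch runs in any module system.

```typescript
// Hypothetical sketch of context-aware env resolution.
type EnvRecord = Record<string, string | undefined>;

interface EnvContext {
  isBrowser: boolean;
  browserEnv: EnvRecord; // stands in for import.meta.env
  serverEnv: EnvRecord;  // stands in for process.env
}

function resolveEnvVar(key: string, ctx: EnvContext): string | undefined {
  if (ctx.isBrowser) {
    // Browser: only VITE_* variables are readable.
    return key.startsWith("VITE_") ? ctx.browserEnv[key] : undefined;
  }
  // Server: both VITE_* and server-only variables come from the server source.
  return ctx.serverEnv[key];
}
```

Passing the sources as arguments sidesteps the CommonJS `import.meta` problem in this sketch; the real module instead wraps `import.meta.env` access in try-catch, as noted above.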
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: browser compatibility for hybrid queue manager components
RESOLVED BROWSER LOADING ISSUES:
- ✅ Fixed 503 errors when loading hybrid-queue-manager in browser
- ✅ Eliminated process.env access in browser-loaded modules
- ✅ Added hybrid rollout environment variables to universal env system
- ✅ All progressive capture modules now browser-safe
SPECIFIC FIXES:
- github-actions-queue-manager.ts: Use env.GITHUB_TOKEN instead of process.env
- rollout-console.ts: Use env.HYBRID_* instead of process.env fallbacks
- rollout-manager.ts: Use env.HYBRID_EMERGENCY_STOP for browser compatibility
- env.ts: Added HYBRID_ROLLOUT_* environment variables
TECHNICAL IMPACT:
- React components can now safely import HybridQueueManager
- Chart components will load without 503 Service Unavailable errors
- All environment access uses universal system (browser + server safe)
- No more Node.js-specific dependencies in browser bundles
This fixes the chart loading failures and makes all progressive capture
components fully browser-compatible while maintaining server functionality.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* feat: implement Phase 6 gradual rollout to 10%
- Add phase6-implementation.js for complete rollout setup
- Add monitor-phase6.js for continuous health monitoring
- Add rollout-dashboard.js for interactive management
- Add comprehensive README with operational procedures
- Enable auto-rollback at 5% error threshold
- Support repository categorization and smart targeting
- Provide emergency stop and recovery procedures
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* WIP - broken
* fix: correct database schema field mapping for PR branches
- Fix null constraint violation in pull_requests table
- Change head_ref/base_ref to head_branch/base_branch to match schema
- Simplify rollout metrics queries to avoid 406 PostgREST errors
- Remove unsupported GraphQL filterBy argument for GitHub API compatibility
- Add client-side date filtering for recent PRs query
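
For illustration, client-side date filtering of an already-fetched PR list might look like this. The `PullRequestSummary` shape, the `updatedAt` field, and the `filterRecentPRs` name are assumptions; the commit does not say which timestamp the real query filters on.

```typescript
// Hypothetical shape: the real query may filter on a different timestamp field.
interface PullRequestSummary {
  number: number;
  updatedAt: string; // ISO 8601 timestamp
}

// Keeps PRs updated within the last `sinceDays` days relative to `now`.
function filterRecentPRs(
  prs: PullRequestSummary[],
  sinceDays: number,
  now: Date = new Date()
): PullRequestSummary[] {
  const cutoff = now.getTime() - sinceDays * 24 * 60 * 60 * 1000;
  return prs.filter((pr) => new Date(pr.updatedAt).getTime() >= cutoff);
}
```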
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* feat: implement Phase 4 rollout + complete Phase 3 monitoring
Phase 4: 10% Test Repository Deployment ✅ ACTIVE
- Add monitorPhase4() function to rollout console with comprehensive monitoring
- Configure 3 test repositories (BlueMatthew/WechatExporter, Robdel12/DraftPatch, analogjs/analog)
- Set rollout percentage to 10% with repository_size strategy
- Enable auto rollback at 5% error threshold with 24-hour monitoring window
Phase 3: Monitoring and Safety ✅ FUNCTIONALLY COMPLETED
- Real-time performance monitoring dashboard implemented
- Automatic rollback triggers with error rate thresholds
- Circuit breaker implementation via emergency stop
- Alert system with console monitoring and auto rollback
Status: Phase 4 active and ready for 24-48 hour monitoring period
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: resolve PR state constraint violation errors
- Update database constraint to allow 'merged' state in addition to 'open' and 'closed'
- Fix state mapping in all Inngest capture functions:
  - GraphQL repository sync: Map GitHub states (OPEN/CLOSED/MERGED) to database format
  - GraphQL PR details: Handle merged vs closed state distinction
  - REST API sync: Properly detect merged PRs using merged field
- Use ternary logic: open -> 'open', merged -> 'merged', everything else -> 'closed'
This resolves the constraint violation error that was blocking Phase 4 rollout jobs.
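
A minimal sketch of the state mapping described above. The function names `mapGraphQLState` and `mapRestState` are hypothetical; the mapping itself (open, merged, everything else closed) is the one stated in this commit.

```typescript
// The three values allowed by the updated database constraint.
type DbPrState = "open" | "closed" | "merged";

// GraphQL returns OPEN / CLOSED / MERGED; anything unrecognized falls back to 'closed'.
function mapGraphQLState(state: string): DbPrState {
  const upper = state.toUpperCase();
  return upper === "OPEN" ? "open" : upper === "MERGED" ? "merged" : "closed";
}

// REST reports merged PRs as state 'closed' plus a `merged` flag.
function mapRestState(pr: { state: string; merged?: boolean | null }): DbPrState {
  if (pr.merged) return "merged";
  return pr.state === "open" ? "open" : "closed";
}
```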
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: resolve nested step tooling warnings in Inngest functions
- Separate job preparation from event sending to avoid nested step.* calls
- Split queue-detailed-capture into two steps:
  1. prepare-job-queue: Prepare job data without sending events
  2. send-queued-events: Send events using prepared data
- Fix both REST and GraphQL repository sync functions
- Update step numbering after adding new preparation step
This resolves the Inngest warning: "We detected that you have nested step.* tooling"
that was occurring when step.sendEvent was called inside step.run blocks.
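
The two-step split can be sketched roughly as below. `StepTools` is a minimal local stand-in for the part of Inngest's step tooling used here, and the event name `capture/pr.details`, the `QueuedEvent` shape, and the function names are hypothetical. The point is only that `step.sendEvent` is called at the top level, after `step.run` has returned, rather than inside its callback.

```typescript
// Hypothetical event shape; the real payloads are not shown in this commit.
interface QueuedEvent {
  name: string;
  data: { repositoryId: string; prNumber: number };
}

// Minimal stand-in for the subset of Inngest step tools used in this sketch.
interface StepTools {
  run<T>(id: string, fn: () => Promise<T> | T): Promise<T>;
  sendEvent(id: string, events: QueuedEvent[]): Promise<unknown>;
}

// Step 1 body: build the job data only, with no step.* calls inside.
function prepareJobQueue(repositoryId: string, prNumbers: number[]): QueuedEvent[] {
  return prNumbers.map((prNumber) => ({
    name: "capture/pr.details", // hypothetical event name
    data: { repositoryId, prNumber },
  }));
}

async function queueDetailedCapture(
  step: StepTools,
  repositoryId: string,
  prNumbers: number[]
): Promise<number> {
  const events = await step.run("prepare-job-queue", () =>
    prepareJobQueue(repositoryId, prNumbers)
  );
  // Step 2: sendEvent sits beside step.run, never nested within it.
  if (events.length > 0) {
    await step.sendEvent("send-queued-events", events);
  }
  return events.length;
}
```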
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: update metrics aggregator environment variable handling
- Add fallback to VITE_SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY
- Fix success rate calculation to properly handle processing jobs
- Improve error handling for missing environment variables
📊 Rollout Metrics (24h):
- Total Jobs: 57 (56 Inngest, 1 GitHub Actions)
- Status: 54 processing, 3 failed
- Repository Participation: 25% (1 of 4 repositories)
- Error Analysis: GitHub Actions auth issue, Inngest network errors
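
One way to keep in-flight jobs from distorting the success rate is to compute it over finished jobs only. This is an assumption about what "properly handle processing jobs" means; the aggregator's actual formula is not shown in this commit, and `JobCounts` and `successRate` are hypothetical names.

```typescript
// Hypothetical sketch: success rate over finished jobs, ignoring 'processing'.
interface JobCounts {
  completed: number;
  failed: number;
  processing: number;
}

// Returns null when no job has finished yet, instead of a misleading 0% or 100%.
function successRate(counts: JobCounts): number | null {
  const finished = counts.completed + counts.failed;
  return finished === 0 ? null : counts.completed / finished;
}
```

Under this definition, a window in which every job is still processing reports no rate at all rather than counting those jobs as failures.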
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: remove nested step tooling in Inngest functions
- Move step.sendEvent calls outside of step.run blocks in both REST and GraphQL sync functions
- This fixes the NESTING_STEPS warning from Inngest
- Each sendEvent now has a unique event name with an index to avoid conflicts
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: update GraphQL query to use correct field names
- Replace 'reviewComments' field with nested 'reviews.comments' structure
- Update data processing to extract review comments from within each review
- Fix column names: use 'commenter_id' instead of 'author_id'
- Fix column names: use 'comment_type' instead of 'type'
- Update field references: 'replyTo' instead of 'inReplyTo', 'outdated' field added
This fixes the GraphQL errors when fetching PR details from GitHub API.
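
A rough sketch of pulling review comments out of the nested `reviews.comments` structure. The node fields used (`databaseId`, `body`, `outdated`, `replyTo`) follow GitHub's GraphQL schema, but the output row shape, the `'review_comment'` value, and every column name other than `comment_type` are assumptions for illustration.

```typescript
interface ReviewCommentNode {
  databaseId: number;
  body: string;
  outdated: boolean;
  replyTo: { databaseId: number } | null;
}

interface ReviewNode {
  databaseId: number;
  comments: { nodes: ReviewCommentNode[] };
}

// Hypothetical row shape; only `comment_type` is a column name from this commit.
interface CommentRow {
  github_id: number;
  review_github_id: number;
  body: string;
  outdated: boolean;
  in_reply_to_id: number | null;
  comment_type: string;
}

// Flattens comments nested inside each review into one list of rows.
function extractReviewComments(reviews: { nodes: ReviewNode[] }): CommentRow[] {
  const rows: CommentRow[] = [];
  for (const review of reviews.nodes) {
    for (const comment of review.comments.nodes) {
      rows.push({
        github_id: comment.databaseId,
        review_github_id: review.databaseId,
        body: comment.body,
        outdated: comment.outdated,
        in_reply_to_id: comment.replyTo ? comment.replyTo.databaseId : null,
        comment_type: "review_comment", // assumed value
      });
    }
  }
  return rows;
}
```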
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: ensure contributors exist before storing PR data
- Add ensureContributorExists helper function to create/update contributors
- Ensure author, merged_by, and comment authors exist before storing PRs/reviews/comments
- Remove non-existent columns (author_login, author_avatar_url) from PR upsert
- Fix state field to only use 'open' or 'closed' (not 'merged')
- Change onConflict from 'repository_id,number' to 'github_id' for consistency
This fixes the schema mismatch errors when storing PR data from GraphQL queries.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* feat: add rollout monitoring scripts for GitHub Actions
- Add performance-analyzer.js to analyze metrics and identify bottlenecks
- Add alert-manager.js to send alerts to Sentry based on thresholds
- Add package.json for script dependencies
- These scripts are used by the GitHub Actions workflow in the jobs repo
Scripts provide:
- Performance trend analysis
- Bottleneck identification
- Alert management with critical/warning thresholds
- Health status monitoring
- Cost efficiency tracking
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* chore: add rollout metrics files to gitignore
- Exclude rollout-metrics-*.json files (temporary data files)
- Exclude performance-analysis-*.json files (generated reports)
- Exclude trend-analysis-*.json files (analysis outputs)
These files are generated by the monitoring scripts and should not be committed.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* clean up report
* fix: correct database schema and data type issues in Inngest functions
- Fix boolean field error by handling 'UNKNOWN' mergeable state properly
- Update contributor schema to use display_name instead of name
- Use correct timestamp field names (github_created_at, first_seen_at, last_updated_at)
- Store github_id as bigint numbers instead of strings
- Add required profile_url, is_bot, and is_active fields
- Improve error handling and logging in ensureContributorExists
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: properly convert mergeable enum to boolean
GitHub GraphQL API returns mergeable as string enum:
- "MERGEABLE" → true
- "CONFLICTING" → false
- "UNKNOWN" → null
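
The conversion listed above can be sketched as a small helper. The name `mergeableToBoolean` is hypothetical, and treating any unrecognized or missing value as `null` is an assumption beyond the three cases listed.

```typescript
// Maps GitHub's GraphQL mergeable enum to a nullable boolean column value.
function mergeableToBoolean(mergeable: string | null | undefined): boolean | null {
  switch (mergeable) {
    case "MERGEABLE":
      return true;
    case "CONFLICTING":
      return false;
    default:
      // "UNKNOWN", plus (by assumption) any missing or unexpected value.
      return null;
  }
}
```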
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* fix: add comprehensive Supabase mocking to prevent test environment errors
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
---------
Co-authored-by: Claude <[email protected]>