feat: add production resilience features#9

Merged
mlevkov merged 2 commits into main from improvements_1 on Dec 15, 2025

mlevkov commented Dec 14, 2025

Summary

Adds production-grade resilience features: a circuit breaker, Prometheus metrics export, and request timeout propagation, plus fixes for edge cases in client IP extraction and consumer ID validation.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • CI/CD changes

Changes Made

  • Circuit Breaker Pattern (src/iggy_client/circuit_breaker.rs)

    • Three-state circuit breaker: Closed → Open → HalfOpen
    • Configurable failure/success thresholds and open duration
    • Integrated with IggyClientWrapper.with_reconnect() for fail-fast during outages
    • New AppError::CircuitOpen error type (returns 503)
  • Prometheus Metrics (src/metrics.rs)

    • Counters: iggy_messages_sent_total, iggy_messages_polled_total, iggy_connection_reconnects_total, iggy_circuit_breaker_opens_total
    • Histograms: iggy_request_duration_seconds, iggy_send_duration_seconds, iggy_poll_duration_seconds
    • Gauges: iggy_connection_status, iggy_circuit_breaker_state
    • Configurable via METRICS_PORT (default: 9090, 0 = disabled)
  • Request Timeout Propagation (src/middleware/timeout.rs)

    • Clients can specify X-Request-Timeout header (milliseconds)
    • Bounded: 100ms minimum, 5 minutes maximum
    • RequestTimeoutExt trait for easy extraction in handlers
  • Bug Fixes

    • Empty/whitespace-only X-Forwarded-For headers now return "unknown" instead of empty string
    • Prevents creating separate rate-limit buckets for malformed headers
  • Validation Improvements

    • Added MAX_CONSUMER_ID (1 billion) upper bound validation
    • Catches likely misconfigurations, such as garbage values passed as consumer IDs
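The three-state circuit breaker listed above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in src/iggy_client/circuit_breaker.rs: the `Config` and `CircuitBreaker` types, their method names, the choice to count consecutive failures, and the rule that any failure in HalfOpen reopens the circuit are all assumptions. The default values mirror the documented configuration defaults (5 failures, 2 successes, 30 seconds).

```rust
use std::time::{Duration, Instant};

/// The three states named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical config type; defaults follow the documented env var defaults.
#[derive(Debug, Clone)]
pub struct Config {
    pub failure_threshold: u32,
    pub success_threshold: u32,
    pub open_duration: Duration,
}

impl Default for Config {
    fn default() -> Self {
        Self {
            failure_threshold: 5,
            success_threshold: 2,
            open_duration: Duration::from_secs(30),
        }
    }
}

/// Hypothetical breaker; time is passed in explicitly so it is easy to test.
pub struct CircuitBreaker {
    config: Config,
    state: State,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(config: Config) -> Self {
        Self { config, state: State::Closed, failures: 0, successes: 0, opened_at: None }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Returns true if a call may proceed at `now`.
    /// An Open circuit moves to HalfOpen once the open duration has elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == State::Open {
            let elapsed = self
                .opened_at
                .map_or(Duration::ZERO, |t| now.saturating_duration_since(t));
            if elapsed >= self.config.open_duration {
                self.state = State::HalfOpen;
                self.successes = 0;
            }
        }
        self.state != State::Open
    }

    /// Assumption: a success in Closed resets the consecutive failure count.
    pub fn record_success(&mut self) {
        match self.state {
            State::Closed => self.failures = 0,
            State::HalfOpen => {
                self.successes += 1;
                if self.successes >= self.config.success_threshold {
                    self.state = State::Closed;
                    self.failures = 0;
                }
            }
            State::Open => {}
        }
    }

    /// Assumption: any failure while HalfOpen reopens the circuit immediately.
    pub fn record_failure(&mut self, now: Instant) {
        match self.state {
            State::Closed => {
                self.failures += 1;
                if self.failures >= self.config.failure_threshold {
                    self.trip(now);
                }
            }
            State::HalfOpen => self.trip(now),
            State::Open => {}
        }
    }

    fn trip(&mut self, now: Instant) {
        self.state = State::Open;
        self.opened_at = Some(now);
        self.successes = 0;
    }
}
```

A caller would check `allow(Instant::now())` before each operation, fail fast (for example with a 503) when it returns false, and report the outcome through `record_success` or `record_failure` otherwise.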

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed

Test commands run:

cargo fmt --check
cargo clippy -- -D warnings
cargo test --lib  # 130 tests passing

Checklist

Code Quality

  • Code follows project style guidelines (cargo fmt)
  • No new Clippy warnings (cargo clippy -- -D warnings)
  • Public APIs have documentation comments
  • Error handling is appropriate (no unwrap in production code)

Testing

  • Tests cover the happy path
  • Tests cover error cases
  • All existing tests pass

Documentation

  • CLAUDE.md updated (if architectural changes)
  • README updated (if user-facing changes)
  • Code comments explain "why" not "what"

Security

  • No secrets or credentials committed
  • Input validation added where needed
  • No new security vulnerabilities introduced

New Configuration Options

| Variable | Default | Description |
|---|---|---|
| CIRCUIT_BREAKER_FAILURE_THRESHOLD | 5 | Failures before opening circuit |
| CIRCUIT_BREAKER_SUCCESS_THRESHOLD | 2 | Successes in half-open to close |
| CIRCUIT_BREAKER_OPEN_DURATION_SECS | 30 | How long circuit stays open (seconds) |
| METRICS_PORT | 9090 | Prometheus metrics port (0 = disabled) |
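
As an illustration of the METRICS_PORT semantics (default 9090, 0 disables the exporter), the value could be interpreted along these lines. The function name `metrics_addr`, the choice to bind on all IPv4 interfaces, and treating an empty value as unset are assumptions made for this sketch, not details taken from the PR.

```rust
use std::net::{Ipv4Addr, SocketAddr};

/// Documented default for METRICS_PORT.
const DEFAULT_METRICS_PORT: u16 = 9090;

/// Hypothetical helper: turns the raw METRICS_PORT value into a listen address.
/// Returns Ok(None) when metrics are disabled (port 0).
pub fn metrics_addr(raw: Option<&str>) -> Result<Option<SocketAddr>, String> {
    let port = match raw.map(str::trim) {
        // Assumption: an unset or empty variable falls back to the default.
        None | Some("") => DEFAULT_METRICS_PORT,
        Some(value) => value
            .parse::<u16>()
            .map_err(|e| format!("invalid METRICS_PORT {value:?}: {e}"))?,
    };
    if port == 0 {
        return Ok(None);
    }
    // Assumption: the exporter listens on all IPv4 interfaces.
    Ok(Some(SocketAddr::from((Ipv4Addr::UNSPECIFIED, port))))
}
```

At startup this could be called as `metrics_addr(std::env::var("METRICS_PORT").ok().as_deref())`, skipping exporter setup when the result is `Ok(None)`.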

Additional Notes

  • Circuit breaker only counts connection-related errors (not validation errors or timeouts with a healthy connection)
  • Metric recording calls are safe without an explicit init_metrics() call (they are no-ops until metrics are initialized)
  • Request timeout middleware is applied before the request ID layer in the middleware stack
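
The X-Request-Timeout bounds described in this PR (milliseconds, 100ms minimum, 5 minutes maximum) could be applied as in the sketch below. Whether the middleware clamps out-of-range values or rejects them is not stated here, so clamping is an assumption, as are the function name `parse_request_timeout` and the decision to ignore unparsable values.

```rust
use std::time::Duration;

/// Bounds stated in the PR description.
const MIN_TIMEOUT: Duration = Duration::from_millis(100);
const MAX_TIMEOUT: Duration = Duration::from_secs(5 * 60);

/// Hypothetical helper: parses an X-Request-Timeout header value (milliseconds).
/// Returns None when the header is absent or not a non-negative integer, so the
/// caller can fall back to its own default timeout.
pub fn parse_request_timeout(header: Option<&str>) -> Option<Duration> {
    let millis: u64 = header?.trim().parse().ok()?;
    // Assumption: out-of-range values are clamped rather than rejected.
    Some(Duration::from_millis(millis).clamp(MIN_TIMEOUT, MAX_TIMEOUT))
}
```

With this shape, a header of `2500` yields 2.5 seconds, `5` is raised to 100ms, and anything above 300000 is capped at 5 minutes.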

- Add circuit breaker pattern for fail-fast during outages
  - Configurable failure/success thresholds and open duration
  - Three states: Closed, Open, HalfOpen
  - Integrated with IggyClientWrapper reconnection logic

- Add Prometheus metrics export (metrics-rs)
  - Counters: messages sent/polled, reconnects, circuit breaker events
  - Histograms: request/send/poll duration
  - Gauges: connection status, circuit breaker state
  - Configurable via METRICS_PORT (default: 9090, 0 = disabled)

- Add request timeout propagation middleware
  - Clients specify timeout via X-Request-Timeout header (milliseconds)
  - Bounded: 100ms min, 5min max
  - RequestTimeoutExt trait for handler extraction

- Fix empty X-Forwarded-For header handling
  - Empty/whitespace-only headers now return "unknown"
  - Prevents separate rate-limit buckets for empty headers

- Add consumer ID upper bound validation
  - MAX_CONSUMER_ID = 1 billion
  - Catches likely misconfigurations
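
The X-Forwarded-For fix and the consumer ID bound above can be sketched as two small helpers. Both function names are hypothetical. Taking the first comma-separated entry as the client IP, using `u32` for consumer IDs, and treating MAX_CONSUMER_ID as an inclusive bound are assumptions for illustration only.

```rust
/// Upper bound from the PR description (1 billion).
pub const MAX_CONSUMER_ID: u32 = 1_000_000_000;

/// Hypothetical helper: derives a rate-limit key from X-Forwarded-For.
/// Missing, empty, or whitespace-only values all map to "unknown", so
/// malformed headers share one bucket instead of creating new ones.
pub fn client_ip_from_forwarded_for(header: Option<&str>) -> String {
    header
        // Assumption: the first entry in the list is the original client.
        .and_then(|value| value.split(',').next())
        .map(str::trim)
        .filter(|ip| !ip.is_empty())
        .unwrap_or("unknown")
        .to_string()
}

/// Hypothetical helper: rejects consumer IDs above the upper bound.
pub fn validate_consumer_id(id: u32) -> Result<u32, String> {
    if id > MAX_CONSUMER_ID {
        return Err(format!("consumer ID {id} exceeds maximum {MAX_CONSUMER_ID}"));
    }
    Ok(id)
}
```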

🤖 Generated with [Claude Code](https://claude.com/claude-code)
mlevkov self-assigned this Dec 14, 2025
mlevkov added the documentation, enhancement, and maintenance labels Dec 14, 2025
mlevkov merged commit dbdf949 into main Dec 15, 2025
22 checks passed
mlevkov deleted the improvements_1 branch December 15, 2025 02:12