Add Observability Stack and Comprehensive Documentation#6

Merged
mlevkov merged 9 commits into main from grafana_stack
Dec 8, 2025

Owner

@mlevkov mlevkov commented Dec 8, 2025

Summary

Adds a Grafana-based observability stack for monitoring Iggy, plus documentation guides on event-driven architecture, partitioning, and durable storage.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • CI/CD changes

Changes Made

Observability Stack

  • Prometheus (port 9090): Metrics collection with 15-day retention, scraping Iggy at /metrics
  • Grafana (port 3001): Pre-configured dashboards with Prometheus datasource (admin/admin)
  • Iggy Web UI (port 3050): Dashboard for managing streams, topics, messages, and users (iggy/iggy)
  • Pre-built Iggy Overview dashboard with server status, request rates, message throughput, and latency percentiles
  • Updated docker-compose.yaml with full stack configuration

Documentation

  • docs/guide.md: Comprehensive event-driven architecture guide covering streams/topics/partitions, consumer groups, error handling patterns (idempotency, DLQ), and production patterns (outbox, saga)
  • docs/partitioning-guide.md: Deep dive into partition keys, ordering guarantees, and partition selection strategies
  • docs/durable-storage-guide.md: Storage architecture, fsync configuration, backup/archiving to S3, recovery procedures, and production recommendations
  • docs/README.md: Documentation index with quick navigation by topic
  • Updated README.md and CLAUDE.md to reference the docs directory
  • Updated CHANGELOG.md
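
To illustrate the partition-key idea that docs/partitioning-guide.md covers, here is a minimal Rust sketch of hash-based partition selection. It is not taken from the guide or from the Iggy SDK: `partition_for` is a hypothetical helper, the zero-based partition index is an assumption, and the standard library's `DefaultHasher` stands in for whatever hash the server actually applies. The only point is that equal keys always land on the same partition, which is what preserves per-key ordering.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a partition key to a zero-based partition index.
/// Hypothetical helper: DefaultHasher is a stand-in, not the hash Iggy uses.
fn partition_for(key: &str, partition_count: u32) -> u32 {
    assert!(partition_count > 0, "a topic needs at least one partition");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // Reduce the 64-bit hash to the range 0..partition_count.
    (hasher.finish() % u64::from(partition_count)) as u32
}

fn main() {
    let partitions = 3;
    // The repeated key maps to the same partition both times,
    // so its messages keep their relative order.
    for key in ["order-1001", "order-1002", "order-1001"] {
        println!("{key} -> partition {}", partition_for(key, partitions));
    }
}
```

Because the result depends on `partition_count`, changing the number of partitions remaps keys in this sketch, so ordering is only preserved while the count stays fixed.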

Housekeeping

  • Added .claude/settings.local.json to .gitignore
  • Dependency updates in Cargo.lock

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed

Test commands run:

# Verified observability stack starts correctly
docker-compose up -d

# Verified all services accessible:
# - Iggy HTTP API: http://localhost:3000
# - Iggy Web UI: http://localhost:3050
# - Prometheus: http://localhost:9090
# - Grafana: http://localhost:3001
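
The accessibility check above was done manually. As a rough sketch of how it could be scripted, the Rust program below tries a TCP connection to each port listed in the test commands. The helper names `services` and `is_listening` are hypothetical, and the sketch assumes the services are published on 127.0.0.1. An open port only shows that something is listening, not that the service behind it is healthy.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Service names and local ports, copied from the access points listed above.
fn services() -> Vec<(&'static str, u16)> {
    vec![
        ("Iggy HTTP API", 3000),
        ("Iggy Web UI", 3050),
        ("Prometheus", 9090),
        ("Grafana", 3001),
    ]
}

/// Returns true if a TCP connection to 127.0.0.1:<port> succeeds within the timeout.
fn is_listening(port: u16, timeout: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

fn main() {
    for (name, port) in services() {
        let status = if is_listening(port, Duration::from_secs(1)) {
            "listening"
        } else {
            "unreachable"
        };
        println!("{name} (localhost:{port}): {status}");
    }
}
```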

Checklist

Code Quality

  • Code follows project style guidelines (cargo fmt)
  • No new Clippy warnings (cargo clippy -- -D warnings)
  • Public APIs have documentation comments
  • Error handling is appropriate (no unwrap in production code)

Testing

  • Tests cover the happy path
  • Tests cover error cases
  • All existing tests pass

Documentation

  • CLAUDE.md updated (if architectural changes)
  • README updated (if user-facing changes)
  • Code comments explain "why" not "what"

Security

  • No secrets or credentials committed
  • Input validation added where needed
  • No new security vulnerabilities introduced

Related Issues

None

Screenshots (if applicable)

N/A - Infrastructure and documentation changes only

Additional Notes

Quick Start

# Start the full observability stack
docker-compose up -d

# Access points:
# - Iggy HTTP API: http://localhost:3000
# - Iggy Web UI: http://localhost:3050 (iggy/iggy)
# - Prometheus: http://localhost:9090
# - Grafana: http://localhost:3001 (admin/admin)

Files Changed

  • 13 files changed, +3,965 lines, -86 lines

- Add Prometheus service with Iggy metrics scraping (15s interval)
- Add Grafana service with auto-provisioned datasource and dashboard
- Configure Iggy server to expose /metrics endpoint
- Create pre-built Iggy Overview dashboard with:
  - Server status indicator
  - HTTP request rate graph
  - Message throughput visualization
  - Request latency percentiles (p50, p95)
  - Active streams counter
- Add shared Docker network for service discovery
- Update CLAUDE.md with observability documentation

Access points:
- Iggy HTTP: localhost:3000
- Prometheus: localhost:9090
- Grafana: localhost:3001 (admin/admin)

- Add apache/iggy-web-ui service on port 3050
- Configure Web UI to connect to Iggy HTTP API
- Update documentation with Web UI access and features

Web UI provides:
- Streams and topics management
- Message browsing and inspection
- User management
- Server health monitoring

Access: http://localhost:3050 (iggy/iggy)

- Add comprehensive guide covering:
  - Streams, topics, and partitions explained
  - Partition keys and ordering guarantees
  - Consumer groups and scaling patterns
  - Message retention and expiry configuration
  - Error handling strategies (at-least-once, DLQ, idempotency)
  - Production patterns (outbox, saga, versioning)
  - Complete code examples for producers and consumers
  - CLI and HTTP API quick reference

- Update README with:
  - Observability stack documentation (Grafana, Prometheus, Web UI)
  - Service access URLs and credentials
  - Architecture diagram with observability layer
  - Prometheus metrics and OpenTelemetry configuration
  - Link to new guide in documentation section

- Add comprehensive partitioning-guide.md covering:
  - What partitions are and why they exist
  - The ordering problem and partition key solutions
  - Domain-specific partition key examples
  - Common partitioning mistakes and how to avoid them
  - Partition count guidelines and capacity planning
  - Consumer group rebalancing mechanics
  - Real-world scenario walkthroughs
  - Troubleshooting guide

- Enhance docs/guide.md with educational details:
  - Add detailed Database struct explanation with SQL examples
  - Add database schema for idempotency pattern
  - Expand Outbox pattern with problem statement and solution
  - Add outbox table schema and publisher query examples
  - Add Further Reading references to external resources
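
The idempotency pattern mentioned in the commits above can be sketched as follows. This is not the code from docs/guide.md: `Message` and `IdempotentConsumer` are hypothetical types, and an in-memory `HashSet` stands in for the database table the guide describes. With at-least-once delivery a message may arrive more than once, so the consumer records each message ID and skips any ID it has already seen.

```rust
use std::collections::HashSet;

/// Hypothetical message shape: a unique ID plus a payload.
struct Message {
    id: u64,
    payload: String,
}

/// Skips redelivered messages by remembering the IDs already handled.
/// The in-memory set is a stand-in for a persistent table.
struct IdempotentConsumer {
    processed: HashSet<u64>,
    applied: Vec<String>,
}

impl IdempotentConsumer {
    fn new() -> Self {
        Self {
            processed: HashSet::new(),
            applied: Vec::new(),
        }
    }

    /// Returns true if the message was applied, false if it was a duplicate.
    fn handle(&mut self, message: &Message) -> bool {
        // `insert` returns false when the ID is already in the set.
        if !self.processed.insert(message.id) {
            return false;
        }
        // A real service would record the ID and apply the side effect
        // in one database transaction; here both are plain memory writes.
        self.applied.push(message.payload.clone());
        true
    }
}

fn main() {
    let mut consumer = IdempotentConsumer::new();
    let message = Message {
        id: 7,
        payload: "charge card".to_string(),
    };
    println!("first delivery applied: {}", consumer.handle(&message));
    println!("redelivery applied: {}", consumer.handle(&message));
}
```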
Add comprehensive Iggy-specific content to the partitioning guide:

- Iggy Partitioning Strategies section:
  - Balanced (round-robin) strategy
  - Partition ID (direct assignment) strategy
  - Messages Key (hash-based) strategy with MurmurHash3 explanation
  - Comparison table and use case guidance

- Iggy Partition Storage Architecture section:
  - Physical storage layout (streams/topics/partitions/segments)
  - Segment structure and benefits
  - Offset and time indexes explained
  - Index caching modes

- Iggy Server Configuration section:
  - Partition settings (fsync, checksum, buffer thresholds)
  - Segment settings (size, expiry, caching, archival)
  - Topic settings (max_size, auto-delete)
  - Message deduplication options
  - Configuration trade-offs table

- Custom Partitioners section:
  - Partitioner trait explanation
  - Example: Weighted partitioner for heterogeneous hardware
  - Example: Time-based partitioner for analytics
  - Example: Content-based priority partitioner
  - Integration with IggyClient

- Enhanced Further Reading section:
  - Apache Iggy official resources
  - Distributed systems fundamentals
  - Related technologies (Kafka, MurmurHash, Cassandra)
  - Benchmarking resources
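
As an illustration of the weighted partitioner listed under Custom Partitioners, here is one way such a strategy could work. This is a standalone sketch under stated assumptions: it is not the guide's example, it does not implement the Iggy SDK's Partitioner trait, and `WeightedPartitioner` with its methods is a hypothetical type. Each partition is given an integer weight, and partitions are picked in proportion to those weights, so partitions on stronger hardware receive more messages.

```rust
/// Picks partitions in proportion to integer weights.
/// Hypothetical type for illustration; not part of the Iggy SDK.
struct WeightedPartitioner {
    /// Partition IDs, each repeated `weight` times.
    schedule: Vec<u32>,
    next: usize,
}

impl WeightedPartitioner {
    /// `weights` holds (partition_id, weight) pairs.
    fn new(weights: &[(u32, u32)]) -> Self {
        let schedule: Vec<u32> = weights
            .iter()
            .flat_map(|&(partition, weight)| std::iter::repeat(partition).take(weight as usize))
            .collect();
        assert!(
            !schedule.is_empty(),
            "at least one partition needs a non-zero weight"
        );
        Self { schedule, next: 0 }
    }

    /// Returns the next partition ID, cycling through the schedule.
    fn next_partition(&mut self) -> u32 {
        let partition = self.schedule[self.next];
        self.next = (self.next + 1) % self.schedule.len();
        partition
    }
}

fn main() {
    // Partition 1 sits on faster hardware, so it gets three times the share.
    let mut partitioner = WeightedPartitioner::new(&[(1, 3), (2, 1)]);
    let picks: Vec<u32> = (0..8).map(|_| partitioner.next_partition()).collect();
    println!("{picks:?}");
}
```

The expanded schedule keeps the sketch short but sends messages to a heavy partition in bursts. Like the balanced strategy, it gives no per-key ordering, because the choice ignores message content.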
- Add comprehensive durable storage guide covering:
  - Storage architecture (append-only log, segments)
  - Durability configuration (fsync settings, trade-offs)
  - Data retention policies (time-based, size-based)
  - Backup and archiving (disk, S3-compatible storage)
  - Recovery procedures and data integrity
  - Performance vs durability trade-offs
  - Production recommendations with config templates

- Add docs/README.md as documentation index with:
  - Quick navigation by topic
  - Links to all available guides
  - External resources and references

- Update README.md and CLAUDE.md to reference docs/ directory
@mlevkov mlevkov self-assigned this Dec 8, 2025
@mlevkov mlevkov added the documentation and enhancement labels Dec 8, 2025
@mlevkov mlevkov merged commit 7da158d into main Dec 8, 2025
29 checks passed
@mlevkov mlevkov deleted the grafana_stack branch December 8, 2025 02:37