A comprehensive, production-ready webhook service system built in Go that can handle 1 billion events per day (~10k QPS). The system provides reliable webhook delivery with retry mechanisms, observability, horizontal scaling capabilities, and Schema Registry integration for type-safe message serialization.
- High Performance: Designed to handle 10,000+ requests per second
- Reliable Delivery: Exponential backoff retry with dead letter queues
- Security: JWT authentication, HMAC payload signing, rate limiting
- Observability: Comprehensive metrics with Prometheus, logs with Loki, dashboards with Grafana
- Scalability: Microservices architecture with Kafka for event streaming
- Fault Tolerance: Circuit breakers, health checks, graceful shutdowns
- Complete Delivery Logging: All webhook delivery attempts are logged and queryable
- Schema Registry Integration: Avro schema validation and evolution for type-safe messaging
- Schema Evolution: Backward/forward compatibility with automatic schema validation
The system consists of the following microservices:
- API Gateway (Port 8080) - REST API with JWT authentication and webhook management
- Webhook Registry (Port 8086) - gRPC service for webhook CRUD operations
- Event Ingestion (Port 8082) - REST API for receiving events, publishes to Kafka with Avro serialization
- Webhook Dispatcher - Kafka consumer that deserializes Avro messages and sends HTTP requests to webhooks
- Retry Manager (Port 8084) - Redis-based retry scheduling with exponential backoff
- Observability (Port 8085) - Metrics and logging aggregation service
- Dead Letter Queue (DLQ) (Port 8087) - Manages permanently failed webhook deliveries
- PostgreSQL (Port 5432): Webhook configurations and delivery logs
- Redis (Port 6379): Retry state management and caching
- Kafka (Port 9092): Event streaming and message queuing
- Prometheus (Port 9090): Metrics collection and alerting
- Grafana (Port 3000): Metrics visualization and dashboards
- Loki (Port 3100): Log aggregation and querying
- Kafka UI (Port 8888): Kafka management interface
- Avro Serialization: Type-safe message serialization with schema validation
- Schema Evolution: Backward and forward compatibility support
- Automatic Validation: Runtime schema validation for all messages
- Schema Versioning: Automatic version management and compatibility checking
- Schema Registry API: REST API for schema management and validation
- Go 1.23.0 or later
- Docker and Docker Compose
- Protocol Buffers compiler (protoc)
- Make
- jq (for schema initialization script)
git clone <repository-url>
cd webhook
docker-compose up -d
Wait for services to be healthy:
docker-compose ps
# Install migrate CLI if not already installed
go install -tags 'postgres' github.com/golang-migrate/migrate/v4/cmd/migrate@latest
# Start PostgreSQL container
docker-compose up -d postgres
# Create database
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE webhook_db;"
# Run migrations
migrate -path migrations -database "postgres://postgres:postgres123@localhost:5432/webhook_db?sslmode=disable" up
- API Gateway: http://localhost:8080
- Grafana Dashboard: http://localhost:3000 (admin/admin123)
- Prometheus: http://localhost:9090
- Kafka UI: http://localhost:8888
- Redis Insight: http://localhost:8001
First, get a JWT token:
curl -X POST http://localhost:8080/auth/login \
-H "Content-Type: application/json" \
-d '{"username": "test", "password": "test"}'
curl -X POST http://localhost:8080/api/v1/webhooks \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"url": "https://httpbin.org/post",
"event_types": ["user.created", "payment.completed"],
"secret": "your-secret-at-least-16-characters-long",
"headers": {
"X-Custom-Header": "value"
}
}'
curl -X POST http://localhost:8082/events \
-H "Content-Type: application/json" \
-d '{
"id": "event-123",
"type": "user.created",
"source": "user-service",
"data": {
"user_id": "12345",
"email": "[email protected]"
}
}'
Note: Events are automatically serialized using Avro schemas loaded from the `schemas/` folder.
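The serialization code itself lives inside the Event Ingestion service, but a minimal sketch of schema-validated Avro encoding in Go could look like the following (the `github.com/linkedin/goavro/v2` library and the `schemas/event.avsc` file name are assumptions for illustration, not necessarily what the service uses):

```go
package main

import (
	"fmt"
	"os"

	"github.com/linkedin/goavro/v2"
)

func main() {
	// Load an Avro schema from the schemas/ folder
	// (the file name here is hypothetical).
	schema, err := os.ReadFile("schemas/event.avsc")
	if err != nil {
		panic(err)
	}

	// NewCodec parses and validates the schema itself.
	codec, err := goavro.NewCodec(string(schema))
	if err != nil {
		panic(err)
	}

	// Native Go representation of an event payload.
	native := map[string]interface{}{
		"id":     "event-123",
		"type":   "user.created",
		"source": "user-service",
	}

	// BinaryFromNative validates the value against the schema and
	// produces the compact Avro binary encoding sent to Kafka.
	binary, err := codec.BinaryFromNative(nil, native)
	if err != nil {
		panic(err)
	}
	fmt.Printf("encoded %d bytes\n", len(binary))
}
```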
curl -X GET http://localhost:8080/api/v1/webhooks \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
curl -X GET "http://localhost:8080/api/v1/webhooks/WEBHOOK_ID/logs" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
curl -X GET "http://localhost:8080/api/v1/webhooks/WEBHOOK_ID/stats" \
-H "Authorization: Bearer YOUR_JWT_TOKEN"
Configuration is managed through `config.yaml` and environment variables. Key settings:
# Database
database:
host: postgres
port: 5432
database: webhook_db
username: postgres
password: postgres123
# Security
security:
jwt_secret: "your-secret-key-here"
hmac_secret: "your-hmac-secret-here"
rate_limit_enabled: true
rate_limit_rps: 100
# Performance
dispatcher:
worker_count: 10
batch_size: 100
http_timeout: 30s
# Retry Policy
retry_manager:
max_retries: 5
initial_delay: 1s
max_delay: 300s
backoff_multiplier: 2.0
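With these defaults the retry delays grow geometrically: 1s, 2s, 4s, 8s, and 16s; with more retries the delay would be capped at `max_delay`. A small sketch of the schedule this policy implies (illustrative, not the Retry Manager's actual code):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// backoffDelay returns the delay before retry attempt n (0-indexed):
// initial_delay * backoff_multiplier^n, capped at max_delay.
func backoffDelay(n int, initial, max time.Duration, multiplier float64) time.Duration {
	d := time.Duration(float64(initial) * math.Pow(multiplier, float64(n)))
	if d > max {
		return max
	}
	return d
}

func main() {
	// max_retries: 5, initial_delay: 1s, max_delay: 300s, multiplier: 2.0
	for n := 0; n < 5; n++ {
		fmt.Printf("retry %d after %v\n", n+1,
			backoffDelay(n, time.Second, 300*time.Second, 2.0))
	}
	// Prints: 1s, 2s, 4s, 8s, 16s
}
```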
The system exposes comprehensive Prometheus metrics:
- `webhook_delivery_attempts_total` - Total delivery attempts
- `webhook_delivery_success_total` - Successful deliveries
- `webhook_delivery_failures_total` - Failed deliveries
- `webhook_delivery_duration_seconds` - Delivery latency
- `webhook_http_requests_total` - HTTP request metrics
- `webhook_kafka_messages_produced_total` - Kafka metrics
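These names follow the standard Prometheus Go client conventions. A sketch of how such a counter could be defined and exposed (the label names and port are illustrative assumptions, not the service's actual code):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// deliveryAttempts mirrors webhook_delivery_attempts_total; the
// label names here are assumptions, not the service's actual labels.
var deliveryAttempts = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "webhook_delivery_attempts_total",
		Help: "Total webhook delivery attempts.",
	},
	[]string{"webhook_id", "event_type"},
)

func main() {
	prometheus.MustRegister(deliveryAttempts)

	// The dispatcher would increment this on every delivery attempt.
	deliveryAttempts.WithLabelValues("wh-123", "user.created").Inc()

	// Expose /metrics for Prometheus to scrape (port is arbitrary here).
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```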
Pre-configured dashboards show:
- Request rates and latency
- Webhook delivery success rates
- System resource usage
- Error rates and patterns
Structured JSON logs are collected by Loki and can be queried in Grafana.
All webhook delivery attempts are stored in the database and can be queried via:
- API Gateway endpoints (`/api/v1/webhooks/{id}/logs`)
- Direct database queries
- Observability service gRPC APIs
- Authentication: JWT tokens with configurable expiration
- Authorization: Scope-based access control
- Payload Signing: HMAC-SHA256 signatures for webhook payloads (see the verification sketch after this list)
- Rate Limiting: Per-IP and per-customer limits
- TLS Support: HTTPS for all external communications
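Receivers should verify the HMAC signature before trusting a payload. A minimal verification sketch in Go, assuming the signature arrives hex-encoded in an `X-Webhook-Signature` header (the header name is an assumption; check your webhook configuration):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"log"
	"net/http"
)

const secret = "your-secret-at-least-16-characters-long"

// verifySignature recomputes the HMAC-SHA256 of the body and compares
// it against the received signature in constant time.
func verifySignature(body []byte, signature string) bool {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	expected := hex.EncodeToString(mac.Sum(nil))
	return hmac.Equal([]byte(expected), []byte(signature))
}

func handler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	// The header name is an assumption; see your webhook configuration.
	if !verifySignature(body, r.Header.Get("X-Webhook-Signature")) {
		http.Error(w, "invalid signature", http.StatusUnauthorized)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/webhook", handler)
	log.Fatal(http.ListenAndServe(":9000", nil))
}
```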
webhook/
├── cmd/                 # Service entry points
│   ├── api-gateway/
│   ├── webhook-registry/
│   ├── event-ingestion/
│   ├── webhook-dispatcher/
│   ├── retry-manager/
│   ├── observability/
│   └── dlq/
├── internal/            # Internal packages
│   ├── gateway/         # API Gateway logic
│   ├── registry/        # Webhook Registry logic
│   ├── security/        # Authentication & authorization
│   ├── observability/   # Metrics & logging
│   ├── retry/           # Retry management
│   └── dlq/             # Dead letter queue management
├── pkg/                 # Shared packages
│   ├── postgres/        # Database client
│   ├── redis/           # Redis client
│   ├── kafka/           # Kafka client
│   ├── metrics/         # Prometheus metrics
│   └── circuitbreaker/  # Circuit breaker implementation
├── proto/               # Protocol buffer definitions
├── migrations/          # Database migrations
├── docker/              # Docker configurations
└── config/              # Configuration management
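The `pkg/circuitbreaker` package guards outbound webhook calls. As an illustration of the pattern only (the actual implementation may differ), a minimal breaker that opens after a run of consecutive failures:

```go
package circuitbreaker

import (
	"errors"
	"sync"
	"time"
)

var ErrOpen = errors.New("circuit breaker is open")

// Breaker opens after maxFailures consecutive errors and rejects
// calls immediately until cooldown has elapsed.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func New(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Do runs fn unless the breaker is open, tracking failures.
func (b *Breaker) Do(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now() // (re)open for another cooldown window
		}
		return err
	}
	b.failures = 0 // a success closes the breaker
	return nil
}
```

The dispatcher would wrap each outbound HTTP call in `Do`, so a persistently failing endpoint stops consuming worker time until the cooldown expires.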
- Define Protocol Buffers: Add new RPC methods in `proto/`
- Generate Code: Run `make proto-gen`
- Implement Service: Add logic in `internal/`
- Add Tests: Create unit and integration tests
- Update Documentation: Update API docs and README
# Run all tests
make test
# Run with coverage
go test -cover ./...
# Integration tests (requires running infrastructure)
go test -tags=integration ./tests/
Build and deploy with Docker Compose:
make docker-build
docker-compose up -d
Kubernetes manifests are available in the `k8s/` directory:
kubectl apply -f k8s/
- Database: Use managed PostgreSQL (AWS RDS, GCP Cloud SQL)
- Message Queue: Use managed Kafka (AWS MSK, Confluent Cloud)
- Caching: Use managed Redis (AWS ElastiCache, GCP Memorystore)
- Monitoring: Set up alerting rules in Prometheus
- Security: Use proper secrets management (Vault, AWS Secrets Manager)
- Load Balancing: Use application load balancers
- Auto-scaling: Configure HPA based on metrics
- API Gateway: Scale based on HTTP request rate
- Event Ingestion: Scale based on Kafka producer lag
- Webhook Dispatcher: Scale based on Kafka consumer lag
- Retry Manager: Scale based on Redis queue size
- Adjust Kafka partition counts for parallelism
- Tune database connection pools (see the sketch after this list)
- Optimize HTTP client timeouts and connection limits
- Configure appropriate JVM settings for Kafka
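As an example of connection-pool tuning, Go's standard `database/sql` exposes the relevant knobs (the limits and the `lib/pq` driver are illustrative choices, not the service's shipped defaults):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // PostgreSQL driver (illustrative choice)
)

func main() {
	db, err := sql.Open("postgres",
		"postgres://postgres:postgres123@localhost:5432/webhook_db?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Cap concurrent connections to protect PostgreSQL under load,
	// keep a warm idle pool for steady traffic, and recycle stale
	// connections so failovers are picked up.
	db.SetMaxOpenConns(50)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(30 * time.Minute)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```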
- Service Won't Start
  - Check configuration file exists and is valid
  - Verify database connectivity
  - Check port availability
- Webhooks Not Delivering
  - Check Kafka connectivity
  - Verify webhook URLs are accessible
  - Check retry queue status
- High Latency
  - Monitor database query performance
  - Check Kafka consumer lag
  - Verify network connectivity
- Delivery Logs Empty
  - Check observability service connectivity
  - Verify database schema matches code
  - Check dispatcher observability integration
# Check service health
curl http://localhost:8080/health
# View service logs
docker-compose logs webhook-dispatcher
# Check Kafka topics
docker exec -it webhook-kafka kafka-topics --list --bootstrap-server localhost:9092
# Monitor metrics
curl http://localhost:9090/metrics
# Check delivery logs in database
docker-compose exec postgres psql -U postgres -d webhook_db -c "SELECT COUNT(*) FROM delivery_logs;"
- `POST /api/v1/webhooks` - Create webhook
- `GET /api/v1/webhooks` - List webhooks
- `GET /api/v1/webhooks/{id}` - Get webhook
- `PUT /api/v1/webhooks/{id}` - Update webhook
- `DELETE /api/v1/webhooks/{id}` - Delete webhook
- `POST /events` - Send event (public endpoint)
- `GET /health` - Health check
- `GET /metrics` - Prometheus metrics
- `GET /api/v1/webhooks/{id}/logs` - Get delivery logs
- `GET /api/v1/webhooks/{id}/stats` - Get delivery statistics
- `POST /auth/login` - Get JWT token
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Create an issue in the GitHub repository
- Check the troubleshooting section
- Review the API documentation
Built with ❤️ using Go, Kafka, PostgreSQL, and modern cloud-native technologies.