Health Monitoring System

A distributed health monitoring system designed to handle high-concurrency checks with a non-blocking architecture.

Architecture & Design Decisions

High Level Design (HLD)

This system isn't just a simple setInterval loop; it's built to scale. The core idea is to decouple the scheduling of checks from their execution.

1. RabbitMQ (The Asynchronous Buffer)

Instead of the application trying to do everything at once (find due monitors -> check them -> save results), we use a distributed message queue.

Thinking Process:
- The Problem: If one process both schedules and executes checks and 10,000 monitors are due at t=0, it tries to fire 10,000 requests at once, saturating sockets and memory and starving the event loop.
- The Solution: Application-level flow control. The Scheduler only produces "jobs". The Workers consume them at their own pace.
Why RabbitMQ?
- Backpressure: The queue acts as a shock absorber. If the network is slow and checks take longer, jobs pile up in the queue while the scheduler keeps ticking.
- Worker Scalability: We can spin up as many identical worker nodes as needed (say, 50) on different servers, all consuming from the same queue.
- Reliability: If a worker crashes while processing a job, the message was never acknowledged, so RabbitMQ re-queues it (via Acknowledgements) and the check isn't lost.
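
The produce/consume split and the acknowledgement behaviour described above can be sketched with a small in-memory queue. This is only an illustration under stated assumptions: `JobQueue`, `CheckJob`, and the method names are hypothetical and are not the RabbitMQ client API. A real worker would consume from the broker and acknowledge each message after the check completes.

```typescript
// In-memory stand-in for a broker queue with manual acknowledgements.
// NOT the RabbitMQ client API: every name here is hypothetical.
type CheckJob = { monitorId: string; url: string };
type Delivery = { tag: number; job: CheckJob };

class JobQueue {
  private ready: CheckJob[] = [];                     // waiting for a worker
  private unacked: { [tag: number]: CheckJob } = {};  // delivered, not yet acked
  private nextTag = 1;

  // Scheduler side: enqueue and return immediately.
  publish(job: CheckJob): void {
    this.ready.push(job);
  }

  // Worker side: take one job. It stays "unacked" until ack() is called.
  deliver(): Delivery | undefined {
    const job = this.ready.shift();
    if (job === undefined) return undefined;
    const tag = this.nextTag++;
    this.unacked[tag] = job;
    return { tag, job };
  }

  // The worker finished the check, so the job is dropped for good.
  ack(tag: number): void {
    delete this.unacked[tag];
  }

  // The worker died before acking, so the job goes back to the head of the queue.
  requeue(tag: number): void {
    const job = this.unacked[tag];
    if (job === undefined) return;
    delete this.unacked[tag];
    this.ready.unshift(job);
  }

  depth(): number {
    return this.ready.length;
  }
}

const jobQueue = new JobQueue();

// The scheduler produces three jobs without waiting for any check to run.
const monitorIds = ["m1", "m2", "m3"];
for (const id of monitorIds) {
  jobQueue.publish({ monitorId: id, url: "https://example.com/" + id });
}

// Worker A takes a job and completes it.
const completed = jobQueue.deliver();
if (completed !== undefined) jobQueue.ack(completed.tag);

// Worker B takes a job, then "crashes" before acknowledging it.
const crashed = jobQueue.deliver();
if (crashed !== undefined) jobQueue.requeue(crashed.tag);

console.log(jobQueue.depth()); // 2: m2 is back at the head, m3 is still waiting
```

The scheduler never waits on a worker here: `publish` returns immediately, and the queue depth is the only thing that grows when workers fall behind.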

2. Redis (The Atomic Scheduler)

We didn't want to scan the entire MongoDB health_monitors collection every second to find what's due. That's an O(N) operation whose cost grows linearly as users add monitors.

Thinking Process:
- State vs Stateless: The "schedule" is a stateful entity. We need fast random access and range queries.
- Efficiency: Redis Sorted Sets (ZSET) let us store each monitor with next_check_at as its score, so the set stays ordered by due time.
Mechanism:
- O(log N + M) Polling: ZRANGEBYSCORE fetches only the M monitors that are due right now, without touching the millions of monitors scheduled for later.
- Concurrency Safe: Each Redis command is atomic. Fetching due monitors and rescheduling them takes several commands, though, so scaling to multiple scheduler instances would also need locking to avoid dispatching the same monitor twice.
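
To make the polling mechanism concrete, here is a minimal in-memory sketch of a schedule ordered by next_check_at. It is an assumption-laden stand-in, not the project's code: the `Schedule` class, its method names, the `monitor:a` style IDs, and the 60 ms interval are all hypothetical, and a sorted array replaces the data structure Redis uses internally. In the real system these steps would roughly correspond to ZADD, ZREM, and ZRANGEBYSCORE.

```typescript
// In-memory stand-in for a Redis sorted set scored by next_check_at.
// Hypothetical helper names; the real system would issue ZADD,
// ZRANGEBYSCORE and ZREM against Redis instead.
type Entry = { member: string; score: number };

class Schedule {
  private entries: Entry[] = []; // kept sorted by score, ascending

  // Like ZADD: insert a member, or move it if it is already present.
  add(member: string, score: number): void {
    this.remove(member);
    // Binary search for the first position whose score is greater.
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.entries[mid].score <= score) lo = mid + 1;
      else hi = mid;
    }
    this.entries.splice(lo, 0, { member, score });
  }

  // Like ZREM.
  remove(member: string): void {
    this.entries = this.entries.filter((e) => e.member !== member);
  }

  // Like ZRANGEBYSCORE key -inf <now>: every member with score <= now.
  due(now: number): string[] {
    const result: string[] = [];
    for (const e of this.entries) {
      if (e.score > now) break; // sorted, so nothing after this is due
      result.push(e.member);
    }
    return result;
  }
}

const schedule = new Schedule();
schedule.add("monitor:a", 1000);
schedule.add("monitor:b", 1030);
schedule.add("monitor:c", 5000);

// One scheduler tick at t=1030, with an assumed 60 ms check interval.
const tickTime = 1030;
const intervalMs = 60;
const dueNow = schedule.due(tickTime);
for (const id of dueNow) {
  // The real scheduler would publish a job here, then reschedule.
  schedule.add(id, tickTime + intervalMs);
}

console.log(dueNow); // monitor:a and monitor:b, but not monitor:c
```

The tick reads the due monitors and then reschedules them in separate steps. With a single scheduler that is harmless; with several schedulers the same two steps are where the locking mentioned above would be needed.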

3. MongoDB (The Persistent Store)

Stores the configuration (HealthMonitor) and the historical results (HealthStatusCheck).
Trade-off: For now, we store time-series data (check results) in a standard document collection.
- Future Upgrade: Move HealthStatusCheck to a dedicated Time-Series Database (like InfluxDB or TimescaleDB) for better compression and query performance on large datasets.
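
As a rough illustration of the trade-off, the sketch below models check results as plain documents and computes an uptime ratio over a time window. The field names (`intervalSeconds`, `checkedAt`, `ok`, `latencyMs`) and the `uptimeRatio` helper are assumptions made for this example, not the project's actual schema.

```typescript
// Hypothetical document shapes. Field names are illustrative assumptions,
// not the project's actual schema.
interface HealthMonitor {
  _id: string;
  url: string;
  intervalSeconds: number;
}

interface HealthStatusCheck {
  monitorId: string; // references HealthMonitor._id
  checkedAt: number; // Unix epoch milliseconds
  ok: boolean;
  latencyMs: number;
}

// Uptime over [from, to): with a plain collection this is a filter over
// individual check documents, one document per check.
function uptimeRatio(
  checks: HealthStatusCheck[],
  monitorId: string,
  from: number,
  to: number
): number {
  const inWindow = checks.filter(
    (c) => c.monitorId === monitorId && c.checkedAt >= from && c.checkedAt < to
  );
  if (inWindow.length === 0) return NaN; // no data in the window
  const okCount = inWindow.filter((c) => c.ok).length;
  return okCount / inWindow.length;
}

const exampleMonitor: HealthMonitor = {
  _id: "m1",
  url: "https://example.com/health",
  intervalSeconds: 60,
};

// Four made-up check results, one per minute, with one failure.
const exampleChecks: HealthStatusCheck[] = [
  { monitorId: "m1", checkedAt: 0, ok: true, latencyMs: 120 },
  { monitorId: "m1", checkedAt: 60000, ok: true, latencyMs: 135 },
  { monitorId: "m1", checkedAt: 120000, ok: false, latencyMs: 5000 },
  { monitorId: "m1", checkedAt: 180000, ok: true, latencyMs: 110 },
];

const ratio = uptimeRatio(exampleChecks, exampleMonitor._id, 0, 240000);
console.log(ratio); // 0.75
```

Every check becomes its own document, so windowed aggregations like this one touch a number of documents that grows with history length. That growth is what a dedicated time-series store is meant to handle better.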
