A server status monitoring application built with Node.js and RabbitMQ.

abhaykantmishra/hms

Health Monitoring System

A distributed health monitoring system designed to handle high-concurrency checks with a non-blocking architecture.

Architecture & Design Decisions

High Level Design (HLD)

[Figure: high-level flow chart]

This system is more than a simple setInterval loop: it is built to scale. The core design principle is to decouple the scheduling of checks from their execution.
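The decoupling can be illustrated with a minimal in-memory sketch. This is not the project's code: the array-backed queue stands in for RabbitMQ, and the names createJobQueue, scheduleTick, and drain are hypothetical. It only shows that the scheduler produces jobs while workers run a bounded number of checks at a time.

```javascript
// In-memory stand-in for the queue; the real system uses RabbitMQ here.
// All names in this sketch are hypothetical.
function createJobQueue() {
  const jobs = [];
  return {
    push: (job) => jobs.push(job),
    shift: () => jobs.shift(),
    get length() {
      return jobs.length;
    },
  };
}

// Scheduler side: only produces jobs, never performs a check itself.
function scheduleTick(queue, dueMonitorIds) {
  for (const monitorId of dueMonitorIds) {
    queue.push({ monitorId });
  }
  return queue.length;
}

// Worker side: drains the queue with at most `concurrency` checks in flight.
async function drain(queue, concurrency, runCheck) {
  const results = [];
  async function worker() {
    let job;
    while ((job = queue.shift()) !== undefined) {
      results.push(await runCheck(job));
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results;
}
```

With this shape, a burst of due monitors only makes the queue longer; the number of simultaneous outbound requests stays capped at the chosen concurrency.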

1. RabbitMQ (The Asynchronous Buffer)

Instead of the application trying to do everything at once (find due monitors -> check them -> save results), we use a Distributed Queue System.

  • Thinking Process:
    • The Problem: In a single-process design, if 10,000 monitors are due at t=0, the process would try to fire 10,000 requests at once, exhausting sockets and memory and starving everything else on the event loop.
    • The Solution: Application-level flow control. The Scheduler only produces "jobs". The Workers consume them at their own pace.
  • Why RabbitMQ?:
    • Backpressure: The queue acts as a shock absorber. If workers fall behind (for example, because the network is slow), jobs accumulate in the queue while the scheduler keeps ticking.
    • Worker Scalability: We can spin up 50 generic worker nodes on different servers, all listening to the same queue.
    • Reliability: With manual acknowledgements, if a worker crashes before acknowledging a job, RabbitMQ re-queues it so the check isn't lost.
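The producer and consumer roles described above might look roughly like the following sketch. It assumes `channel` is an amqplib-style channel (sendToQueue, assertQueue, prefetch, consume, ack, nack); the queue name, the job shape, and the function names publishJob and startWorker are hypothetical and not taken from the repository.

```javascript
// Assumes an amqplib-style channel. QUEUE and the job fields are hypothetical.
const QUEUE = 'health_checks';

// Scheduler side: publish one job per due monitor.
function publishJob(channel, monitor) {
  const body = Buffer.from(
    JSON.stringify({ monitorId: monitor.id, url: monitor.url })
  );
  // persistent: true asks the broker to store the message on disk
  // (effective when the queue itself is durable).
  return channel.sendToQueue(QUEUE, body, { persistent: true });
}

// Worker side: consume with manual acknowledgements.
async function startWorker(channel, runCheck, prefetch = 10) {
  await channel.assertQueue(QUEUE, { durable: true });
  // Cap unacknowledged messages per consumer so each worker
  // pulls jobs at its own pace.
  await channel.prefetch(prefetch);
  return channel.consume(
    QUEUE,
    async (msg) => {
      if (msg === null) return; // consumer was cancelled by the broker
      try {
        await runCheck(JSON.parse(msg.content.toString()));
        channel.ack(msg);
      } catch (err) {
        // Processing failed: hand the job back to the queue.
        channel.nack(msg, false, true);
      }
    },
    { noAck: false }
  );
}
```

Because every worker consumes from the same queue, adding capacity means starting more processes that call startWorker. Note that unconditional requeue on failure can loop forever on a job that always throws; a retry limit or dead-letter queue would be a reasonable addition.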

2. Redis (The Atomic Scheduler)

We didn't want to scan the entire MongoDB health_monitors collection every second to find what's due. That's an O(N) operation whose cost grows linearly as users add monitors.

  • Thinking Process:
    • State vs Stateless: The "schedule" is a stateful entity. We need fast random access and range queries.
    • Efficiency: Redis Sorted Sets (ZSET) allow us to store next_check_at as a score.
  • Mechanism:
    • O(log N + M) Polling: ZRANGEBYSCORE fetches only the M monitors due right now, out of N scheduled, without touching the millions of monitors scheduled for later.
    • Concurrency Safe: Each individual Redis command is atomic. A multi-step sequence (read the due monitors, then reschedule them) is not atomic by itself, so running more than one scheduler would also require locking.
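A minimal sketch of the sorted-set schedule, assuming an ioredis-style client whose zadd and zrangebyscore methods return promises. The key name and the function names (scheduleMonitor, fetchDue, tick) are hypothetical, and a fixed interval is assumed for simplicity.

```javascript
// Assumes an ioredis-style client. SCHEDULE_KEY is a hypothetical key name.
const SCHEDULE_KEY = 'monitors:schedule';

// Store (or move) a monitor in the schedule.
// The score is next_check_at as epoch milliseconds.
function scheduleMonitor(redis, monitorId, nextCheckAt) {
  return redis.zadd(SCHEDULE_KEY, nextCheckAt, monitorId);
}

// Fetch up to `limit` monitor ids whose next_check_at is <= now.
function fetchDue(redis, now, limit = 100) {
  return redis.zrangebyscore(SCHEDULE_KEY, '-inf', now, 'LIMIT', 0, limit);
}

// One scheduler tick: read the due monitors, then move each to its next slot.
// This is two separate commands per monitor, so it is only safe with a
// single scheduler instance (or with an external lock).
async function tick(redis, now, intervalMs) {
  const due = await fetchDue(redis, now);
  for (const monitorId of due) {
    await scheduleMonitor(redis, monitorId, now + intervalMs);
  }
  return due;
}
```

Calling tick once per second with the current timestamp returns only the ids that are due; the caller would then publish one job per id to the queue.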

3. MongoDB (The Persistent Store)

  • Stores the configuration (HealthMonitor) and the historical results (HealthStatusCheck).
  • Trade-off: For now, we store time-series data (check results) in a standard document collection.
    • Future Upgrade: Move HealthStatusCheck to a dedicated Time-Series Database (like InfluxDB or TimescaleDB) for better compression and query performance on large datasets.
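To make the time-series nature of the check results concrete, here is an illustrative sketch of how a single result document could be assembled before being inserted. Every field name here is an assumption for illustration; the repository's actual HealthMonitor and HealthStatusCheck schemas are not shown in this README.

```javascript
// Hypothetical shape of one HealthStatusCheck document.
// Field names are guesses, not the repository's schema.
function buildStatusCheck(monitor, outcome, checkedAt = new Date()) {
  return {
    monitorId: monitor._id, // reference back to the HealthMonitor document
    checkedAt, // timestamp; the natural time axis for these records
    ok: outcome.statusCode >= 200 && outcome.statusCode < 400,
    statusCode: outcome.statusCode,
    responseTimeMs: outcome.responseTimeMs,
  };
}
```

Each check appends one small, immutable, timestamped record, which is the access pattern that time-series databases are designed to compress and query efficiently.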

4. Database Design

  • [Figure: high level flow chart]
