Design Discussion: Embedding AlertServer into dolphinscheduler-api Module #18005
Replies: 2 comments 1 reply
-
Hi all, just a gentle follow-up on this design discussion. I wanted to check if anyone had a chance to review the proposed approach for embedding the AlertServer into the API Server. If there are any concerns regarding the leader election strategy, lifecycle integration, or alert processing model, I would really appreciate your feedback before proceeding further with the implementation. Thanks for your time and guidance. Best regards,
-
Been looking at the initialization flow too. The main thing I'd worry about is lifecycle management when both services need to shut down gracefully. Did you consider keeping them separate but sharing the same datasource instead? That's what we did for another module, and it avoided a ton of coupling issues.
-
Hi all,
I’ve been studying the architectural requirements for embedding the AlertServer into the API Server (related to #8975). After reviewing the initialization flows in `dolphinscheduler-alert-server` and `dolphinscheduler-api`, I’d like to discuss a potential design direction and gather feedback.

My goal is to transition the alerting mechanism from a standalone process to an embedded background service while maintaining DolphinScheduler's high-availability and reliability standards.
Proposed Technical Direction
1. Logic Decoupling (Modularization)
Instead of source-code duplication, refactor the core alerting logic (e.g., `AlertBootstrapService`, `AlertSender`) into a reusable library module. The `dolphinscheduler-api` module will consume this as a dependency, ensuring a single source of truth for alerting logic.

2. Lifecycle Integration
Use Spring-managed components and `@PostConstruct` hooks within the API Server to initialize the alerting engine. This ensures alerting threads are orchestrated alongside the API's primary lifecycle, starting only after the server successfully joins the Registry.

3. Leader Election & High Availability (HA)
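To make point 2 concrete, here is a rough plain-Java sketch of the lifecycle wiring; all class and method names are hypothetical, and in the real module `start()`/`close()` would be driven by Spring `@PostConstruct`/`@PreDestroy` hooks after the Registry join:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of an embedded alert engine's lifecycle inside the API Server.
class EmbeddedAlertBootstrap implements AutoCloseable {
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    private final AtomicBoolean running = new AtomicBoolean(false);

    // @PostConstruct in the Spring-managed version, invoked only after the
    // API server has successfully joined the Registry.
    public void start() {
        if (running.compareAndSet(false, true)) {
            loop.execute(this::pollOnce);
        }
    }

    // Placeholder for the embedded alert event loop: fetch PENDING alerts,
    // dispatch them through the alert plugins, then re-schedule itself.
    private void pollOnce() {
        if (!running.get()) {
            return;
        }
        // ... fetch and send alerts here ...
    }

    // @PreDestroy in the Spring-managed version: stop accepting new work and
    // give in-flight notifications a bounded window to drain.
    @Override
    public void close() {
        running.set(false);
        loop.shutdown();
        try {
            loop.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean isRunning() {
        return running.get();
    }
}
```

The bounded `awaitTermination` window matters here: the API Server's graceful shutdown should not hang indefinitely on a slow notification channel.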
To prevent duplicate alert processing in horizontally scaled API deployments, I propose leveraging the existing `RegistryClient` (ZooKeeper/Etcd) to implement a Leader-Follower model. Only the "Leader" API instance will activate the `AlertEventLoop`, with standby nodes ready to take over upon leader failure.

4. Fault Tolerance & Data Integrity
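A minimal sketch of the leader gate described in point 3, with an `AtomicReference` standing in for the ephemeral leader key that `RegistryClient` would create in ZooKeeper/Etcd (class and key semantics are hypothetical):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical leader-follower gate: only the instance that wins the key
// activates its AlertEventLoop; standby nodes keep retrying tryAcquire().
class AlertLeaderElector {
    // Stand-in for an ephemeral registry key; null means the key is vacant.
    private static final AtomicReference<String> LEADER_KEY = new AtomicReference<>();

    private final String instanceId;

    AlertLeaderElector(String instanceId) {
        this.instanceId = instanceId;
    }

    // Returns true iff this API instance should run the alert event loop.
    boolean tryAcquire() {
        return LEADER_KEY.compareAndSet(null, instanceId)
                || instanceId.equals(LEADER_KEY.get());
    }

    // Called when the registry session expires or the instance shuts down;
    // a standby node's next tryAcquire() then succeeds.
    void release() {
        LEADER_KEY.compareAndSet(instanceId, null);
    }
}
```

In the real implementation the compare-and-set would be an ephemeral node creation, so a crashed leader's key disappears automatically when its registry session expires.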
Use an atomic, conditional claim (e.g., `UPDATE ... SET status = 'SENDING', handler_instance = 'ID' WHERE status = 'PENDING'`) to ensure thread-safe row acquisition. A recovery sweep will detect alerts stuck in the `SENDING` state due to unexpected instance crashes and reset them to `PENDING` for re-delivery.

5. Performance Isolation
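To illustrate the claim semantics from point 4, here is an in-memory simulation of the conditional `UPDATE`; the table shape and handler IDs are illustrative only:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative in-memory model of the alert table's status column.
class AlertClaimTable {
    private final ConcurrentMap<Integer, String> rows = new ConcurrentHashMap<>();

    void insertPending(int alertId) {
        rows.put(alertId, "PENDING");
    }

    // Mirrors the conditional UPDATE's "WHERE status = 'PENDING'" clause: the
    // replace succeeds for at most one handler, so a row is never claimed twice.
    boolean claim(int alertId, String handlerInstance) {
        return rows.replace(alertId, "PENDING", "SENDING:" + handlerInstance);
    }

    // Crash-recovery sweep: reset rows a dead instance left in SENDING back to
    // PENDING so a live handler can re-deliver them.
    int resetStuck(String crashedInstance) {
        int reset = 0;
        for (Map.Entry<Integer, String> e : rows.entrySet()) {
            if (e.getValue().equals("SENDING:" + crashedInstance)
                    && rows.replace(e.getKey(), e.getValue(), "PENDING")) {
                reset++;
            }
        }
        return reset;
    }
}
```

In SQL terms, a successful `claim` corresponds to the `UPDATE` reporting one affected row; zero affected rows means another handler already owns the alert.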
Configure a dedicated `ThreadPoolExecutor` for alerting tasks. This prevents long-running notification I/O (e.g., slow SMTP or Webhook responses) from starving the API's Netty/Tomcat worker threads, keeping the REST interface responsive.

6. SPI Management & Decommissioning
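A sketch of the dedicated pool from point 5; the pool sizes, queue capacity, and thread name are illustrative placeholders, not proposed defaults:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative factory for a pool reserved for notification I/O, so slow
// SMTP/webhook calls never occupy the API server's Netty/Tomcat workers.
class AlertExecutorFactory {
    static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                4, 8,                            // core / max sender threads
                60, TimeUnit.SECONDS,            // reclaim idle threads above core
                new ArrayBlockingQueue<>(1000),  // bounded backlog of alert tasks
                runnable -> {
                    Thread t = new Thread(runnable, "alert-sender");
                    t.setDaemon(true);           // never block JVM shutdown
                    return t;
                },
                // back-pressure on the submitter instead of silently dropping
                // alerts when the backlog fills
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

The bounded queue plus `CallerRunsPolicy` is one possible back-pressure choice; an unbounded queue would hide a stuck notification channel until memory becomes the symptom.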
Ensure the API Server remains compatible with the Alert SPI for dynamic plugin loading. This plan includes the complete removal of standalone `AlertServer.java` entry points, assembly descriptors, and redundant Docker/K8s service definitions to simplify the deployment footprint.

I would appreciate any feedback or concerns regarding this approach, particularly on the distributed coordination strategy, before I proceed further with implementation planning.
Best regards,
Shrihari Rajendrakumar Kulkarni