Description
Is your feature request related to a problem?
Agent (sysmon_service.go): Reuse the RingBuffer aggregation logic built in snmp/aggregator.go and stop streaming every Sysmon point. Calculate Avg/Min/Max over a 60-second window and push only the aggregate.
We need to make sure the aggregate is handled properly upstream, and the upstream pipeline needs to change as well:
Currently we use ERTS (RPC) to send high-frequency (sysmon) updates to core-elx for processing. We should instead write these out immediately to NATS JetStream (put them in the events stream, under a new subject) and then write a new Broadway consumer, run from core-elx, to process the sysmon messages.
- Refactor agent-gateway to publish sysmon, snmp, and custom metric payloads directly to NATS events stream subjects (e.g., metrics.sysmon.raw).
- Implement a Broadway topology in core-elx to consume metrics.sysmon.raw.
- Configure a dedicated ServiceRadar.TelemetryRepo (Ecto Repo) in core-elx with a strictly capped connection pool (e.g., 30 connections) so that Broadway inserts cannot starve the Web UI of database connections.
- Edge Aggregation: Using the existing RingBuffer implementation in pkg/agent/core/ring.go, the agent will collect sysmon metrics every 1-5 seconds, but will only stream statistical rollups (Min, Max, Avg, P95) to the gateway every 60 seconds.
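
For discussion, the edge rollup could look roughly like the sketch below. It is not the existing RingBuffer code: `Rollup` and `Summarize` are hypothetical names, the input is assumed to be the raw samples for one metric collected during one 60-second window, and P95 uses the nearest-rank method, which is only one of several possible percentile definitions.

```go
package main

import (
	"errors"
	"sort"
)

// Rollup is a hypothetical summary of one metric over one aggregation
// window (for example, 60 seconds of 1-5 second samples).
type Rollup struct {
	Count int
	Min   float64
	Max   float64
	Avg   float64
	P95   float64
}

// Summarize computes Min, Max, Avg, and P95 for the samples in a window.
// It sorts a copy, so the caller's slice is left untouched.
func Summarize(samples []float64) (Rollup, error) {
	n := len(samples)
	if n == 0 {
		return Rollup{}, errors.New("no samples in window")
	}

	sorted := make([]float64, n)
	copy(sorted, samples)
	sort.Float64s(sorted)

	sum := 0.0
	for _, v := range sorted {
		sum += v
	}

	// Nearest-rank percentile: rank = ceil(0.95 * n), 1-based.
	// Integer arithmetic avoids floating-point rounding in the rank.
	rank := (95*n + 99) / 100

	return Rollup{
		Count: n,
		Min:   sorted[0],
		Max:   sorted[n-1],
		Avg:   sum / float64(n),
		P95:   sorted[rank-1],
	}, nil
}
```

With a 1-second collection interval this reduces 60 points per metric to one `Rollup` per minute. An empty window returns an error rather than a zeroed rollup, so the agent can skip the push instead of reporting misleading zeros.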