Description
Add an SNMP ping probe for switch health with optional port up/down checks, aligned with the existing probe engine, outage flow, and YAML configuration patterns.[1][2]
Title
Add SNMP ping (switch health, port up/down)[2]
Background
Pulse currently supports ICMP, TCP, and HTTP reachability, and extending to SNMP aligns with the plant-aware objective while staying within the availability-only scope for v1.0.[2]
The new SNMP probe must plug into the existing ProbeService → OutageDetectionService pipeline, reuse timeout semantics, record RTT, and surface consistently in API, CSV, and live board.[3][1]
Objective
Implement a new probe type snmp that reports UP when an SNMP GET succeeds within timeout (defaulting to sysUpTime.0), records RTT, and optionally evaluates a configured OID for port status to annotate health while keeping the primary outcome availability-focused.[1][2]
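To make the reachability check concrete, here is a minimal Python sketch of an SNMP GET for sysUpTime.0 with RTT capture. It is an illustration only, not the Pulse implementation: the names `build_get_request` and `snmp_ping` are hypothetical, the BER encoder only handles short-form lengths, and any datagram received within the timeout is treated as success. A real probe would also decode the response and match the request-id and error-status.

```python
import socket
import time
from typing import Optional, Tuple

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0


def _tlv(tag: int, payload: bytes) -> bytes:
    """Wrap payload as BER tag-length-value (short-form lengths only)."""
    if len(payload) > 127:
        raise ValueError("sketch only supports payloads up to 127 bytes")
    return bytes([tag, len(payload)]) + payload


def _ber_uint(value: int) -> bytes:
    """Encode a non-negative integer as BER INTEGER contents."""
    return value.to_bytes(max(1, (value.bit_length() + 8) // 8), "big")


def encode_oid(dotted: str) -> bytes:
    """Encode a dotted OID; the first two arcs share one byte, the rest are base-128."""
    arcs = [int(a) for a in dotted.split(".")]
    out = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))  # continuation bit on leading bytes
            arc >>= 7
        out.extend(reversed(chunk))
    return bytes(out)


def build_get_request(community: str, oid: str, request_id: int, version: int = 1) -> bytes:
    """Build an SNMP GetRequest message; version 0 is SNMPv1, 1 is SNMPv2c."""
    varbind = _tlv(0x30, _tlv(0x06, encode_oid(oid)) + _tlv(0x05, b""))
    pdu = _tlv(
        0xA0,  # GetRequest-PDU
        _tlv(0x02, _ber_uint(request_id))
        + _tlv(0x02, b"\x00")  # error-status
        + _tlv(0x02, b"\x00")  # error-index
        + _tlv(0x30, varbind),  # variable-bindings list with one entry
    )
    header = _tlv(0x02, _ber_uint(version)) + _tlv(0x04, community.encode("ascii"))
    return _tlv(0x30, header + pdu)


def snmp_ping(host: str, port: int = 161, community: str = "public",
              oid: str = SYS_UPTIME_OID, timeout_s: float = 1.0) -> Tuple[bool, Optional[float]]:
    """Return (up, rtt_ms). Simplification: any reply within the timeout counts as UP."""
    packet = build_get_request(community, oid, request_id=1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.monotonic()
        try:
            sock.sendto(packet, (host, port))
            sock.recvfrom(65535)
        except OSError:  # includes socket timeouts and ICMP-reported errors
            return False, None
        return True, (time.monotonic() - start) * 1000.0
```

The community default of `"public"` and the one-second timeout are placeholders for the sketch, not proposed defaults.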
Scope
- Backend probe: SNMP reachability via UDP/161 using a minimal GET on a default OID (sysUpTime.0), with configurable version (v1/v2c), community, timeout, and retries, mapping success/failure to UP/DOWN with RTT as request roundtrip.[1]
- Optional port check: allow an OID parameter (e.g., ifOperStatus for a specific interface index) to be fetched after reachability; availability remains based on SNMP reachability while port status is recorded in the result metadata for UI display.[1]
- YAML schema: add type: snmp with host, port (default 161), version, community, timeout, retries, and optional oid/expectedValue for port-state hints; validate and surface in Apply/diff/versioning.[4]
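The schema fields listed above could be modeled roughly as follows. This is a hedged sketch, not the actual ConfigurationParser: `SnmpEndpoint`, `validate`, and `parse_snmp_endpoint` are hypothetical names, the input is assumed to be a mapping already loaded from YAML, and every default other than port 161 (the version, the timeout value and its millisecond unit, the retry count) is an assumption made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional

_OID_RE = re.compile(r"^\d+(\.\d+)+$")  # dotted numeric OID such as 1.3.6.1.2.1.1.3.0


@dataclass(frozen=True)
class SnmpEndpoint:
    host: str
    community: str
    port: int = 161                       # default stated in the scope
    version: str = "v2c"                  # assumed default; v1 and v2c are in scope
    timeout: int = 1000                   # assumed default, assumed to be milliseconds
    retries: int = 1                      # assumed default
    oid: Optional[str] = None             # optional port-status OID
    expected_value: Optional[str] = None  # YAML key: expectedValue


def validate(ep: SnmpEndpoint) -> List[str]:
    """Return human-readable validation errors (empty list when valid)."""
    errors = []
    if not ep.host:
        errors.append("host is required")
    if not ep.community:
        errors.append("community is required")
    if not 1 <= ep.port <= 65535:
        errors.append("port must be in 1..65535")
    if ep.version not in ("v1", "v2c"):
        errors.append("version must be v1 or v2c")
    if ep.timeout <= 0:
        errors.append("timeout must be positive")
    if ep.retries < 0:
        errors.append("retries must be zero or more")
    if ep.oid is not None and not _OID_RE.match(ep.oid):
        errors.append("oid must be a dotted numeric OID")
    if ep.expected_value is not None and ep.oid is None:
        errors.append("expectedValue requires oid")
    return errors


def parse_snmp_endpoint(raw: Mapping[str, Any]) -> SnmpEndpoint:
    """Build an endpoint from a YAML-style mapping, raising ValueError when invalid."""
    if raw.get("type") != "snmp":
        raise ValueError("not an snmp endpoint")
    expected = raw.get("expectedValue")
    ep = SnmpEndpoint(
        host=str(raw.get("host", "")),
        community=str(raw.get("community", "")),
        port=int(raw.get("port", 161)),
        version=str(raw.get("version", "v2c")),
        timeout=int(raw.get("timeout", 1000)),
        retries=int(raw.get("retries", 1)),
        oid=raw.get("oid"),
        expected_value=None if expected is None else str(expected),
    )
    errors = validate(ep)
    if errors:
        raise ValueError("; ".join(errors))
    return ep
```

Returning a list of messages from `validate` is one way to feed inline warnings in the editor and the Apply diff; the real parser may structure this differently.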
 
Non-goals
- No deep device inventory, traps, or bulk walks; this is a lightweight reachability/health check appropriate for v1 availability scope.[2]
- No SNMPv3 security profiles in this phase; start with v1/v2c to minimize complexity and configuration burden.[2]
 
Design Notes
- Result mapping: Success = UP with RTT measured as GET roundtrip; Failure = DOWN on timeout, no response, or auth error, with granular error categories for observability.[1]
- Timeouts/retries: Adopt existing per-probe timeout and retry semantics; wire through cancellation tokens and error paths consistent with other probes.[1]
- Outage flow: Feed CheckResult into OutageDetectionService unchanged to honor 2/2 flap damping and transactional outage open/close behavior.[3]
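The result mapping described above might look like the following sketch. The shape of `CheckResult` here is hypothetical and only approximates whatever the existing type carries; `map_snmp_outcome` and the `portMatchesExpected` metadata key are illustrative names. The ifOperStatus labels follow IF-MIB (RFC 2863). Note that a port reported as down does not change the availability status, which stays tied to SNMP reachability.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ErrorCategory(str, Enum):
    TIMEOUT = "timeout"
    NO_RESPONSE = "noResponse"
    AUTH_ERROR = "authError"


# ifOperStatus values as defined in IF-MIB (RFC 2863).
IF_OPER_STATUS = {
    1: "up", 2: "down", 3: "testing", 4: "unknown",
    5: "dormant", 6: "notPresent", 7: "lowerLayerDown",
}


@dataclass
class CheckResult:
    status: str                                   # "UP" or "DOWN"
    rtt_ms: Optional[float] = None
    error_category: Optional[ErrorCategory] = None
    metadata: Dict[str, object] = field(default_factory=dict)


def map_snmp_outcome(rtt_ms: Optional[float],
                     error: Optional[ErrorCategory] = None,
                     port_value: Optional[int] = None,
                     expected_value: Optional[int] = None) -> CheckResult:
    """Map a probe outcome to a CheckResult; port status only annotates the result."""
    if error is not None:
        return CheckResult(status="DOWN", error_category=error)
    result = CheckResult(status="UP", rtt_ms=rtt_ms)
    if port_value is not None:
        # Unknown integers are kept as their string form rather than dropped.
        result.metadata["portStatus"] = IF_OPER_STATUS.get(port_value, str(port_value))
        if expected_value is not None:
            result.metadata["portMatchesExpected"] = port_value == expected_value
    return result
```

For example, `map_snmp_outcome(3.2, port_value=2, expected_value=1)` yields status `UP` with `portStatus` set to `"down"` and `portMatchesExpected` set to `False`.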
 
Tasks
- Backend
  - Implement SnmpPingProbe with reachability GET to default OID and RTT capture, plus optional fetch of a configured OID for port status annotation.[1]
  - Extend ProbeService.ProbeAsync to route type: snmp and produce standardized CheckResult with error categorization (timeout, noResponse, authError).[3]
  - Add unit tests for success, timeout, no response, bad community, and optional port OID resolution; add an integration test using a mock SNMP agent.[1]
- Configuration & Apply
  - Update config.schema.json to include enum value snmp with properties: host, port, version (v1/v2c), community, timeout, retries, and optional oid/expectedValue.[4]
  - Extend ConfigurationParser validations and the Apply diff to show additions/changes, preserve version snapshots, and surface warnings for invalid parameter combinations.[4]
- API/UI
  - Ensure API DTOs and CSV export include probe type snmp, RTT, and optional portStatus metadata without changing outage semantics.[5]
  - Update Configuration editor to add SNMP fields with inline validation and help text, and label SNMP endpoints distinctly in live board and detail pages.[4]
- Docs
  - Add docs examples for snmp endpoints, defaults, version/community notes, optional port status OID, and firewall/UDP considerations.[5]
  - Note performance expectations for SNMP RTT and error categorization in probes-spec.md.[1]
 
 
Acceptance Criteria
- A YAML endpoint with type: snmp applies cleanly, appears in diff/versioning, and is visible/editable in the UI with sensible defaults and validations.[4]
- SNMP endpoints report UP when a GET completes within timeout and DOWN on timeout/no response/auth error, with RTT populated and errors categorized.[1]
- Outage transitions for SNMP respect 2/2 flap damping and persist open/close events as with other probe types.[3]
- API and CSV show probe type snmp and RTT, and UI clearly distinguishes SNMP endpoints and optionally displays port status if configured.[5]
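For reviewers unfamiliar with the damping behavior, the sketch below shows one reading of "2/2 flap damping": two consecutive DOWN results open an outage and two consecutive UP results close it. That interpretation is an assumption, since this issue does not define the rule, and `FlapDamper` is a hypothetical stand-in rather than the OutageDetectionService logic. The point is only that SNMP results pass through the same state machine as other probe types.

```python
from typing import Optional


class FlapDamper:
    """Open/close an outage only after N consecutive contradicting results."""

    def __init__(self, down_threshold: int = 2, up_threshold: int = 2) -> None:
        self.down_threshold = down_threshold
        self.up_threshold = up_threshold
        self.outage_open = False
        self._streak = 0  # consecutive results that contradict the current state

    def observe(self, status: str) -> Optional[str]:
        """Feed "UP" or "DOWN"; return "opened", "closed", or None."""
        contradicts = (status == "DOWN") != self.outage_open
        if not contradicts:
            self._streak = 0  # a confirming result resets the streak
            return None
        self._streak += 1
        threshold = self.up_threshold if self.outage_open else self.down_threshold
        if self._streak < threshold:
            return None
        self.outage_open = not self.outage_open
        self._streak = 0
        return "opened" if self.outage_open else "closed"
```

Under this reading, a single dropped SNMP response between two successful ones never opens an outage.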
 
Risks & Mitigations
- Variability across vendors and MIBs for port OIDs; default to sysUpTime.0 for reachability and document port-OID as optional.[1]
- UDP filtering or rate-limiting in OT networks; provide clear error categorization and operator guidance in docs and UI.[1]
 
Testing Plan
- Unit tests for SnmpPingProbe covering success/failure modes with deterministic timings and cancellations.[1]
- Integration test against a mock agent to validate port OID flow and RTT reporting, plus E2E from YAML apply → probe execution → outage transitions → API/CSV verification.
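One way to get deterministic timings in the unit tests is to inject the clock and the transport. The sketch below is an assumption about test structure, not existing Pulse code: `FakeClock`, `scripted_attempt`, and `probe_with_retries` are hypothetical names, and "retries" is read as the number of extra attempts after the first.

```python
from typing import Callable, Iterable, Optional, Tuple


class FakeClock:
    """Manually advanced clock so tests never sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def scripted_attempt(clock: FakeClock,
                     outcomes: Iterable[Tuple[bool, float]]) -> Callable[[], bool]:
    """Fake transport: each call consumes one (success, elapsed_seconds) pair."""
    it = iter(outcomes)

    def attempt() -> bool:
        ok, elapsed_s = next(it)
        clock.advance(elapsed_s)  # simulate time spent waiting for the agent
        return ok

    return attempt


def probe_with_retries(attempt: Callable[[], bool], retries: int,
                       clock: Callable[[], float]) -> Tuple[str, Optional[float], int]:
    """Try up to retries + 1 times; return (status, rtt_ms of the winning try, tries used)."""
    for n in range(1, retries + 2):
        start = clock()
        if attempt():
            return "UP", (clock() - start) * 1000.0, n
    return "DOWN", None, retries + 1


# Usage: first attempt times out after 0.5 s, second succeeds in 0.25 s.
clock = FakeClock()
result = probe_with_retries(scripted_attempt(clock, [(False, 0.5), (True, 0.25)]),
                            retries=1, clock=clock)
# result == ("UP", 250.0, 2)
```

Reporting the RTT of the successful attempt only, rather than the total elapsed time, is a design choice made for this sketch; the existing probes may define it differently.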
 
References
- Probe semantics and budgets to mirror: probes-spec.md.
- Flow integration and execution boundaries: outage-probe-flow-analysis.md.
- Configuration/Apply/versioning architecture and file layout: Configuration.md.
- API/doc surfacing and examples: README.md.
- Scope guardrails and availability-only emphasis: scope-v1.md.