|
| 1 | +--- |
| 2 | +draft: true |
| 3 | +title: SNMP Traps with coshsh |
| 4 | +tags: |
| 5 | + - snmp |
| 6 | + - traps |
| 7 | +weight: 400 |
| 8 | +--- |
| 9 | +SNMP traps and Nagios (or any related monitoring system) are one of those topics that people have often preferred to avoid. Two add-ons, SNMPTT and Nagtrap, have been available for years, but their configuration is somewhat tedious. In a project involving the monitoring of several thousand storage systems, a method was developed that is resource-efficient and easy to automate. |
| 10 | + |
| 11 | +Let's start with a picture. A network device sends traps to the OMD server. There, a [samplicate](https://github.com/sleinen/samplicator) process listens on port 162, duplicates the UDP packets and forwards them to the OMD sites that are configured as trap recipients. Each such site runs an *snmptrapd* process, which writes the contents of every incoming trap to a logfile, *traps.log*. (We see three traps here, arriving at port 162 in the order blue, green, red. They reach the sites' snmptrapd daemons in the same order and are written to the logfile in the same order.) |
| 12 | +That's all. Later you will learn how the logfile is scanned for incoming traps, how these are evaluated, how the trap sender is identified among the monitored host objects and how the right service is set into an alarm state. |
| 13 | + |
| 14 | + |
| 15 | + |
| 16 | + |
| 17 | +## Set up trap processing on an OMD server |
| 18 | +The first part of this article focuses on preparing an OMD server. Specifically, it explains how to ensure that an incoming trap is simultaneously forwarded to multiple OMD sites (e.g., testing, production, etc.). |
| 19 | + |
| 20 | +First, any existing system-wide snmptrapd process must be stopped, and its associated init script or systemd service must be disabled. On most distributions this is achieved by running the following commands as the root user: |
| 21 | +```bash |
| 22 | +[root@pxmxmon]# systemctl stop snmptrapd |
| 23 | +[root@pxmxmon]# systemctl disable snmptrapd |
| 24 | +``` |
| 25 | +A samplicate daemon will take over listening on *udp/162*. The Samplicator project is designed to duplicate UDP packets. When a trap arrives, it is forwarded to the local snmptrapd processes running within the OMD sites, each of which listens on its own dedicated high port. |
| 26 | + |
| 27 | +A site can be prepared to receive traps using the following command: |
| 28 | +```bash |
| 29 | +OMD[demo@pxmxmon]:~$ omd config set SNMPTRAPD on |
| 30 | +``` |
| 31 | + |
| 32 | +OMD automatically assigns a port: |
| 33 | + |
| 34 | +```bash |
| 35 | +OMD[demo@pxmxmon]:~$ omd config show SNMPTRAPD_UDP_PORT |
| 36 | +9162 |
| 37 | +``` |
| 38 | + |
| 39 | +The first site gets port 9162, and subsequent sites are assigned incrementally higher ports. You don’t need to worry about this numbering; OMD manages it automatically in the background. After restarting the site or running: |
| 40 | + |
| 41 | +```bash |
| 42 | +OMD[demo@pxmxmon]:~$ omd start snmptrapd |
| 43 | +``` |
| 44 | + |
| 45 | +...an snmptrapd process is now running in the OMD site, listening on UDP port 9162. |
| 46 | + |
| 47 | +## Setting up the system trap forwarder |
| 48 | +Next, the samplicate daemon must be started. A manual step is required because installing OMD also pulls in an snmptrapd package as a dependency. On some distributions, this automatically starts the system-wide snmptrapd daemon, which is not wanted here. To make sure the samplicate daemon is running instead (and listening on udp/162), execute the following commands as the root user: |
| 49 | + |
| 50 | +```bash |
| 51 | +[root@pxmxmon]# /bin/cp /opt/omd/versions/default/share/samplicate/*.service /etc/systemd/system |
| 52 | +[root@pxmxmon]# systemctl enable samplicate_watch |
| 53 | +[root@pxmxmon]# systemctl start samplicate_watch |
| 54 | +``` |
| 55 | + |
| 56 | +This installs two services: |
| 57 | + |
| 58 | +* **samplicate_watch**: Checks every 60 seconds which OMD sites are running a local snmptrapd listening on which port. If there is a change, it ensures the samplicate service restarts with an updated list of target ports. |
| 59 | +* **samplicate**: The service that forwards copies of incoming UDP packets to the local recipients. |
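
To make the forwarding step concrete, here is a minimal Python sketch of the fan-out idea: every datagram received on one socket is copied to a list of targets. This is not how samplicate is implemented; the function name `fan_out` and its `max_packets` parameter are made up for illustration.

```python
import socket


def fan_out(listen_sock, targets, max_packets):
    """Copy each datagram received on listen_sock to every (host, port) in targets.

    Hypothetical helper for illustration only. It stops after max_packets
    datagrams so that it can be tried out without running forever.
    """
    out_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    forwarded = 0
    try:
        for _ in range(max_packets):
            # 65535 bytes is large enough for any single UDP datagram.
            payload, _sender = listen_sock.recvfrom(65535)
            for target in targets:
                out_sock.sendto(payload, target)
            forwarded += 1
    finally:
        out_sock.close()
    return forwarded
```

A real deployment would bind the listening socket to udp/162 and use the sites' snmptrapd ports (for example 127.0.0.1:9162) as targets. Note one important difference: this sketch forwards the packets with its own source address, whereas the trap log excerpt further down still shows the address of the original sender, so the real forwarder evidently preserves it. That requires raw sockets and is omitted here.
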
| 60 | + |
| 61 | +What happens when an SNMP trap is received on the OMD server, such as the following example? |
| 62 | + |
| 63 | +```bash |
| 64 | +# pxmxmon is the hostname of your monitoring server |
| 65 | +$ snmptrap -v 2c -c public pxmxmon \ |
| 66 | + '' 1.3.6.1.4.1.8072.2.3.0.1 1.3.6.1.4.1.8072.2.3.2.1 i 12341234 |
| 67 | +``` |
| 68 | +The samplicate process receives the UDP packet containing the SNMP trap and immediately forwards a copy of it to every target port, i.e. to every local snmptrapd. |
| 69 | +The snmptrapd processes running inside the OMD sites are configured to write the trap to a log file located at *$OMD_ROOT/var/log/snmp/traps.log*. Each field is logged on its own line, followed by a summary line that flattens the whole trap into a single line, with the individual fields separated by four underscores (____). |
| 70 | + |
| 71 | +For example: |
| 72 | +``` |
| 73 | +[Tue Jan 21 03:19:53 PM CET 2025] .1.3.6.1.2.1.1.3.0: 42095560 |
| 74 | +[Tue Jan 21 03:19:53 PM CET 2025] .1.3.6.1.6.3.1.1.4.1.0: .1.3.6.1.4.1.8072.2.3.0.1 |
| 75 | +[Tue Jan 21 03:19:53 PM CET 2025] .1.3.6.1.4.1.8072.2.3.2.1: 12341234 |
| 76 | +[Tue Jan 21 03:19:53 PM CET 2025] summary: ____lausser.consol.de____UDP: [10.1.18.166]:34300->[127.0.0.1]:9162____.1.3.6.1.2.1.1.3.0 42095560____.1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.4.1.8072.2.3.0.1____.1.3.6.1.4.1.8072.2.3.2.1 12341234 |
| 77 | +``` |
| 78 | +The summary line also contains the resolved hostname of the sender and, inside the first pair of brackets after *UDP:*, its IP address. We will later use this information to identify the host object in the monitoring configuration. |
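
The layout of the summary line can be illustrated with a short Python sketch that splits it into hostname, sender IP and variable bindings. The function `parse_summary` and the dictionary keys are invented for this example, and the parsing rules are derived only from the log excerpt above. Treating fields that do not start with a dot as the continuation of the previous value is an assumption about how multi-line values are flattened.

```python
import re

FIELD_SEPARATOR = "____"
# snmpTrapOID.0: the varbind whose value names the trap that was sent.
SNMP_TRAP_OID = ".1.3.6.1.6.3.1.1.4.1.0"

SUMMARY_RE = re.compile(r"^\[(?P<timestamp>[^\]]+)\] summary: (?P<fields>.*)$")
SENDER_RE = re.compile(r"\[(?P<ip>[^\]]+)\]:\d+->")


def parse_summary(line):
    """Return a dict describing one summary line, or None for any other line."""
    match = SUMMARY_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.group("fields").split(FIELD_SEPARATOR)
    # fields[0] is empty because the summary starts with the separator.
    if len(fields) < 3:
        return None
    hostname, transport = fields[1], fields[2]
    sender = SENDER_RE.search(transport)
    varbinds = []
    for field in fields[3:]:
        if field.startswith(".") or not varbinds:
            oid, _, value = field.partition(" ")
            varbinds.append((oid, value))
        else:
            # Assumption: a chunk without a leading OID continues the previous value.
            oid, value = varbinds[-1]
            varbinds[-1] = (oid, value + " " + field)
    return {
        "timestamp": match.group("timestamp"),
        "hostname": hostname,
        "sender_ip": sender.group("ip") if sender else None,
        "trap_oid": dict(varbinds).get(SNMP_TRAP_OID),
        "varbinds": varbinds,
    }
```

Applied to the summary line shown above, the result contains the hostname `lausser.consol.de`, the sender IP `10.1.18.166` and the trap OID `.1.3.6.1.4.1.8072.2.3.0.1`.
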
| 79 | + |
| 80 | +> **Note:** If you do not see these lines in your traps.log, a firewall may be blocking the SNMP packets from the sending system. Repeat the **snmptrap** command and at the same time watch the network interface of your monitoring server with **tcpdump port 162**. |
| 81 | +> If you still can't see anything, open the port in your firewall. On a Rocky Linux system, for example, you do this with: |
| 82 | +> ```bash |
| 83 | +> [root@pxmxmon]# firewall-cmd --permanent --add-port=162/udp |
| 84 | +> [root@pxmxmon]# firewall-cmd --reload |
| 85 | +> ``` |
| 86 | +
|
| 87 | +In this first test the trap *netSnmpExampleHeartbeatNotification (.1.3.6.1.4.1.8072.2.3.0.1)* from the *NET-SNMP-EXAMPLES-MIB* was sent. For now, we will stick with this MIB before things get complicated. |
| 88 | +In the next part, we will set up everything needed to add a host with a single service to the monitoring system. This service will represent the trap netSnmpExampleHeartbeatNotification and will turn CRITICAL whenever this trap is received. |
| 89 | +Technically, this means that the service is configured as a passive service. Meanwhile, the traps.log file is scanned every minute for new entries using the check_logfiles plugin. If a trap is found with a sender IP address matching the host's address in the monitoring configuration and a trap OID corresponding to netSnmpExampleHeartbeatNotification, the check_logfiles plugin submits a CRITICAL passive check result for the service. |
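
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the decision, not what check_logfiles actually runs. The function `passive_result`, the host name `storage01` and the service description used below are hypothetical. The output line follows the standard Nagios external command format for `PROCESS_SERVICE_CHECK_RESULT`.

```python
import time

# OID of netSnmpExampleHeartbeatNotification, as sent in the test above.
HEARTBEAT_OID = ".1.3.6.1.4.1.8072.2.3.0.1"
CRITICAL = 2  # Nagios return code for CRITICAL


def passive_result(sender_ip, trap_oid, hosts_by_address, service_description, now=None):
    """Return an external command line for a matching trap, else None.

    hosts_by_address maps the address of each monitored host to its host_name.
    """
    host_name = hosts_by_address.get(sender_ip)
    if host_name is None or trap_oid != HEARTBEAT_OID:
        # Unknown sender or a trap this service does not represent.
        return None
    timestamp = int(time.time() if now is None else now)
    output = "CRITICAL - received trap " + trap_oid
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}".format(
        timestamp, host_name, service_description, CRITICAL, output
    )
```

For example, `passive_result("10.1.18.166", HEARTBEAT_OID, {"10.1.18.166": "storage01"}, "trap_heartbeat")` yields a command line for the host `storage01`, while a trap from an address that is not in the map yields `None`.
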
| 90 | +
|
| 91 | +Now, we’ll demonstrate how to use coshsh and a MIB file to automatically generate trap services. |
| 92 | +
|
| 93 | +
|
| 94 | +## Preparing MIB files for use by coshsh |
| 95 | +What we want to achieve is to have coshsh create a service definition (a passive one) for every kind of trap we want to monitor for a certain host or device category. |
| 96 | +
|
| 97 | +### Convert MIB to SNMPTT |
| 98 | +First we need the command **snmpttconvertmib**, which is part of the *snmptt* suite. If *snmptt* is available from your distribution's repositories, install it with **yum** or **apt**. Otherwise, simply download the script from GitHub: |
| 99 | +
|
| 100 | +```bash |
| 101 | +OMD[demo@pxmxmon]:~$ cd local/bin |
| 102 | +OMD[demo@pxmxmon]:~/local/bin$ curl -O https://raw.githubusercontent.com/snmptt/snmptt/refs/heads/master/snmptt/snmpttconvertmib |
| 103 | +OMD[demo@pxmxmon]:~/local/bin$ chmod 755 snmpttconvertmib |
| 104 | +``` |
| 105 | +
|
| 106 | +```bash |
| 107 | +OMD[demo@pxmxmon]:~$ cd ~/etc/coshsh/data/mibs/ |
| 108 | +OMD[demo@pxmxmon]:~/etc/coshsh/data/mibs$ mkdir -p ../snmptt |
| 109 | +OMD[demo@pxmxmon]:~/etc/coshsh/data/mibs$ snmpttconvertmib --in=NET-SNMP-EXAMPLES-MIB.txt --out=../snmptt/NET-SNMP-EXAMPLES-MIB.snmptt |
| 110 | +
|
| 111 | +***** Processing MIB file ***** |
| 112 | +... |
| 113 | +NOTIFICATION-TYPE: netSnmpExampleHeartbeatNotification |
| 114 | +Variables: netSnmpExampleHeartbeatRate |
| 115 | +Enterprise: netSnmpExampleNotificationPrefix |
| 116 | +Looking up via snmptranslate: NET-SNMP-EXAMPLES-MIB::netSnmpExampleHeartbeatNotification |
| 117 | +OID: .1.3.6.1.4.1.8072.2.3.0.1 |
| 118 | +
|
| 119 | +Done |
| 120 | +
|
| 121 | +Total translations: 1 |
| 122 | +Successful translations: 1 |
| 123 | +Failed translations: 0 |
| 124 | +``` |
| 125 | +<!-- |
| 126 | +SORRY FROM HERE ON THERE IS WORK-IN-PROGRESS BECAUSE I GO HOME NOW AND CONTINUE FROM THERE |
| 127 | +
|
| 128 | +As an example we will process traps coming from a SuperMicro server. On the website of the management board, where you enter the snmp trap receiver (which is the ip address of your monitoring server) there is a button *send test trap*. Push this button and you should see a few lines added to your file *traps.log*. (If you don't have such a server, you can use a printer or any other snmp-enabled device. Just pull a plug, open a lid, etc. so that the device will send an snmp trap) |
| 129 | +
|
| 130 | +``` |
| 131 | +[Tue Jan 21 04:44:47 PM CET 2025] .1.3.6.1.2.1.1.3.0: 1737474286 |
| 132 | +[Tue Jan 21 04:44:47 PM CET 2025] .1.3.6.1.6.3.1.1.4.1.0: .1.3.6.1.4.1.3183.1.1.0.0 |
| 133 | +[Tue Jan 21 04:44:47 PM CET 2025] .1.3.6.1.4.1.3183.1.1.1: "43 30 30 31 4D 53 AC 1F 6B B7 30 44 00 00 00 00 |
| 134 | +[Tue Jan 21 04:44:47 PM CET 2025] 00: 04 32 E4 E4 6E FF FF 20 20 02 00 00 00 00 00 |
| 135 | +[Tue Jan 21 04:44:47 PM CET 2025] 00: 00 00 00 00 00 00 19 7C 2A 00 00 34 08 80 00 |
| 136 | +[Tue Jan 21 04:44:47 PM CET 2025] 01: 00 C1 " |
| 137 | +[Tue Jan 21 04:44:47 PM CET 2025] .1.3.6.1.6.3.18.1.3.0: 10.0.10.118 |
| 138 | +[Tue Jan 21 04:44:47 PM CET 2025] .1.3.6.1.6.3.18.1.4.0: "public" |
| 139 | +[Tue Jan 21 04:44:47 PM CET 2025] .1.3.6.1.6.3.1.1.4.3.0: .1.3.6.1.4.1.3183.1.1 |
| 140 | +[Tue Jan 21 04:44:47 PM CET 2025] summary: ____proxmox18.consol.de____UDP: [10.0.10.118]:53176->[127.0.0.1]:9162____.1.3.6.1.2.1.1.3.0 1737474286____.1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.4.1.3183.1.1.0.0____.1.3.6.1.4.1.3183.1.1.1 "43 30 30 31 4D 53 AC 1F 6B B7 30 44 00 00 00 00____00 04 32 E4 E4 6E FF FF 20 20 02 00 00 00 00 00____00 00 00 00 00 00 00 19 7C 2A 00 00 34 08 80 00____01 00 C1 "____.1.3.6.1.6.3.18.1.3.0 10.0.10.118____.1.3.6.1.6.3.18.1.4.0 "public"____.1.3.6.1.6.3.1.1.4.3.0 .1.3.6.1.4.1.3183.1.1 |
| 141 | +``` |
| 142 | +The interesting part is the second line. It gives us the OID of the trap we just received: .1.3.6.1.4.1.3183.1.1.0.0 |
| 143 | +We must now find the MIB file to which this OID belongs. Googling for it turns up multiple MIBs. It looks like this OID is part of the *Server Administrator BMC MIB*, which is used and renamed by more than one vendor. We simply choose this one and download it to the folder *~/etc/coshsh/data/mibs*: [DELL-ASF-MIB](https://github.com/librenms/librenms-mibs/blob/master/DELL-ASF-MIB) |
| 144 | +Now we need to convert this MIB to a format that can be processed by coshsh. |
| 145 | +
|
| 146 | +```bash |
| 147 | +OMD[demo@pxmxmon]:~$ cd etc/coshsh/data/mibs/ |
| 148 | +OMD[demo@pxmxmon]:~$ mkdir ../snmptt |
| 149 | +OMD[demo@pxmxmon]:~$ snmpttconvertmib --in=DELL-ASF-MIB --out=../snmptt/DELL-ASF-MIB.snmptt |
| 150 | +
|
| 151 | +
|
| 152 | +***** Processing MIB file ***** |
| 153 | +
|
| 154 | +snmptranslate version: NET-SNMP version: 5.9.1 |
| 155 | +severity: Normal |
| 156 | +
|
| 157 | +File to load is: ./DELL-ASF-MIB |
| 158 | +File to APPEND TO: ../snmptt/DELL-ASF-MIB.snmptt |
| 159 | +
|
| 160 | +MIBS environment var: ./DELL-ASF-MIB |
| 161 | +MIB name: DELL-ASF-MIB |
| 162 | +
|
| 163 | +
|
| 164 | +Processing MIB: DELL-ASF-MIB |
| 165 | +# |
| 166 | +skipping a TRAP-TYPE / NOTIFICATION-TYPE line - probably an import line. |
| 167 | +# |
| 168 | +Line: 36 |
| 169 | +TRAP-TYPE: asfTrapIPMIAlertTest |
| 170 | +Looking up via snmptranslate: DELL-ASF-MIB::asfTrapIPMIAlertTest |
| 171 | +OID: .1.3.6.1.4.1.3183.1.1.0.1001 |
| 172 | +
|
| 173 | +... |
| 174 | +
|
| 175 | +# |
| 176 | +Line: 1159 |
| 177 | +TRAP-TYPE: asfTrapInternalDualSDModuleNotRedundant |
| 178 | +Looking up via snmptranslate: DELL-ASF-MIB::asfTrapInternalDualSDModuleNotRedundant |
| 179 | +OID: .1.3.6.1.4.1.3183.1.1.0.13175555 |
| 180 | +
|
| 181 | +
|
| 182 | +Done |
| 183 | +
|
| 184 | +Total translations: 91 |
| 185 | +Successful translations: 91 |
| 186 | +Failed translations: 0 |
| 187 | +OMD[demo@pxmxmon]:~/etc/coshsh/data/mibs$ |
| 188 | +``` |
| 189 | +
|
| 190 | +If you see IMPORT statements in the MIB or error messages in snmpttconvertmib's output, you probably need to add some more MIB files until everything can be resolved properly. |
| 191 | +The result of this operation is a file *~/etc/coshsh/data/snmptt/DELL-ASF-MIB.snmptt*. It contains a section for every trap that was found in the MIB file. |
| 192 | +
|
| 193 | +``` |
| 194 | +... |
| 195 | +EVENT asfTrapFanSpeedWarning .1.3.6.1.4.1.3183.1.1.0.262400 "Status Events" MINOR |
| 196 | +FORMAT Generic Predictive Fan Failure ( predictive failure asserted ) |
| 197 | +SDESC |
| 198 | +
|
| 199 | +Generic Predictive Fan Failure ( predictive failure asserted ) |
| 200 | +Variables: |
| 201 | +EDESC |
| 202 | +# |
| 203 | +# |
| 204 | +# |
| 205 | +EVENT asfTrapFanSPeedWarningCleared .1.3.6.1.4.1.3183.1.1.0.262528 "Status Events" INFORMATIONAL |
| 206 | +FORMAT Generic Predictive Fan Failure Cleared |
| 207 | +SDESC |
| 208 | +
|
| 209 | +Generic Predictive Fan Failure Cleared |
| 210 | +Variables: |
| 211 | +EDESC |
| 212 | +# |
| 213 | +# |
| 214 | +# |
| 215 | +EVENT asfTrapFanSpeedProblem .1.3.6.1.4.1.3183.1.1.0.262402 "Status Events" CRITICAL |
| 216 | +FORMAT Generic Critical Fan Failure |
| 217 | +SDESC |
| 218 | +
|
| 219 | +Generic Critical Fan Failure |
| 220 | +Variables: |
| 221 | +EDESC |
| 222 | +... |
| 223 | +``` |
| 224 | +
|
| 225 | +What is the meaning of the keywords? |
| 226 | +* EVENT - this is the name of the trap, its OID, a fixed string "Status Events" and, very important, the severity. The trap name together with the MIB name will be the base of the future service_description. |
| 227 | +* FORMAT - a short description. This will later be used in the notifications. |
| 228 | +* SDESC/EDESC - start and end of a longer description. |
| 229 | +
|
| 230 | +In this case the MIB contained severity hints. Usually this is not the case and all the EVENT lines will get the severity *NORMAL* first. |
| 231 | +
|
| 232 | +--> |