-
Version: 2.4.100
Installation Method: Security Onion ISO image
Description: other (please provide detail below)
Installation Type: Distributed
Location: on-prem with Internet access
Hardware Specs: Meets minimum requirements
CPU: 8
RAM: 16
Storage for /: 211GB
Storage for /nsm: 427GB
Network Traffic Collection: span port
Network Traffic Speeds: 1Gbps to 10Gbps
Status: No, one or more services are failed (please provide detail below)
Salt Status: No, there are no failures
Logs: No, there are no additional clues

Detail:
Hello, I currently have 7 of 36 sensors reporting a fault in our grid; however, I can ping, SSH, and run salt commands against these nodes from the master without issue, and `so-status` shows these sensors are in good working condition. I have changed the NTP configuration from the SOC and verified the change has propagated to all nodes, so NTP is no longer an issue in our cluster, nor should any local or network firewalls be blocking traffic to the affected nodes. I have also tried rebooting the affected nodes, to no avail. Below are the outputs from the SOC and `so-status`. I am unsure what can be done to refresh the sensors' status in the grid. Please let me know what additional details I can provide, as I'm sure I'm leaving something out.
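For reference, the kind of from-the-manager checks described above can be sketched as below. This is illustrative only: the `try` wrapper and the `'*'` salt target are my own placeholders, and the wrapper just prints the plan when the tools are not on PATH, so the sketch is safe to run off a Security Onion box.

```shell
# Minimal sketch of health checks to run from the manager when sensors show
# a fault in SOC but respond normally over the network. The "try" helper is
# a hypothetical guard: it runs the real command only if it exists locally.
try() { if command -v "$1" >/dev/null 2>&1; then sudo "$@"; else echo "would run: $*"; fi; }

try salt '*' test.ping            # confirm every minion answers over the salt bus
try salt-run manage.status        # list which minions salt considers up vs. down
try so-status                     # per-node container/service status
```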
Replies: 2 comments 5 replies
-
Try `so-telegraf-restart` and then run `so-checkin` and see if that works. If `so-checkin` takes longer than it normally does, stop it, run `salt-call saltutil.kill_all_jobs`, then try `so-checkin` again.
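The suggested recovery order can be sketched as below. The `run` wrapper is my own addition so the sketch is safe anywhere: it executes the real Security Onion command only where it exists and otherwise just prints the plan.

```shell
# Sketch of the suggested recovery sequence on an affected sensor.
# "run" is a hypothetical guard, not a Security Onion command.
run() { if command -v "$1" >/dev/null 2>&1; then sudo "$@"; else echo "would run: $*"; fi; }

run so-telegraf-restart               # restart telegraf, which reports node health to the grid
run so-checkin                        # force an immediate salt check-in with the manager

# If so-checkin hangs far longer than usual: interrupt it, clear any stuck
# salt jobs, then check in again.
run salt-call saltutil.kill_all_jobs
run so-checkin
```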
-
The issue was with DNS; more specifically, with connectivity to DNS. The master node had not previously had a name registered in DNS, and our nodes just used the entry placed in /etc/hosts during setup. I gave the master node a name in our internal DNS and added TLS/SSL certificates to it for a better end-user experience. This change surfaced the fact that some of our nodes cannot connect to our internal DNS servers. It wasn't a problem until now because Security Onion adds static names to /etc/hosts, so the nodes never really needed to reach our DNS servers. I was able to prove this by adding the new DNS name as an alias in /etc/hosts, after which one of the problem nodes was able to reconnect to the manager. (See screenshots below)
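The /etc/hosts workaround described above can be sketched as below. The IP and hostnames (`so-manager`, `so-manager.corp.example.com`, `10.0.0.10`) are illustrative placeholders, not values from this grid, and the sketch operates on a throwaway copy of the hosts file so it is safe to run anywhere; on a real node you would edit /etc/hosts itself.

```shell
# Hypothetical sketch: make the manager's new DNS name resolvable on a node
# that cannot reach internal DNS, by adding a static hosts entry/alias.
# Works on a sandbox copy so the example has no side effects.
cp /etc/hosts /tmp/hosts.sandbox 2>/dev/null || printf '127.0.0.1 localhost\n' > /tmp/hosts.sandbox
HOSTS=/tmp/hosts.sandbox

IP="10.0.0.10"                         # placeholder manager IP
SHORT="so-manager"                     # placeholder short name from setup
FQDN="so-manager.corp.example.com"     # placeholder new DNS name

# Add an entry carrying the new name as an alias if it is not present yet.
if ! grep -q "$FQDN" "$HOSTS"; then
  printf '%s %s %s\n' "$IP" "$SHORT" "$FQDN" >> "$HOSTS"
fi

grep "$FQDN" "$HOSTS"                  # show the resulting entry
```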