-
Version: 2.4.60
Installation Method: Security Onion ISO image
Description: configuration
Installation Type: Distributed
Location: on-prem with Internet access
Hardware Specs: Exceeds minimum requirements
CPU: varies depending on the node; minimum of 4
RAM: varies depending on the node; minimum of 16 GB
Storage for /: varies; minimum of 128 GB
Storage for /nsm: varies heavily, between 500 GB and 70 TB
Network Traffic Collection: SPAN port
Network Traffic Speed: more than 10 Gbps
Status: No, one or more services are failed (please provide detail below)
Salt Status: No, there are no failures
Logs: Yes, there are additional clues in /opt/so/log/ (please provide detail below)

Detail: We have been building out a new distributed 2.4 install: VMs for the manager node and two search nodes, plus physical servers for two forward sensor nodes and one more search node. We connected the first forward node to capture traffic between our datacenter and the rest of campus, and everything seemed to work fine for about a week. Then we started getting faults on the manager node and the error "The search query encountered a failure within the Elasticsearch cluster." when trying to view alerts. Looking at the logs and at Kibana under Stack Management, we found that shards were failing and that some Zeek indices from the forward node were showing red health. Deleting the problem indices fixed the issue temporarily, but shortly afterward the indices would turn red and the shards would fail again. Now, after rebooting the entire deployment, the forward node is also showing a fault, reporting that elasticsearch and logstash are missing, and attempts to restart elasticsearch and logstash keep failing.
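In case it helps, this is how we have been confirming which indices have gone red, a minimal sketch run on the manager, assuming the so-elasticsearch-query wrapper that ships with Security Onion 2.4 (the _cat and _cluster endpoints themselves are standard Elasticsearch):

# list index health (green/yellow/red) and filter for red ones
sudo so-elasticsearch-query '_cat/indices?v' | grep red
# overall cluster health, including counts of unassigned shards
sudo so-elasticsearch-query '_cluster/health?pretty'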
-
From the manager, can you give us the output from something like …
-
Thanks!
On sostorep1, can you run the command
test -d /nsm && echo "correct" || echo "incorrect"
/nsm should be a directory, and it looks like it might currently exist as a file. If the output says "incorrect", try removing the file (there is a sketch at the end of this reply).

On sostorev1, it looks like it might be having problems communicating with the manager; check that it still has access to the manager. You can try running
nc -zv <managerip> 4505
nc -zv <managerip> 4506
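If nc is not installed on that node, a rough equivalent using bash's built-in /dev/tcp redirection (a sketch; <managerip> is a placeholder as above, and 4505/4506 are the Salt ports from the docs link below):

for p in 4505 4506; do
  # attempt a TCP connect with a 3-second cap; /dev/tcp is a bash feature
  timeout 3 bash -c "echo > /dev/tcp/<managerip>/$p" && echo "port $p open" || echo "port $p unreachable"
done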
Here is a list of ports that all nodes need to be able to reach on the manager: https://docs.securityonion.net/en/2.4/firewall.html#node-communication
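And for the /nsm check above: if it prints "incorrect", a minimal repair sketch (this assumes the stray file holds nothing important, hence the backup step rather than an outright delete):

# move the stray file out of the way instead of deleting it
sudo mv /nsm /nsm.file.bak
# recreate /nsm as a directory and confirm
sudo mkdir /nsm
ls -ld /nsm
# a standard Salt highstate run should then lay down the expected contents
sudo salt-call state.highstate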