-
Version: 2.4.60
Installation Method: Security Onion ISO image
Description: configuration
Installation Type: Distributed
Location: on-prem with Internet access
Hardware Specs: Exceeds minimum requirements
CPU: 60
RAM: 128
Storage for /: 256
Storage for /nsm: 256
Network Traffic Collection: other (please provide detail below)
Network Traffic Speeds: Less than 1Gbps
Status: Yes, all services on all nodes are running OK
Salt Status: No, there are no failures
Logs: No, there are no additional clues

Detail:
Hello again! I am looking to change the layout of my grid and take advantage of Elastic's Hot/Warm/Cold allocation across nodes with different hardware specs; specifically, to keep my hot store on SSD and the subsequent Warm/Cold stores on HDD. My overall architecture plan is to have two Hot nodes and two Warm nodes split across different sites, with a replica of each, so that I could lose either site and still be okay. Ideally, both the primary index and its replica would be stored on nodes that reflect the index's current position in the ILM lifecycle.

I have a small lab environment where I have been testing this, trying different ILM policies and watching how they affect the indices. I have noticed that ES does not seem to respect the data_warm role assigned to search nodes, and instead appears to load balance the indices as if every node were still data_hot. New hot indices will open on warm nodes, and warm indices will continue to sit on hot nodes. Possibly this could be related to #11062?

Please see below for the configs and the behavior I have observed.

Firstly, I set the data_warm role within the SOC as shown. This change appears to be effective, based on the result of this query to ES:

GET /_cat/nodes?format=json&h=name,ip,role

I then set the ILM config in Kibana, NOT in the SOC, just as a test. It seems to be holding the config and is not being overwritten by the Onion side. I have the Hot rollover at 1 day, with Warm 1 day after that and Cold 2 days after. I have also set the replica option to '1' within the SOC, and replicas are being generated as designed.

Looking at a particular index that is currently in the Warm stage (.ds-logs-fortinet_fortigate.log-default-2024.05.04-000002), I can see ILM setting the preferred tier to data_warm,data_hot, indicating it should give the warm nodes priority. But again, that does not seem to be the case here.

Then, looking at where the shards are stored with:

sudo so-elasticsearch-shards-list | grep fort

you can see that same index with its primary shard actually sitting on a warm node but its replica sitting on a hot node. It is only by chance that the primary landed on the warm node. You can also observe .ds-logs-fortinet_fortigate.log-default-2024.05.05-000003 (which is currently hot) with its primary sitting on Search02, which has the Warm role assigned.

Looking at an example with:

sudo so-elasticsearch-query _cluster/allocation/explain -d '{ "index": ".ds-logs-fortinet_fortigate.log-default-2024.05.03-000001", "shard": 0, "primary": false }' | jq

it states that placing the shard on a Warm node would be a 'worse balance'. It acts as if it has no respect for the role assignment at all. I'm not sure if this is an ILM thing or what, but I wanted to bring it up to you fine gentlemen. I am considering writing a script to look at the stage each index is in and move it to the correct node, but that seems a bit redundant if this is supposed to be a built-in feature of ES clustering. I have successfully moved shards manually between nodes without complaint from ES as well.
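For reference, a minimal sketch of the checks described above, run from the manager with the same CLI wrapper used later in this post (the filter_path parameter is only an assumption to trim the output; the _settings and _cat/nodes endpoints themselves are standard Elasticsearch APIs, and the index name is the warm-phase example discussed above):

```sh
# List node roles as Elasticsearch sees them (same query as above, via the CLI wrapper)
sudo so-elasticsearch-query '_cat/nodes?format=json&h=name,ip,role' | jq

# Show the allocation routing that ILM has written onto the warm-phase index;
# this should include "_tier_preference": "data_warm,data_hot"
sudo so-elasticsearch-query '.ds-logs-fortinet_fortigate.log-default-2024.05.04-000002/_settings?filter_path=*.settings.index.routing' | jq
```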
-
So we have this documented here: https://docs.securityonion.net/en/dev/elasticsearch.html#cluster, which will be updated when .70 drops. Simply remove the data role from the hot and warm nodes and the data should move to the appropriate place. I would also make sure all data roles are removed from the manager.
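As a quick sanity check after making that change, a sketch using the same query style as the original post (in the abbreviated role column, 'd' is the generic data role, 'h' is data_hot, 'w' is data_warm, and 's' is data_content):

```sh
# After removing the generic data role, no node should show 'd' in its role string;
# hot nodes should show 'h', warm nodes 'w', and the manager no data-tier letters at all.
sudo so-elasticsearch-query '_cat/nodes?v&h=name,ip,role'
```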
-
After more trial and error, I don't believe you can safely remove the data role from all nodes WITHOUT adding the data_content role somewhere. It appears that system indices such as .fleet-policies-leader-7 are coded to use the data_content role. Since no nodes have this role, such an index will remain on the node it's currently on, but it will never create a replica, because no node has the role it requires. Attempting to modify the index settings, even with accounts that have the correct roles to do so, you are notified that you cannot modify system indices. As such, these indices appear to be stuck in a kind of limbo. If the node storing them fails, you can lose Kibana, for example.

Currently I have added the data_content role to 2 of my search nodes and am waiting to see the behavior. So far it looks like it's working as it should: ES is not putting other indices (e.g. hot indices) on warm nodes, and these system indices have replicas as they should. I have turned my Hot node off, and ES is not attempting to add another replica shard to those data nodes, which is good. I will let this run in this config, with new hot indices being created daily, and update on the behavior.
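To keep an eye on that, a small sketch for checking whether those system index shards and their replicas are assigned, mirroring the grep approach used earlier (.fleet is just the example prefix from above):

```sh
# List shard allocation for the .fleet system indices; once data_content nodes exist,
# each primary (prirep = p) should have a STARTED replica (prirep = r) rather than
# one stuck UNASSIGNED.
sudo so-elasticsearch-query '_cat/shards?v&h=index,shard,prirep,state,node' | grep fleet
```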
-
It might be worth noting that nodes within a cluster are assumed to have reliable, low-latency (e.g. site-local) networking between them. For replicating across longer distances, like between data centers, there is Cross-Cluster Replication (CCR). If your goal is to distribute nodes across different physical locations, that might be the better solution for you.
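For anyone weighing that option, a rough sketch of what CCR involves at the Elasticsearch API level. The cluster alias, hostname, and index names below are placeholders, the -XPUT flag is assumed to pass through to curl the same way -d does in the queries above, and as far as I know CCR requires an appropriate paid Elastic license tier; this is not presented as a Security Onion-supported configuration.

```sh
# On the follower cluster: register the other site as a remote cluster
# ("site-a" and the seed host are placeholders)
sudo so-elasticsearch-query '_cluster/settings' -XPUT -d '{
  "persistent": {
    "cluster.remote.site-a.seeds": ["leader-node.example.local:9300"]
  }
}'

# Then create a follower index that replicates a leader index from that remote
sudo so-elasticsearch-query 'logs-follower/_ccr/follow' -XPUT -d '{
  "remote_cluster": "site-a",
  "leader_index": "logs-leader"
}'
```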
Adding data_content to your hot nodes has been added to the documentation.