Description
Hi all, I hope you folks are alright
Describe the bug
I have a rather weird bug here. I have an HA setup with 2 masters and 2 satellites per zone, using top-down config sync. When one master's icingadb is shut down (or the whole server is down), a random satellite is elected as the new icinga2 active endpoint. This behavior is also noticeable when both masters are down (or at least icingadb on both). We are using Docker, controlled by a systemd service.
To Reproduce
- Stop the active endpoint's icingadb service (or container) while keeping the second master up
- You will see on the health status page that the active endpoint was switched to a satellite
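The steps above can be sketched as a small shell script, to be run on the master that is currently the active endpoint. This is only a sketch under assumptions: the unit name `icingadb.service` is hypothetical (our unit wraps a Docker container and may be named differently on your side), and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical unit name: the systemd unit that wraps the icingadb container.
UNIT="${UNIT:-icingadb.service}"
# Dry-run by default so nothing is stopped by accident.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 1: stop icingadb on the active master; the second master stays up.
    run systemctl stop "$UNIT"
    # Confirm the unit is really down before looking at the health page.
    run systemctl is-active "$UNIT"
}

reproduce
```

After running it with `DRY_RUN=0`, the health status page in Icinga Web should show which endpoint became active.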
To mitigate this, I have to shut down every satellite in every zone (at least its icingadb component), restart the icingadb component on both masters, then restart the icingadb component on every satellite. This kind of defeats the whole purpose of having HA.
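The recovery order matters, so here is a rough sketch of it as a script. The host names are the anonymized ones from the configuration further down, the unit name `icingadb.service` and SSH access to each node are assumptions, and the script only prints what it would do unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical unit name: the systemd unit that wraps the icingadb container.
UNIT="${UNIT:-icingadb.service}"
# Anonymized host names taken from the zones.conf excerpts in this report.
MASTERS="icinga-master-1.example.com icinga-master-2.example.com"
SATELLITES="infra-1.zone1.example.com infra-2.zone1.example.com infra-1.zone2.example.com infra-2.zone2.example.com"
# Dry-run by default so nothing is restarted by accident.
DRY_RUN="${DRY_RUN:-1}"

# Print the remote command in dry-run mode, otherwise run it over SSH.
remote() {
    host="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh $host $*"
    else
        ssh "$host" "$@"
    fi
}

recover() {
    # 1. Stop icingadb on every satellite.
    for h in $SATELLITES; do remote "$h" systemctl stop "$UNIT"; done
    # 2. Restart icingadb on both masters.
    for h in $MASTERS; do remote "$h" systemctl restart "$UNIT"; done
    # 3. Bring icingadb back up on every satellite.
    for h in $SATELLITES; do remote "$h" systemctl start "$UNIT"; done
}

recover
```

The satellite list above only covers the two zones shown in this report; the real setup has many more.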
Let me know if you need more information. Thanks.
Expected behavior
The second master should be elected as the active endpoint instead of a random satellite.
Your Environment
Version used (icinga2 --version) : 2.15.1
Operating System and version : Debian 12 - Docker 28.5.1
Enabled features (icinga2 feature list) : api checker icingadb mainlog notification opsgenie syslog
Icinga Web 2 version and modules (System - About) 2.16.6
Config validation (icinga2 daemon -C)
[2025-12-04 14:30:29 +0000] information/cli: Icinga application loader (version: v2.15.1)
[2025-12-04 14:30:29 +0000] information/cli: Loading configuration file(s).
[2025-12-04 14:30:29 +0000] information/ConfigItem: Committing config item(s).
[2025-12-04 14:30:29 +0000] information/ApiListener: My API identity: icinga-master-1.exemple.com
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 SyslogLogger.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 User.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 ServiceGroup.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 2981 Services.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 2 ScheduledDowntimes.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 136 Zones.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 12 Notifications.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 2 NotificationCommands.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1452 Hosts.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 135 HostGroups.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 2 Downtimes.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 266 Endpoints.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 2 ApiUsers.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 252 CheckCommands.
[2025-12-04 14:30:29 +0000] information/ConfigItem: Instantiated 1 IcingaDB.
[2025-12-04 14:30:29 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2025-12-04 14:30:29 +0000] information/cli: Finished validating the configuration file(s).
Additional context
We are using containerized versions of icinga based on the sources (regularly updated)
Here are my zone files, constants.conf, and more, anonymized and redacted:
`icinga-master-1.example.com ~ # cat /data/icinga/data/etc/icinga2/zones.conf
/*
Generated by puppet
on 2025-12-04 14:36:44 +0000
*/
object Endpoint "icinga-master-1.example.com" {
}
object Endpoint "icinga-master-2.example.com" {
}
[…]
object Endpoint "infra-1.zone1.example.com" {
}
object Endpoint "infra-2.zone1.example.com" {
}
[…]
object Endpoint "infra-1.zone2.example.com" {
}
object Endpoint "infra-2.zone2.example.com" {
}
[…]
object Zone "zone1.example.com" {
endpoints = [ "infra-1.zone1.example.com", "infra-2.zone1.example.com" ]
parent = "master"
}
[…]
object Zone "zone2.example.com" {
endpoints = [ "infra-1.zone2.example.com", "infra-2.zone2.example.com" ]
parent = "master"
}
[…]
object Zone "global-templates" {
global = true
}
object Zone "global-commands" {
global = true
}
object Zone "director-global" {
global = true
}`
`root@infra-1:~# cat /data/icinga/data/etc/icinga2/zones.conf
/*
Generated by puppet
on 2025-12-04 14:37:09 +0000
*/
object Endpoint "icinga-master-1.example.com" {
host = "x.x.x.x"
}
object Endpoint "icinga-master-2.example.com" {
host = "y.y.y.y"
}
object Endpoint "infra-1.zone2.example.com" {
}
object Endpoint "infra-2.zone2.example.com" {
}
object Zone "master" {
endpoints = [ "icinga-master-1.example.com", "icinga-master-2.example.com" ]
}
object Zone "zone2.example.com" {
endpoints = [ "infra-1.zone2.example.com", "infra-2.zone2.example.com" ]
parent = "master"
}
object Zone "global-templates" {
global = true
}
object Zone "global-commands" {
global = true
}
object Zone "director-global" {
global = true
}
root@infra-1:~# cat /data/icinga/data/etc/icinga2/constants.conf
/**
This file defines global constants which can be used in
the other configuration files.
*/
/* The directory which contains the plugins from the Monitoring Plugins project. */
const PluginDir = "/usr/lib/nagios/plugins"
/* The directory which contains the Manubulon plugins.
Check the documentation, chapter "SNMP Manubulon Plugin Check Commands", for details.
*/
const ManubulonPluginDir = "/usr/lib/nagios/plugins"
/* The directory which you use to store additional plugins which ITL provides user contributed command definitions for.
Check the documentation, chapter "Plugins Contribution", for details.
*/
const PluginContribDir = "/usr/lib/nagios/plugins"
/* Our local instance name. By default this is the server's hostname as returned by hostname --fqdn.
This should be the common name from the API certificate.
*/
const NodeName = "infra-1.zone2.example.com"
/* Our local zone name. */
const ZoneName = "infra-1.zone2.example.com"
/* Secret key for remote node tickets */
const TicketSalt = "$some_tickets"
icinga@infra-1:/opt$ cat /var/lib/icinga2/api/zones/zone2.example.com/_etc/hosts_zone2.conf
/*
Generated by puppet
on 2025-12-05 15:11:53 +0000
*/
object Host "infra-1.zone2.example.com" {
import "Prod Template"
display_name = "infra-1.zone2.example.com"
address = "x.x.x.x"
groups = [ "zone2.example.com" ]
}
object Host "infra-2.zone2.example.com" {
import "Prod Template"
display_name = "infra-2.zone2.example.com"
address = "x.x.x.x"
groups = [ "zone2.example.com" ]
}
object Host "master.example.com" {
import "Prod Template"
display_name = "master.example.com"
address = "x.x.x.x"
groups = [ "zone2.example.com" ]
} // This is not an icinga master, it's a node in the zone that is called master (our product to monitor)
icinga@icinga-master-1:/opt$ cat /etc/icinga2/zones.d/zone2.example.com/hosts_zone2.conf
/*
Generated by puppet
on 2025-12-05 14:39:51 +0000
*/
object Host "infra-1.zone2.example.com" {
import "Prod Template"
display_name = "infra-1.zone2.example.com"
address = "x.x.x.x"
groups = [ "zone2.example.com" ]
}
object Host "infra-2.zone2.example.com" {
import "Prod Template"
display_name = "infra-2.zone2.example.com"
address = "x.x.x.x"
groups = [ "zone2.example.com" ]
}
object Host "master.zone2.example.com" {
import "Prod Template"
display_name = "master.zone2.example.com"
address = "x.x.x.x"
groups = [ "zone2.example.com" ]
}`