Bug Report
Description
We are observing repeatable internal errors in the Linux eBPF verifier on Talos when running Cilium. When they occur, the node becomes unstable and temporarily loses network connectivity.
This is not a normal eBPF program rejection. The kernel verifier itself triggers an internal invariant violation (REG INVARIANTS VIOLATION) in reg_bounds_sanity_check, emitting a WARN. After this happens, Cilium networking becomes unreliable and the node may appear disconnected from cluster management.
This issue has been reproduced multiple times and persists across Talos patch releases.
The same error messages appear on all nodes, and one node eventually went down.
Logs
Kernel log excerpt (reproduced multiple times):
04/02/2026 05:22:24 ------------[ cut here ]------------
04/02/2026 05:22:24 verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0xe01, 0xe00] s64=[0xe01, 0xe00] u32=[0xe01, 0xe00] s32=[0xe01, 0xe00] var_off=(0xe00, 0x0)
04/02/2026 05:22:24 WARNING: CPU: 6 PID: 4123 at kernel/bpf/verifier.c:2731 reg_bounds_sanity_check+0x19d/0x210
04/02/2026 05:22:24 Modules linked in: intel_rapl_msr intel_rapl_common intel_pmc_core_pltdrv ahci intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry e1000e intel_vsec libahci nvme wdat_wdt i2c_i801 watchdog i2c_smbus
04/02/2026 05:22:24 CPU: 6 UID: 0 PID: 4123 Comm: cilium-agent Tainted: G S 6.18.5-talos #1 NONE
04/02/2026 05:22:24 Tainted: [S]=CPU_OUT_OF_SPEC
04/02/2026 05:22:24 Hardware name: FUJITSU /D3401-H2, BIOS V5.0.0.12 R1.27.0 for D3401-H2x 05/28/2020
04/02/2026 05:22:24 RIP: 0010:reg_bounds_sanity_check+0x19d/0x210
04/02/2026 05:22:24 Code: ff 73 18 41 50 57 48 c7 c7 50 34 22 95 56 4c 89 d6 50 ff 73 30 4c 8b 4b 28 4c 8b 43 40 48 89 55 d8 4c 89 55 e0 e8 b3 7e e3 ff <0f> 0b 48 8b 55 d8 4c 8b 55 e0 48 83 c4 38 e9 ec fe ff ff 48 8b 7b
04/02/2026 05:22:24 RSP: 0018:ffffcc7dc7ed7650 EFLAGS: 00010246
04/02/2026 05:22:24 RAX: 0000000000000000 RBX: ffff895e28b59bf0 RCX: 0000000000000000
04/02/2026 05:22:24 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
04/02/2026 05:22:24 RBP: ffffcc7dc7ed76b8 R08: 0000000000000000 R09: 0000000000000000
04/02/2026 05:22:24 R10: 0000000000000000 R11: 0000000000000000 R12: ffff895daaec8000
04/02/2026 05:22:24 R13: 0000000000000010 R14: ffff895daaece300 R15: ffff895e28b59bf0
04/02/2026 05:22:24 FS: 000000c0020b0090(0000) GS:ffff896ad732a000(0000) knlGS:0000000000000000
04/02/2026 05:22:24 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
04/02/2026 05:22:24 CR2: 000000c002f61000 CR3: 000000010c9f6001 CR4: 00000000003726f0
04/02/2026 05:22:24 Call Trace:
04/02/2026 05:22:24 <TASK>
04/02/2026 05:22:24 ? reg_bounds_sync+0x123/0x1b0
This occurs while cilium-agent is running and loading or updating BPF programs.
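For anyone triaging similar warnings, here is a minimal sketch of what the message is saying: every reported range has a minimum (0xe01) greater than its maximum (0xe00), while var_off has an empty mask, meaning the verifier also believes the register is the constant 0xe00. The helper names (`inverted_ranges`, `known_constant`) are made up for illustration; this parses the log text only and is not the kernel's own check.

```python
import re

# Matches "name=[0xMIN, 0xMAX]" pairs such as "u64=[0xe01, 0xe00]".
RANGE_RE = re.compile(r"(\w+)=\[(0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\]")
# Matches "var_off=(0xVALUE, 0xMASK)".
VAR_OFF_RE = re.compile(r"var_off=\((0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\)")


def inverted_ranges(line):
    """Return {name: (min, max)} for every range whose min exceeds its max.

    Assumption: values are compared as plain unsigned integers. Signed
    ranges with the high bit set would need two's-complement conversion,
    which this sketch does not attempt.
    """
    found = {}
    for name, lo, hi in RANGE_RE.findall(line):
        lo_val, hi_val = int(lo, 16), int(hi, 16)
        if lo_val > hi_val:
            found[name] = (lo_val, hi_val)
    return found


def known_constant(line):
    """Return the register value if var_off has an empty mask, else None."""
    match = VAR_OFF_RE.search(line)
    if match is None:
        return None
    value, mask = int(match.group(1), 16), int(match.group(2), 16)
    # A mask of 0 means no bits are unknown, so the value is a constant.
    return value if mask == 0 else None


line = ("verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds "
        "violation u64=[0xe01, 0xe00] s64=[0xe01, 0xe00] u32=[0xe01, 0xe00] "
        "s32=[0xe01, 0xe00] var_off=(0xe00, 0x0)")

bad = inverted_ranges(line)
const = known_constant(line)
print("inverted ranges:", sorted(bad))
print("constant per var_off:", hex(const))
```

Run against the line from the excerpt above, all four ranges are flagged as inverted, and the constant implied by var_off (0xe00) falls below the claimed minimum of 0xe01. That combination describes an empty set of possible values, which is the inconsistent state reg_bounds_sanity_check warns about.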
Environment
- Talos version: 1.12.1 (also reproduced on 1.12.2)
- Kernel: 6.18.5-talos (also reproduced on 6.18.2-talos)
- Kubernetes version: 1.35.0
- CNI: Cilium
- Chart version: * (latest chart, v1.19.0)
- Node role: Worker node
- Platform / Hardware:
  - Vendor: FUJITSU
  - Model: D3401-H2
  - BIOS: V5.0.0.12 (05/28/2020)
  - CPU taint: CPU_OUT_OF_SPEC
Notes
- Issue is deterministic on this hardware
- Reproduced across multiple Talos patch releases
- Appears to be a kernel eBPF verifier regression rather than a Cilium configuration error
- Minor kernel updates within the 6.18.x line do not resolve the issue