Description
We are trying to deploy the tool on a new Kubernetes cluster and are facing issues with the worker: it keeps consuming memory until it hits the configured memory limit and is OOM-killed, even with extremely high limits such as 50 GiB. This happens right after startup, with no load on the worker.
Jumping into the container before it gets killed shows that the squid cache seems to be responsible for this behavior:
```
top - 08:43:45 up 1 day, 18:27,  0 users,  load average: 3.02, 0.89, 0.43
Tasks:  22 total,   2 running,  20 sleeping,   0 stopped,   0 zombie
%Cpu(s): 11.8 us,  3.0 sy,  0.0 ni, 84.8 id,  0.1 wa,  0.2 hi,  0.0 si,  0.0 st
MiB Mem : 128795.3 total,  11098.1 free, 107866.0 used,   9831.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  19747.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     11 root      20   0   10.1g   1.0g  28928 S 363.3   0.8   1:33.64 java
     18 root      20   0  568.1g  31.7g  12544 R  99.3  25.2   0:22.17 squid
      1 root      20   0    2616   1280   1280 S   0.0   0.0   0:00.01 sh
      7 root      20   0   25772  20968   8960 S   0.0   0.0   0:00.19 supervisord
      9 root      20   0    4212   3072   2816 S   0.0   0.0   0:00.00 bash
     10 root      20   0    2616   1536   1536 S   0.0   0.0   0:00.00 apachectl
     12 postgres  20   0  216260  29184  27136 S   0.0   0.0   0:00.04 postgres
     14 root      20   0  100796  20992   8448 S   0.0   0.0   0:00.14 python3
     25 root      20   0   11268   8392   6856 S   0.0   0.0   0:00.03 apache2
     39 www-data  20   0 2002356   4632   2816 S   0.0   0.0   0:00.00 apache2
     40 www-data  20   0 2002356   4632   2816 S   0.0   0.0   0:00.00 apache2
     95 postgres  20   0  216400   5812   3840 S   0.0   0.0   0:00.00 postgres
     98 postgres  20   0  216392   6068   4096 S   0.0   0.0   0:00.00 postgres
    102 postgres  20   0  216392  10164   8192 S   0.0   0.0   0:00.00 postgres
    103 postgres  20   0  217844   9396   7168 S   0.0   0.0   0:00.00 postgres
    104 postgres  20   0  217852   8372   6144 S   0.0   0.0   0:00.00 postgres
    229 root      20   0    2616   1280   1280 S   0.0   0.0   0:00.00 sh
    235 root      20   0    2616    256    256 S   0.0   0.0   0:00.00 sh
    236 root      20   0    2644   1536   1536 S   0.0   0.0   0:00.00 script
    237 root      20   0    2616   1280   1280 S   0.0   0.0   0:00.00 sh
    238 root      20   0    4248   3328   2816 S   0.0   0.0   0:00.00 bash
    241 root      20   0    6112   3072   2560 R   0.0   0.0   0:00.00 top
```
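In case it helps narrow things down: one workaround we considered is capping squid's memory usage in its configuration. This is only a sketch, assuming the image uses the standard config path and that the in-memory cache is actually what grows; `cache_mem`, `maximum_object_size_in_memory`, and `memory_pools` are standard squid directives, but the values here are hypothetical:

```
# /etc/squid/squid.conf (excerpt, hypothetical values)
cache_mem 256 MB                       # cap the in-memory object cache
maximum_object_size_in_memory 512 KB   # keep large objects out of RAM
memory_pools off                       # return freed memory to the OS instead of pooling it
```

We haven't verified that this actually prevents the OOM kills, since the growth happens without any load.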
Tested with versions 2.2.6, 2.2.9, and 2.3.3.
Any idea how to address this? Is there a possible fix or workaround? Would updating squid to a newer version help (the image ships squid v4.10, while the latest release is v7.4)?
Note: worker version 2.2.6 used to run without this issue on an old cluster. Unfortunately, I can't tell exactly what differed at the cluster/node level. The old one may not have used cgroups v2, for example, but I can't say for sure.
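For reference, this is how I checked which cgroup version the new nodes use (a quick probe run inside the container; it assumes `/sys/fs/cgroup` is mounted in the usual place):

```shell
#!/bin/sh
# On cgroups v2, /sys/fs/cgroup is a single cgroup2fs mount;
# on v1 (or hybrid) it is a tmpfs with per-controller subdirectories.
if [ "$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)" = "cgroup2fs" ]; then
  echo "cgroups v2"
else
  echo "cgroups v1 (or hybrid)"
fi
```

I can run this on the old cluster too if that comparison would be useful.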