-
I've converted this to a discussion because it feels more like a support question than a bug report.
-
@rshad is this new with v16.19.0, or can you reproduce this under v16.18.1 as well?
-
Hi @tilgovi! Actually, we never tested it with AWS EC2 instance configurations other than the one we got working, but this is an excellent approach to try. I will be back with more details once we revert to v16.18.1 and see what happens.
-
Hi @tilgovi, we've just tested it with v16.18.1 and also with v18, and it still fails! Docker images:
-
@rshad did you get to the bottom of this, or do you have a workaround?
-
Version
v16.19.0
Platform
amzn2.x86_64
Subsystem
No response
What steps will reproduce the bug?
When running a job as a pipeline in a GitLab Runner K8s pod, the job completes successfully only on a small instance like m5*.large, which offers 2 vCPUs and 8 GB of RAM. We set limits for the build, helper, and services containers mentioned below. Still, the job fails with an Out Of Memory (OOM) error, with the node process getting killed by cgroup, when running on a far more powerful instance such as m5d*.2xlarge, which offers 8 vCPUs and 32 GB of RAM.
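To illustrate where the numbers can diverge, here is a minimal diagnostic sketch (not our actual job script; the cgroup paths are the standard v1/v2 locations and may differ on a given node image): inside the build container, os.totalmem() reports the host's memory rather than the pod's cgroup limit, so comparing the two against V8's effective heap limit shows what the process actually believes it can use.

```js
// check-memory.js - compare what Node sees with the container's cgroup memory limit.
// Run inside the build container: node check-memory.js
const fs = require('fs');
const os = require('os');
const v8 = require('v8');

function readFirst(paths) {
  for (const p of paths) {
    try {
      return fs.readFileSync(p, 'utf8').trim();
    } catch {
      // not present on this cgroup version, try the next path
    }
  }
  return 'unknown';
}

// cgroup v2 and cgroup v1 memory limit locations
const cgroupLimit = readFirst([
  '/sys/fs/cgroup/memory.max',
  '/sys/fs/cgroup/memory/memory.limit_in_bytes',
]);

const gib = (n) => (n / 1024 ** 3).toFixed(2) + ' GiB';

console.log('os.totalmem()      :', gib(os.totalmem()));   // host memory, e.g. ~32 GiB on m5d*.2xlarge
console.log('cgroup memory limit:', cgroupLimit);           // the limit the kernel actually enforces on the pod
console.log('V8 heap_size_limit :', gib(v8.getHeapStatistics().heap_size_limit)); // effective JS heap cap
```

On the failing instance we would expect the first number to track the host's 32 GB while the cgroup limit stays at the pod's limit; if heap sizing or any tooling keys off the host total rather than the cgroup value, that alone could explain why larger instances push the process over the pod's limit and into the cgroup OOM killer.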
Note that we tried to dedicate more resources to the containers, especially the build container (the node process is a child process of it), and nothing changed when running on the powerful instances: the node process still got killed because of OOM. Each time we gave it more memory, the node process simply consumed more, and so on.

Also, regarding CPU usage: on the powerful instances, the more vCPUs we gave the job, the more it consumed, and we noticed CPU throttling at ~100% almost all the time. On the small instances like m5*.large, however, CPU throttling never exceeded 3%.
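A possibly related detail, shown as an illustrative sketch rather than anything from our pipeline: os.cpus().length inside the container reports the host's vCPUs, not the pod's CPU limit, so any part of the toolchain that sizes worker or thread counts from it will scale with the instance and can produce exactly this kind of throttling. Comparing it with the cgroup CPU quota makes the gap visible (again, the cgroup paths are the usual v1/v2 locations).

```js
// check-cpu.js - compare the CPU count Node reports with the container's CPU quota.
const fs = require('fs');
const os = require('os');

function readFirst(paths) {
  for (const p of paths) {
    try {
      return fs.readFileSync(p, 'utf8').trim();
    } catch {
      // not this cgroup version, try the next path
    }
  }
  return 'unknown';
}

// cgroup v2: "<quota> <period>" (e.g. "200000 100000" = 2 CPUs); cgroup v1: quota in microseconds
const cpuQuota = readFirst([
  '/sys/fs/cgroup/cpu.max',
  '/sys/fs/cgroup/cpu/cpu.cfs_quota_us',
]);

console.log('os.cpus().length  :', os.cpus().length); // host vCPUs, e.g. 8 on m5d*.2xlarge
console.log('cgroup CPU quota  :', cpuQuota);          // what the pod is actually allowed to use
console.log('UV_THREADPOOL_SIZE:', process.env.UV_THREADPOOL_SIZE || '(default: 4)');
```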
Note that we specified a maximum amount of memory to be used by the node process, but it looks like it does not take any effect. We tried setting it to 1 GB, 1.5 GB, and 3 GB. (See the verification sketch after the resource screenshots below.)

resources request/limits configuration
Resource consumption of a successful job running on m5d.large
Resource consumption of a failing job running on m5d.2xlarge
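For completeness, a small sketch of how one can check whether a heap cap actually reached the node process that gets killed. We are not reproducing our exact setup here; the snippet assumes the cap is applied with --max-old-space-size or NODE_OPTIONS, which is the usual mechanism. Since the failing node process is a child of the build job, a flag passed only to a parent process does not carry over, whereas NODE_OPTIONS is an environment variable and is inherited by child Node processes.

```js
// check-heap-cap.js - run with the same environment as the failing process, e.g.:
//   NODE_OPTIONS=--max-old-space-size=3072 node check-heap-cap.js
const v8 = require('v8');

const mib = (n) => Math.round(n / 1024 ** 2) + ' MiB';

console.log('NODE_OPTIONS      :', process.env.NODE_OPTIONS || '(not set)');
console.log('process.execArgv  :', JSON.stringify(process.execArgv)); // CLI flags this node process was started with
console.log('V8 heap_size_limit:', mib(v8.getHeapStatistics().heap_size_limit));
// If heap_size_limit still shows the default rather than the configured cap,
// the flag never reached this process (for example, it was set on a parent that
// spawned node without forwarding it, and NODE_OPTIONS was not exported).
```

Note also that even when the cap is in effect, --max-old-space-size only bounds the V8 old space; buffers, the new space, and native memory sit on top of it, so the container's memory limit needs headroom above the heap cap or the cgroup OOM killer can still fire.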
How often does it reproduce? Is there a required condition?
Always
What is the expected behavior?
No response
What do you see instead?
Logs of the host where the job runs