How to debug and fix disk I/O problems #4082

Nuru · 2024-07-09T18:09:43Z


This is a follow-up to #4072

As far as I am aware, I have moved all my Pods to using tmpfs for file access, but nevertheless, when I launch 90 Pods (all with the same Docker image) at once, the system locks up because of excessive disk I/O on the EBS volume. How do I debug this?

On AL2 I can use tools like dstat, atop, and iotop to look at what is going on, but these tools are not installed or easily installed (dstat and iotop are both Python programs) on Bottlerocket, and when I tried running them in a privileged container, I still didn't get host stats, as far as I could tell. Then again, this is at the outer limits of my training and experience, so while you don't need to ELI5, I would appreciate some clear directions and/or advice on how to figure out what is sucking up all the disk I/O and how to move it to tmpfs.
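One possible starting point, offered as a sketch and not as a verified Bottlerocket procedure: the Linux kernel exposes per-process I/O counters in /proc/<pid>/io, and the read_bytes and write_bytes fields there count bytes that actually went to or came from the storage layer, so tmpfs traffic should not show up in them. The script below samples those counters twice and ranks processes by the difference. It assumes it runs as root in a container that shares the host PID namespace (for example a Pod with hostPID: true) and that the container image ships Python. The function names (parse_io, snapshot, top_io, main) are made up for this illustration.

```python
import os
import time


def parse_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def snapshot(proc_root="/proc"):
    """Return {pid: (comm, read_bytes, write_bytes)} for every readable process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "io")) as f:
                io = parse_io(f.read())
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited, or we lack permission to read it
        result[int(entry)] = (comm, io.get("read_bytes", 0), io.get("write_bytes", 0))
    return result


def top_io(before, after, limit=10):
    """Rank processes by bytes read plus written between two snapshots.

    A process missing from 'before' is charged its full counters,
    i.e. everything it has done since it started.
    """
    rows = []
    for pid, (comm, r1, w1) in after.items():
        _, r0, w0 = before.get(pid, (comm, 0, 0))
        rows.append((pid, comm, r1 - r0, w1 - w0))
    rows.sort(key=lambda row: row[2] + row[3], reverse=True)
    return rows[:limit]


def main(interval=5.0):
    """Take two snapshots 'interval' seconds apart and print the busiest processes."""
    first = snapshot()
    time.sleep(interval)
    for pid, comm, rd, wr in top_io(first, snapshot()):
        print(f"{pid:>8} {comm:<20} read={rd:>12} write={wr:>12}")
```

Calling main() while the 90 Pods are starting should list the processes responsible for the most block I/O in that window. If only the container's own processes appear, the container is probably not in the host PID namespace, which may also explain why dstat and iotop did not show host stats.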

In a related issue, when running EKS on Bottlerocket, how do I get it to reserve memory I allocate to tmpfs for this purpose, so that Kubernetes does not try to allocate it to Pods?

dpavlov-smartling · 2025-03-20T16:11:07Z


I want to bump this request. Monitoring solutions like Datadog and Prometheus let you monitor some of the processes on ECS or Kubernetes instances, but you still lack some of the information that atop or a similar tool provides.
atop keeps a log of system activity, so we can go back and see which process caused an OOM, etc.
Unfortunately, atop is designed to be restarted each day and requires systemd or another init system running in the background. Because of this, using it as a host container can be tricky.
Does anybody see a possible alternative?
