gce-pd-driver memory limits #2275

@motiejus

We had a rough last couple of weeks because our forked pdcsi would OOM during mount, and the kernel logs suggested the filesystem had been corrupted:

```
[42460.184734] BTRFS error (device sdb: state E): block=185898026516480 write time tree block corruption detected
[42462.380353] BTRFS: error (device sdb: state E) in btrfs_commit_transaction:2526: errno=-30 Readonly filesystem (Error while writing out transaction)
[42462.393806] BTRFS warning (device sdb: state E): Skipping commit of aborted transaction.
[42462.393810] BTRFS error (device sdb: state EA): Transaction aborted (error -30)
[42462.401262] BTRFS: error (device sdb: state EA) in cleanup_transaction:2020: errno=-30 Readonly filesystem
[42462.645482] BTRFS error (device sdb: state EA): open_ctree failed
```

It turns out this was all a red herring: when the pdcsi container has enough memory, it successfully mounts the same disk. However, I was not able to reproduce the OOM outside the container or measure how much memory the mount actually consumes. The run below mounts a disk that had just shown this issue:

```
# systemd-run --wait --unit=mount-monitor mount /dev/sdb /mnt/
Running as unit: mount-monitor.service; invocation ID: e904b1a1721f4e14a1980e603f130634
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 52.646s
CPU time consumed: 12.104s
Memory peak: 256.0K
Memory swap peak: 0B
```
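
The `Memory peak` figure above is what systemd accounted to the transient unit's cgroup. As a rough cross-check, one could also sample the system-wide `MemAvailable` value from `/proc/meminfo` while the mount runs. The sketch below is only an illustration of that idea: `mem_available_kb`, `watch_mem_drop`, and the `MEMINFO` override are hypothetical names (not part of pdcsi or systemd), and a system-wide number also includes whatever else is happening on the node, so it gives at best an upper-bound hint.

```shell
# Print the MemAvailable value (in kB) from a meminfo-style file.
# MEMINFO is a hypothetical override so the parser can be pointed at a
# test file; it defaults to /proc/meminfo.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${MEMINFO:-/proc/meminfo}"
}

# Run a command in the background and poll MemAvailable once per second
# until it exits. Prints the largest observed drop (in kB) relative to
# the value seen just before the command started.
watch_mem_drop() {
    start=$(mem_available_kb)
    low=$start
    "$@" &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        now=$(mem_available_kb)
        if [ "$now" -lt "$low" ]; then
            low=$now
        fi
        sleep 1
    done
    wait "$pid"
    echo "max drop: $((start - low)) kB"
}
```

Run as root, this could wrap the same command as above, for example `watch_mem_drop mount /dev/sdb /mnt/`. Polling once per second can miss short spikes, so a small reported drop does not rule out a brief allocation burst.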

The disk is 7TiB.

The current memory limit for the pdcsi container is 450MiB. Can we consider bumping it, or removing the pdcsi memory limits altogether? We will run gce-pd-driver without memory limits and report back after a few weeks.
