We had a rough couple of weeks because our forked pdcsi would OOM during mount, and the kernel logs suggested the filesystem had been corrupted:
[42460.184734] BTRFS error (device sdb: state E): block=185898026516480 write time tree block corruption detected
[42462.380353] BTRFS: error (device sdb: state E) in btrfs_commit_transaction:2526: errno=-30 Readonly filesystem (Error while writing out transaction)
[42462.393806] BTRFS warning (device sdb: state E): Skipping commit of aborted transaction.
[42462.393810] BTRFS error (device sdb: state EA): Transaction aborted (error -30)
[42462.401262] BTRFS: error (device sdb: state EA) in cleanup_transaction:2020: errno=-30 Readonly filesystem
[42462.645482] BTRFS error (device sdb: state EA): open_ctree failed
This all turned out to be a red herring: when the pdcsi container has enough memory, it mounts the same disk successfully. However, I was not able to reproduce the OOM or measure how much memory the mount actually consumes. The run below mounts a disk that had just shown this issue:
# systemd-run --wait --unit=mount-monitor mount /dev/sdb /mnt/
Running as unit: mount-monitor.service; invocation ID: e904b1a1721f4e14a1980e603f130634
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 52.646s
CPU time consumed: 12.104s
Memory peak: 256.0K
Memory swap peak: 0B
The disk is 7TiB.
The current memory limit for the pdcsi container is 450MiB. Could we consider raising it, or removing the pdcsi memory limits altogether? We will run gce-pd-driver without memory limits and report back after a few weeks.
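A possible way to try reproducing the OOM outside the container is to rerun the same mount in a transient unit capped at the container's limit. This is only a sketch and has not been tried on the affected disk. `mount_capped_cmd` and the unit name `mount-capped` are hypothetical names, and the helper only prints the command. `MemoryMax=` and `MemorySwapMax=` are standard systemd resource-control properties, but whether the memory used during a btrfs mount is charged to the unit's cgroup at all is an open question, given the 256.0K peak reported above.

```shell
# Hypothetical helper: print a systemd-run command that repeats the mount
# under a memory cap. Nothing is mounted until the printed command is run.
mount_capped_cmd() {
    dev="$1"            # block device, e.g. /dev/sdb
    mnt="$2"            # mount point, e.g. /mnt/
    limit="${3:-450M}"  # systemd reads 450M as 450 MiB (base 1024)
    printf 'systemd-run --wait --unit=mount-capped -p MemoryMax=%s -p MemorySwapMax=0 mount %s %s\n' \
        "$limit" "$dev" "$mnt"
}

# Print the command for the disk from this report; run the result as root.
mount_capped_cmd /dev/sdb /mnt/
```

If the capped mount fails or the unit is OOM-killed while the uncapped one succeeds, that would support the limit being the cause. If it succeeds, the memory may be accounted elsewhere, for example to other processes in the pdcsi container.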