`gce-pd-driver` memory limits

We had a rough couple of weeks: our forked pdcsi would get OOM-killed during mount, and the kernel logs suggested that the filesystem had been corrupted:

```
[42460.184734] BTRFS error (device sdb: state E): block=185898026516480 write time tree block corruption detected
[42462.380353] BTRFS: error (device sdb: state E) in btrfs_commit_transaction:2526: errno=-30 Readonly filesystem (Error while writing out transaction)
[42462.393806] BTRFS warning (device sdb: state E): Skipping commit of aborted transaction.
[42462.393810] BTRFS error (device sdb: state EA): Transaction aborted (error -30)
[42462.401262] BTRFS: error (device sdb: state EA) in cleanup_transaction:2020: errno=-30 Readonly filesystem
[42462.645482] BTRFS error (device sdb: state EA): open_ctree failed
```

This turned out to be a red herring: when the `pdcsi` container has enough memory, it mounts the same disk successfully. However, I was not able to reproduce the OOM or measure how much memory the mount actually consumes. The run below mounts a disk that had just shown this issue:

```
# systemd-run  --wait --unit=mount-monitor mount /dev/sdb /mnt/
Running as unit: mount-monitor.service; invocation ID: e904b1a1721f4e14a1980e603f130634
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 52.646s
CPU time consumed: 12.104s
Memory peak: 256.0K
Memory swap peak: 0B
```

The disk is 7TiB.
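
The run above has no memory cap, so it does not show whether the same mount would fail under the container's limit. One way to check might be to repeat it in a transient unit capped with systemd's `MemoryMax=` and `MemorySwapMax=` properties. This is only a sketch: the helper name `run_limited_mount`, the unit name `mount-limited`, and the `DRY_RUN` switch are made up for illustration, and memory may be accounted differently in a systemd unit than in the `pdcsi` container's cgroup, so a clean run here would not rule out the OOM.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdcsi): mount a device inside a transient
# systemd unit whose memory is capped, to mimic the container's limit.
# With DRY_RUN=1 (the default) it only prints the command it would run.
run_limited_mount() {
    dev="$1"; mnt="$2"; limit="$3"
    # MemoryMax= caps the unit's cgroup memory; MemorySwapMax=0 forbids swap,
    # so the cap cannot be dodged by swapping out.
    set -- systemd-run --wait --unit=mount-limited \
        -p "MemoryMax=$limit" -p MemorySwapMax=0 \
        mount "$dev" "$mnt"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the command for the disk and limit discussed in this issue.
run_limited_mount /dev/sdb /mnt/ 450M
```

Running it as root with `DRY_RUN=0` performs the mount. If the unit is OOM-killed, `systemd-run --wait` should report a result other than `success`.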

The current memory limit for the `pdcsi` container is 450MiB. Could we consider raising it, or removing the pdcsi memory limits altogether? We will run `gce-pd-driver` without memory limits and report back after a few weeks.

`gce-pd-driver` memory limits #2275
