2025-01-03 11:06:50.367 PST
2025-01-03T19:06:50.366843Z INFO text_generation_launcher: Download: [13/30] -- ETA: 0:06:42.769236
- Instead of /data, a different filesystem (the one reported for both / and /etc/hosts) is getting filled, and it eventually runs out of disk space.
root@llm-689555d8bf-62gjd:/etc# df
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 98831908 75370476 23445048 77% /
tmpfs 65536 0 65536 0% /dev
/dev/nvme0n2 153707984 28 153691572 1% /data
tmpfs 62914560 12 62914548 1% /dev/shm
/dev/nvme0n1p1 98831908 75370476 23445048 77% /etc/hosts
tmpfs 62914560 12 62914548 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 49441048 0 49441048 0% /proc/acpi
tmpfs 49441048 0 49441048 0% /proc/scsi
tmpfs 49441048 0 49441048 0% /sys/firmware
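To narrow down where the downloads land, it may help to check from inside the pod which filesystem the cache directory resolves to. The following is a minimal sketch, not a confirmed diagnosis: `same_filesystem` and `nearest_existing` are hypothetical helper names, and the environment variables listed are the cache settings read by `huggingface_hub`. Whether the launcher in this image relies on any of them is an assumption.

```python
import os

def same_filesystem(path, mount_point):
    """Return True if `path` lives on the same device as `mount_point`."""
    return os.stat(path).st_dev == os.stat(mount_point).st_dev

def nearest_existing(path):
    """Walk up from `path` until an existing directory or file is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        path = os.path.dirname(path)
    return path

# Assumption: these cache-related variables are relevant to this image.
for var in ("HF_HOME", "HF_HUB_CACHE", "HUGGINGFACE_HUB_CACHE"):
    value = os.environ.get(var)
    if value is None:
        print(f"{var} is not set")
        continue
    target = nearest_existing(value)
    # Guard against environments where /data is not mounted at all.
    on_data = os.path.exists("/data") and same_filesystem(target, "/data")
    print(f"{var}={value} -> on the /data volume: {on_data}")
```

If none of the variables resolve to the /data volume, the cache is probably being written to the container's root filesystem, which would match the 77% usage on / shown above.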
root@llm-fb5d99cb-569b7:/usr/src# df
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 98831908 25463724 73351800 26% /
tmpfs 65536 0 65536 0% /dev
/dev/nvme0n2 153707984 137809580 15882020 90% /data
tmpfs 62914560 48980 62865580 1% /dev/shm
/dev/nvme0n1p1 98831908 25463724 73351800 26% /etc/hosts
tmpfs 62914560 12 62914548 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 49439752 0 49439752 0% /proc/acpi
tmpfs 49439752 0 49439752 0% /proc/scsi
tmpfs 49439752 0 49439752 0% /sys/firmware
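The difference between the two pods is easier to see with the `Used` column converted from 1K blocks to GiB. The sketch below parses the relevant rows copied from the two `df` outputs above. `parse_df` is a hypothetical helper, and the dictionary keys are simply the pod names from the shell prompts.

```python
def parse_df(text):
    """Parse `df` output into {mount_point: (used_kb, available_kb)}."""
    usage = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: filesystem, 1K-blocks, used, available, use%, mount point
        usage[fields[5]] = (int(fields[2]), int(fields[3]))
    return usage

pods = {
    "llm-689555d8bf-62gjd": parse_df("""
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 98831908 75370476 23445048 77% /
/dev/nvme0n2 153707984 28 153691572 1% /data
"""),
    "llm-fb5d99cb-569b7": parse_df("""
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 98831908 25463724 73351800 26% /
/dev/nvme0n2 153707984 137809580 15882020 90% /data
"""),
}

for pod, usage in pods.items():
    for mount in ("/", "/data"):
        used_gib = usage[mount][0] / 1024**2  # 1K blocks -> GiB
        print(f"{pod:22s} {mount:6s} {used_gib:7.1f} GiB used")
```

In the first pod almost nothing has been written to /data while the root overlay is at 77%. In the second pod the bulk of the data sits on /data. The /etc/hosts row reports the same numbers as / in both outputs, which suggests both are backed by the same underlying disk.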
NOTE: Other samples might also be impacted by the above change, since we don't have any automated gates for validation.
ghcr.io/huggingface/text-generation-inference:2.0.4 results in a successful run. Disk usage for that run is given in the second df output above (pod llm-fb5d99cb-569b7), where /data is 90% used.