Move model storage to the /mnt directory on both the host and the Kin… #792

liavweiss · 2025-12-09T14:20:06Z

Move model storage to /mnt directory to prevent disk space issues

Summary

This PR addresses disk space issues in CI/CD workflows by moving model downloads from the root filesystem (~14GB available) to the /mnt directory (~75GB available). This prevents "no space left on device" errors when downloading large models during CI runs.

Problem

GitHub Actions runners ran out of disk space when downloading large models. The root filesystem (/) has only ~14GB available, which is not enough for model downloads that can reach several GB. The previous workaround, deleting toolchains (~25GB), was:

- Not sufficient for large model sets
- Not stable (depended on toolchain versions)
- Not applicable to Kind clusters (models stored inside the cluster)
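
To see the gap on a given runner, the free space on both filesystems can be compared before any download starts. This is a minimal sketch: the `avail_gb` helper name is hypothetical, and the numbers it prints depend on the runner image rather than on anything in this PR.

```shell
#!/bin/sh
set -eu

# avail_gb PATH: print the free space (whole GiB) on the filesystem holding PATH.
# Hypothetical helper; `df -Pk` gives a fixed POSIX column layout in 1024-byte blocks,
# where column 4 is the available space.
avail_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Report both filesystems; /mnt is skipped when it does not exist (e.g. on a laptop).
for path in / /mnt; do
  if [ -d "$path" ]; then
    echo "$path: $(avail_gb "$path") GiB available"
  fi
done
```

Running this as an early workflow step makes it easy to confirm which disk a later download will land on.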

Solution

This PR implements a comprehensive solution that works for both host-level workflows and Kind cluster-based tests:

Host-level workflows (test-and-build.yml, integration-test-docker.yml):
- Creates the /mnt/models directory
- Moves an existing models/ directory to /mnt/models, if present
- Creates a symlink from models/ to /mnt/models/ for backward compatibility
- All model downloads now use the larger /mnt disk (~75GB)

Kind cluster workflows (integration-test-k8s.yml):
- Mounts the host's /mnt into the Kind nodes (control-plane and worker)
- Patches the local-path-provisioner ConfigMap to use /mnt/local-path-provisioner instead of /tmp
- All PVCs (including model storage) now use the larger disk
- Removes the temporary "Free up disk space" step, which is no longer needed
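
The Kind mount described above can be sketched as a cluster config with `extraMounts` on each node. This is an illustration only: the PR generates its config inside e2e/pkg/cluster/kind.go, whose exact contents are not shown here, and the `write_kind_config` helper is a hypothetical name.

```shell
#!/bin/sh
set -eu

# write_kind_config FILE: write a Kind cluster config that bind-mounts the
# host's /mnt into each node at /mnt. Hypothetical helper for illustration;
# one control-plane and one worker node are assumed.
write_kind_config() {
  cat > "$1" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /mnt
        containerPath: /mnt
  - role: worker
    extraMounts:
      - hostPath: /mnt
        containerPath: /mnt
EOF
}

cfg=$(mktemp)
write_kind_config "$cfg"
cat "$cfg"

# With kind installed, the cluster would then be created with:
#   kind create cluster --config "$cfg"
```

Because Kind nodes are containers, anything a pod writes under /mnt inside a node ends up on the host's /mnt disk rather than on the root filesystem.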

Changes

Files Modified

.github/workflows/test-and-build.yml (+38 lines)
- Added "Setup model storage on /mnt" step with symlink creation

.github/workflows/integration-test-docker.yml (+45 lines, -11 lines)
- Replaced "Free up disk space" step with "Setup model storage on /mnt"

.github/workflows/integration-test-k8s.yml (-11 lines)
- Removed "Free up disk space" step (no longer needed)

e2e/pkg/cluster/kind.go (+49 lines, -2 lines)
- Added Kind config with /mnt mount for control-plane and worker nodes
- Added logic to patch local-path-config ConfigMap to use /mnt/local-path-provisioner
- Added a restart of the local-path-provisioner deployment after patching

Technical Details

Host-Level Implementation

The symlink approach ensures backward compatibility: existing code that reads from or writes to models/ continues to work without changes.

models/ -> /mnt/models/
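
A minimal sketch of the host-level step follows. It is not the workflow's actual script: the `setup_model_storage` helper is a hypothetical name, and the demo runs in a temporary sandbox so it works without root. On a runner the target would be /mnt/models, and creating it would likely need sudo.

```shell
#!/bin/sh
set -eu

# setup_model_storage REPO_DIR TARGET_DIR
# Moves REPO_DIR/models (if it is a real directory) into TARGET_DIR and leaves
# a symlink behind. Hypothetical helper for illustration.
setup_model_storage() {
  repo_dir=$1
  target_dir=$2

  mkdir -p "$target_dir"

  if [ -d "$repo_dir/models" ] && [ ! -L "$repo_dir/models" ]; then
    # Carry over anything already downloaded (dotfiles included), then drop the old directory.
    cp -a "$repo_dir/models/." "$target_dir/"
    rm -rf "$repo_dir/models"
  fi

  # -n stops a rerun from creating a nested link inside the symlink's target.
  ln -sfn "$target_dir" "$repo_dir/models"
}

# Demo in a throwaway sandbox standing in for the repository and for /mnt.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/repo/models"
echo "weights" > "$sandbox/repo/models/demo.bin"

setup_model_storage "$sandbox/repo" "$sandbox/mnt/models"
ls -l "$sandbox/repo/models"
```

The helper is safe to rerun: once models/ is already a symlink, the copy is skipped and the link is simply refreshed.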

Kind Cluster Implementation

- Mount configuration: Kind nodes mount the host's /mnt at container path /mnt
- Storage provisioner: local-path-provisioner is patched to use /mnt/local-path-provisioner as the base path
- PVC storage: All PersistentVolumeClaims created in the cluster now use the larger disk
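
The provisioner patch can be sketched as a merge patch on the ConfigMap's `config.json` key. This is a hedged illustration, not the Go code in kind.go: the helper names are hypothetical, and the `local-path-storage` namespace and the `DEFAULT_PATH_FOR_NON_LISTED_NODES` entry are the upstream local-path-provisioner defaults, assumed here rather than taken from the PR diff.

```shell
#!/bin/sh
set -eu

# local_path_config_json BASE: print a config.json that points every node at BASE.
local_path_config_json() {
  printf '{"nodePathMap":[{"node":"DEFAULT_PATH_FOR_NON_LISTED_NODES","paths":["%s"]}]}' "$1"
}

# configmap_patch BASE: wrap that JSON as a string value in a merge patch,
# escaping the inner double quotes so the patch itself stays valid JSON.
configmap_patch() {
  escaped=$(local_path_config_json "$1" | sed 's/"/\\"/g')
  printf '{"data":{"config.json":"%s"}}' "$escaped"
}

patch=$(configmap_patch /mnt/local-path-provisioner)
printf '%s\n' "$patch"

# Against a live cluster, the patch and restart would look like:
#   kubectl -n local-path-storage patch configmap local-path-config --type merge -p "$patch"
#   kubectl -n local-path-storage rollout restart deployment local-path-provisioner
#   kubectl -n local-path-storage rollout status deployment local-path-provisioner
```

A merge patch touches only `config.json`, so the other keys in the ConfigMap are left alone. The restart ensures the provisioner uses the new base path before any PVC is created.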

Testing

Validation Performed

✅ Go modules tidy check passed
✅ Pre-commit hooks passed (except for a local OpenSSL issue, which CI will handle)
✅ Code formatting verified (go fmt, YAML syntax)
✅ DCO sign-off verified on all commits
✅ Git status clean
✅ integration-test-k8s.yml - passed on my fork
✅ test-and-build.yml - passed on my fork
✅ docker-publish.yml - passed on my fork

Benefits

- Roughly 5x the disk space: ~75GB available on /mnt vs ~14GB on root
- Stable solution: No longer depends on toolchain versions
- Works for all scenarios: Both host-level and Kubernetes-based tests
- Backward compatible: Existing code works without changes (symlink)
- Cleaner workflows: Removed temporary workarounds

Related Issues

Addresses disk space issues mentioned in PR #623 (follow-up improvement requested by @rootfs)

…d cluster to prevent disk space issues Signed-off-by: Liav Weiss <[email protected]>

netlify · 2025-12-09T14:20:11Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`68238e1`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/69383019242cfe0007200989
😎 Deploy Preview	https://deploy-preview-792--vllm-semantic-router.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2025-12-09T14:37:21Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `Root Directory`

Owners: @rootfs, @Xunzhuo
Files changed:

.github/workflows/integration-test-docker.yml
.github/workflows/integration-test-k8s.yml
.github/workflows/test-and-build.yml

📁 `e2e`

Owners: @Xunzhuo
Files changed:

e2e/pkg/cluster/kind.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Xunzhuo

cool

yuluo-yx

/lgtm

rootfs · 2025-12-09T16:50:38Z

this error looks related to the change:

=== RUN   TestAutoInitializeUnifiedClassifier
    classifier_test.go:2097: AutoInitializeUnifiedClassifier() failed: model validation failed: no valid models found (neither LoRA nor legacy) (models directory should exist at ../../../../models)
--- FAIL: TestAutoInitializeUnifiedClassifier (0.00s)
=== RUN   TestUnifiedClassifier_Initialize
=== RUN   TestUnifiedClassifier_Initialize/Already_initialized
=== RUN   TestUnifiedClassifier_Initialize/Initialization_attempt
Error: ModernBERT model path does not exist: ./test_models/modernbert

liavweiss requested review from Xunzhuo and rootfs as code owners December 9, 2025 14:20

github-actions bot assigned rootfs and Xunzhuo Dec 9, 2025

Xunzhuo approved these changes Dec 9, 2025

yuluo-yx approved these changes Dec 9, 2025
