@joein joein commented Nov 11, 2024

We've recently observed consistent failures in our GPU branch CI. After several experiments, I found that we're not properly cleaning up the model cache.

If a model was downloaded from GCP, everything works fine.
However, if a model was downloaded from the HF hub, the _model_dir we pass points to the snapshots dir rather than to the model's root dir. As a result, instead of deleting the actual data, we only delete symlinks to the files, which are stored in blobs. This led to our GPU CI running out of disk space.
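
A minimal sketch of the idea, not the actual change in this PR. It assumes the standard HF hub cache layout, where a model's root dir contains blobs/, refs/ and snapshots/<revision>/, and that the path passed in is a directory directly under snapshots/. The helper names resolve_cache_root and delete_model_cache are hypothetical.

```python
import shutil
from pathlib import Path


def resolve_cache_root(model_dir: Path) -> Path:
    """Return the directory to delete in order to free the model's disk space.

    Assumption: an HF hub model dir looks like <root>/snapshots/<revision>,
    with the real file contents stored under <root>/blobs. For any other
    layout (e.g. a model fetched from GCP) the path is returned unchanged.
    """
    model_dir = Path(model_dir)
    if model_dir.parent.name == "snapshots":
        # Step up from <root>/snapshots/<revision> to <root>.
        return model_dir.parent.parent
    return model_dir


def delete_model_cache(model_dir: Path) -> None:
    """Remove the whole cached model, including blobs, not only the symlinks."""
    root = resolve_cache_root(model_dir)
    if root.exists():
        # rmtree does not follow symlinks, so deleting only the snapshot dir
        # would leave the blobs behind; deleting the root removes both.
        shutil.rmtree(root)
```

In a test teardown this would be called with the same path that is currently passed for cleanup, so the GCP case keeps its existing behaviour.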

joein commented Nov 11, 2024

#399

@joein joein marked this pull request as draft November 11, 2024 18:46
@joein joein marked this pull request as ready for review November 12, 2024 14:02
@joein joein requested review from generall and hh-space-invader and removed request for generall November 12, 2024 14:42
@joein joein merged commit faed7f1 into main Nov 12, 2024
@joein joein deleted the fix-test-cleanup branch November 12, 2024 18:08
