Conversation
```python
if not os.path.exists(os.path.join(result, model_file)):
    raise FileNotFoundError("Couldn't download model from huggingface")
```
what if any of the other required files is missing?
As far as I know, the blobs are downloaded first, then extracted to the .onnx and the other required files. So if the .onnx exists, the blobs were extracted correctly. You can check that by deleting the snapshot folder and trying to run the model: the snapshot folder will be extracted again, and the .onnx (and the other files) will exist.
I think this check is redundant, let's rely on hf here
also, we have models without onnx files, like Qdrant/bm25, and in that case it just checks the existence of a directory
and we can also remove model_file variable then
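A minimal sketch of what the suggestion could look like (the helper name `validate_snapshot_dir` is illustrative, not from the PR): let `snapshot_download` raise on a failed download, and only verify that the snapshot directory exists and is non-empty, which also covers repos without `.onnx` files such as Qdrant/bm25.

```python
import os

def validate_snapshot_dir(result: str) -> str:
    # snapshot_download already raises on a failed download, so the only
    # thing left to verify is that the directory exists and is non-empty.
    # No per-file check, so repos without .onnx (e.g. Qdrant/bm25) also pass.
    if not os.path.isdir(result) or not os.listdir(result):
        raise FileNotFoundError("Couldn't download model from huggingface")
    return result
```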
```python
if is_cached:
    disable_progress_bars()
if snapshot_dir.exists() and metadata_file.exists():
```
if the version on hf is different from the one we have locally, then we will hide the progress bar and silently download the updated files
I think we could make a corresponding call to HfApi and check revision and commit hash
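A hedged sketch of such a check, assuming `huggingface_hub`'s `HfApi.model_info` (the function name, the `stored_commit` argument, and the injectable `api` parameter are illustrative, added for clarity and testability):

```python
def is_cache_up_to_date(repo_id: str, stored_commit: str,
                        local_files_only: bool = False, api=None) -> bool:
    # Offline mode: trust the cache, as discussed above.
    if local_files_only:
        return True
    if api is None:
        # Lazy import so the helper stays importable without network deps.
        from huggingface_hub import HfApi
        api = HfApi()
    # model_info describes the repo's current revision; its .sha is the
    # commit hash we can compare against the one stored locally.
    return api.model_info(repo_id).sha == stored_commit
```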
Can you explain it further? As far as Andrey said, he doesn't want to make a call to HfApi as it requires network access. Do you mean that we can call it only while downloading the first time and add the revision to the metadata? And only call it when there's network?
I've added a _get_file_hash function to compute a hash for each file, and then later checked with _verify_files_from_metadata whether the version changed
I think it's safe to make this call if local_files_only != True
IIRC, in snapshot_download they just pull the whole repo info and compare the revision's commit hash
```python
try:  # there's network at least the first time
    url = hf_hub_url(hf_source_repo, file_path.name)
    hf_metadata = get_hf_file_metadata(url)
    metadata[str(file_path.relative_to(model_dir))] = {
        "size": hf_metadata.size,
        "commit_hash": hf_metadata.commit_hash,
    }
```
can we retrieve information about the whole repository instead of retrieving files one by one?
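A hedged sketch of the repo-wide alternative, assuming `huggingface_hub`'s `HfApi.model_info(..., files_metadata=True)` (the function name and the injectable `api` parameter are illustrative): one API call returns the size of every file plus the revision's commit hash, instead of one `get_hf_file_metadata` call per file.

```python
def get_repo_metadata(repo_id: str, api=None) -> dict:
    if api is None:
        # Lazy import so the helper stays importable without network deps.
        from huggingface_hub import HfApi
        api = HfApi()
    # files_metadata=True asks the Hub to include per-file size info in
    # info.siblings; info.sha is the commit hash of the current revision.
    info = api.model_info(repo_id, files_metadata=True)
    return {
        "commit_hash": info.sha,
        "files": {s.rfilename: {"size": s.size} for s in info.siblings},
    }
```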
closing in favour of #440
Problem:

Suggestion:

All Submissions:

New Feature Submissions:
- Have you installed pre-commit with `pip3 install pre-commit` and set up hooks with `pre-commit install`?

New models submission: