Conversation
```python
if not os.path.exists(os.path.join(result, model_file)):
    raise FileNotFoundError("Couldn't download model from huggingface")
```
what if any of the other required files is missing?
As far as I know, the blobs are downloaded first, then extracted to the .onnx and the other required files. So if the .onnx exists, the blobs were extracted correctly. You can check that by deleting the snapshot folder and trying to run the model: the snapshot folder will be extracted again, and the .onnx (and the other files) will exist.
I think this check is redundant, let's rely on hf here
also, we have models without onnx files, like Qdrant/bm25, and in that case it just checks the existence of a directory
and we can also remove model_file variable then
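A minimal sketch of what the suggestion could look like (the helper name `validate_snapshot_dir` is illustrative, not from the PR): let `snapshot_download` raise on a failed download, and only verify that the snapshot directory exists and is non-empty, which also covers repos without `.onnx` files such as Qdrant/bm25.

```python
import os

def validate_snapshot_dir(result: str) -> str:
    # snapshot_download already raises on a failed download, so the only
    # thing left to verify is that the directory exists and is non-empty.
    # No per-file check, so repos without .onnx (e.g. Qdrant/bm25) also pass.
    if not os.path.isdir(result) or not os.listdir(result):
        raise FileNotFoundError("Couldn't download model from huggingface")
    return result
```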
```python
if is_cached:
    disable_progress_bars()
if snapshot_dir.exists() and metadata_file.exists():
```
if the version on hf is different from the one we have locally, then we will hide the progress bar and silently download the updated files
I think we could make a corresponding call to HfApi and check revision and commit hash
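A hedged sketch of such a check, assuming `huggingface_hub`'s `HfApi.model_info` (the function name, the `stored_commit` argument, and the injectable `api` parameter are illustrative, added for clarity and testability):

```python
def is_cache_up_to_date(repo_id: str, stored_commit: str,
                        local_files_only: bool = False, api=None) -> bool:
    # Offline mode: trust the cache, as discussed above.
    if local_files_only:
        return True
    if api is None:
        # Lazy import so the helper stays importable without network deps.
        from huggingface_hub import HfApi
        api = HfApi()
    # model_info describes the repo's current revision; its .sha is the
    # commit hash we can compare against the one stored locally.
    return api.model_info(repo_id).sha == stored_commit
```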
Can you explain it further? As far as Andrey said, he doesn't want to make a call to HfApi as it requires network access. Do you mean that we can call it only while downloading the first time and add the revision to the metadata? And only call it when there's network?
I've added a _get_file_hash function to compute a hash for each file, and then later checked with _verify_files_from_metadata whether the version changed
I think it's safe to make this call if local_files_only != True
IIRC, in snapshot_download they just pull the whole repo info and compare the revision's commit hash
```python
try:  # there's network at least the first time
    url = hf_hub_url(hf_source_repo, file_path.name)
    hf_metadata = get_hf_file_metadata(url)
    metadata[str(file_path.relative_to(model_dir))] = {
        "size": hf_metadata.size,
        "commit_hash": hf_metadata.commit_hash,
    }
```
can we retrieve information about the whole repository instead of retrieving files one by one?
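A hedged sketch of the repo-wide alternative, assuming `huggingface_hub`'s `HfApi.model_info(..., files_metadata=True)` (the function name and the injectable `api` parameter are illustrative): one API call returns the size of every file plus the revision's commit hash, instead of one `get_hf_file_metadata` call per file.

```python
def get_repo_metadata(repo_id: str, api=None) -> dict:
    if api is None:
        # Lazy import so the helper stays importable without network deps.
        from huggingface_hub import HfApi
        api = HfApi()
    # files_metadata=True asks the Hub to include per-file size info in
    # info.siblings; info.sha is the commit hash of the current revision.
    info = api.model_info(repo_id, files_metadata=True)
    return {
        "commit_hash": info.sha,
        "files": {s.rfilename: {"size": s.size} for s in info.siblings},
    }
```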
closing in favour of #440
Problem:

Suggestion:

All Submissions:

New Feature Submissions:
- Have you installed pre-commit with `pip3 install pre-commit` and set up hooks with `pre-commit install`?

New models submission: