
Commit c06efc3

Wauplin and sgugger authored
Document .no_exist folder (#1308)
* Document .no_exist folder
* typo
* as subheader
* Update docs/source/how-to-cache.mdx

Co-authored-by: Sylvain Gugger <[email protected]>
1 parent bdb9d06 commit c06efc3

4 files changed: +67 -4 lines changed

docs/source/how-to-cache.mdx

Lines changed: 40 additions & 0 deletions
@@ -87,6 +87,46 @@ That `README.md` file is actually a symlink linking to the blob that has the has
 By creating the skeleton this way we open the mechanism to file sharing: if the same file was fetched in
 revision `bbbbbb`, it would have the same hash and the file would not need to be re-downloaded.

+### .no_exist (advanced)
+
+In addition to the `blobs`, `refs` and `snapshots` folders, you might also find a `.no_exist` folder
+in your cache. This folder keeps track of files that you've tried to download once but don't exist
+on the Hub. Its structure is the same as the `snapshots` folder with one subfolder per known revision:
+
+```
+<CACHE_DIR>/<REPO_NAME>/.no_exist/aaaaaa/config_that_does_not_exist.json
+```
+
+Unlike the `snapshots` folder, files are simple empty files (no symlinks). In this example,
+the file `"config_that_does_not_exist.json"` does not exist on the Hub for the revision `"aaaaaa"`.
+As it only stores empty files, this folder is negligible in terms of disk usage.
+
+So now you might wonder, why is this information even relevant?
+In some cases, a framework tries to load optional files for a model. Saving the non-existence
+of optional files makes it faster to load a model as it saves 1 HTTP call per possible optional file.
+This is for example the case in `transformers` where each tokenizer can support additional files.
+The first time you load the tokenizer on your machine, it will cache which optional files exist (and
+which don't) to make the loading time faster for the next initializations.
+
+To test if a file is cached locally (without making any HTTP request), you can use the [`try_to_load_from_cache`]
+helper. It will either return the filepath (if the file exists and is cached), the object `_CACHED_NO_EXIST` (if non-existence
+is cached) or `None` (if we don't know).
+
+```python
+from huggingface_hub import try_to_load_from_cache, _CACHED_NO_EXIST
+
+filepath = try_to_load_from_cache("gpt2", "config.json")  # example repo_id and filename
+if isinstance(filepath, str):
+    # file exists and is cached
+    ...
+elif filepath is _CACHED_NO_EXIST:
+    # non-existence of file is cached
+    ...
+else:
+    # file is not cached
+    ...
+```
+
 ### In practice

 In practice, your cache should look like the following tree:
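
Reviewer's sketch (not part of this commit): the `.no_exist` layout documented in the hunk above can be inspected with the standard library alone. The helper name `list_cached_missing_files` is hypothetical, and the cache folder below is a throwaway directory built only to mimic the documented layout, so treat this as an illustration rather than the library's own tooling.

```python
import os
import tempfile


def list_cached_missing_files(repo_cache: str) -> dict:
    """Map each revision under `.no_exist` to the relative paths recorded as missing."""
    no_exist_dir = os.path.join(repo_cache, ".no_exist")
    missing = {}
    if not os.path.isdir(no_exist_dir):
        return missing
    for revision in sorted(os.listdir(no_exist_dir)):
        revision_dir = os.path.join(no_exist_dir, revision)
        if not os.path.isdir(revision_dir):
            continue
        paths = []
        # Marker files may live in subfolders, so walk the whole revision folder.
        for root, _dirs, files in os.walk(revision_dir):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), revision_dir)
                paths.append(rel.replace(os.sep, "/"))
        missing[revision] = sorted(paths)
    return missing


# Build a temporary folder that mimics <CACHE_DIR>/<REPO_NAME> from the docs.
with tempfile.TemporaryDirectory() as repo_cache:
    marker = os.path.join(
        repo_cache, ".no_exist", "aaaaaa", "config_that_does_not_exist.json"
    )
    os.makedirs(os.path.dirname(marker))
    open(marker, "w").close()  # empty marker file, as described in the docs
    demo_result = list_cached_missing_files(repo_cache)
    demo_size = os.path.getsize(marker)

print(demo_result)
```

Because each marker is an empty file, `demo_size` is zero bytes here, which matches the claim that the folder costs almost nothing on disk.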

docs/source/package_reference/cache.mdx

Lines changed: 5 additions & 1 deletion
@@ -6,7 +6,11 @@ for a detailed presentation of caching at HF.

 ## Helpers

-## cached_assets_path
+### try_to_load_from_cache
+
+[[autodoc]] huggingface_hub.try_to_load_from_cache
+
+### cached_assets_path

 [[autodoc]] huggingface_hub.cached_assets_path

src/huggingface_hub/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -94,6 +94,7 @@
     ],
     "file_download": [
         "HfFileMetadata",
+        "_CACHED_NO_EXIST",
         "cached_download",
         "get_hf_file_metadata",
         "hf_hub_download",
@@ -336,6 +337,7 @@ def __dir__():
     from .fastai_utils import _save_pretrained_fastai  # noqa: F401
     from .fastai_utils import from_pretrained_fastai  # noqa: F401
     from .fastai_utils import push_to_hub_fastai  # noqa: F401
+    from .file_download import _CACHED_NO_EXIST  # noqa: F401
     from .file_download import HfFileMetadata  # noqa: F401
     from .file_download import cached_download  # noqa: F401
     from .file_download import get_hf_file_metadata  # noqa: F401

src/huggingface_hub/file_download.py

Lines changed: 20 additions & 3 deletions
@@ -1327,6 +1327,23 @@ def try_to_load_from_cache(
             - The exact path to the cached file if it's found in the cache
             - A special value `_CACHED_NO_EXIST` if the file does not exist at the given commit hash and this fact was
               cached.
+
+    Example:
+
+    ```python
+    from huggingface_hub import try_to_load_from_cache, _CACHED_NO_EXIST
+
+    filepath = try_to_load_from_cache("gpt2", "config.json")  # example repo_id and filename
+    if isinstance(filepath, str):
+        # file exists and is cached
+        ...
+    elif filepath is _CACHED_NO_EXIST:
+        # non-existence of file is cached
+        ...
+    else:
+        # file is not cached
+        ...
+    ```
     """
     if revision is None:
         revision = "main"
@@ -1348,7 +1365,7 @@ def try_to_load_from_cache(

     refs_dir = os.path.join(repo_cache, "refs")
     snapshots_dir = os.path.join(repo_cache, "snapshots")
-    no_exists_dir = os.path.join(repo_cache, ".no_exist")
+    no_exist_dir = os.path.join(repo_cache, ".no_exist")

     # Resolve refs (for instance to convert main to the associated commit sha)
     if os.path.isdir(refs_dir):
@@ -1357,8 +1374,8 @@ def try_to_load_from_cache(
             with open(os.path.join(refs_dir, revision)) as f:
                 revision = f.read()

-    # Check if file is cached as "no_exists"
-    if os.path.isfile(os.path.join(no_exists_dir, revision, filename)):
+    # Check if file is cached as "no_exist"
+    if os.path.isfile(os.path.join(no_exist_dir, revision, filename)):
         return _CACHED_NO_EXIST

     # Check if revision folder exists
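
Reviewer's sketch (not part of this commit): the lookup order visible in the hunks above (resolve the ref, check `.no_exist`, then look in `snapshots`) can be reproduced as a standalone function. This is a simplified illustration under stated assumptions, not the library's implementation: `lookup_in_cache` and `_NO_EXIST` are hypothetical names, the final snapshot check is inferred from the trailing comment, and the `.strip()` on the ref content is an addition of this sketch.

```python
import os
import tempfile

_NO_EXIST = object()  # hypothetical stand-in for the library's sentinel


def lookup_in_cache(repo_cache: str, filename: str, revision: str = "main"):
    refs_dir = os.path.join(repo_cache, "refs")
    snapshots_dir = os.path.join(repo_cache, "snapshots")
    no_exist_dir = os.path.join(repo_cache, ".no_exist")

    # Resolve a ref such as "main" to the commit hash stored in refs/<ref>.
    ref_file = os.path.join(refs_dir, revision)
    if os.path.isfile(ref_file):
        with open(ref_file) as f:
            revision = f.read().strip()

    # A marker file means the file is known not to exist for this revision.
    if os.path.isfile(os.path.join(no_exist_dir, revision, filename)):
        return _NO_EXIST

    # Otherwise return the snapshot path if present, or None if nothing is known.
    cached_file = os.path.join(snapshots_dir, revision, filename)
    return cached_file if os.path.isfile(cached_file) else None


with tempfile.TemporaryDirectory() as repo_cache:
    for sub in ("refs", os.path.join("snapshots", "aaaaaa"), os.path.join(".no_exist", "aaaaaa")):
        os.makedirs(os.path.join(repo_cache, sub))
    with open(os.path.join(repo_cache, "refs", "main"), "w") as f:
        f.write("aaaaaa")
    open(os.path.join(repo_cache, "snapshots", "aaaaaa", "config.json"), "w").close()
    open(os.path.join(repo_cache, ".no_exist", "aaaaaa", "missing.json"), "w").close()

    found = lookup_in_cache(repo_cache, "config.json")
    found_ok = isinstance(found, str) and found.endswith("config.json")
    missing_ok = lookup_in_cache(repo_cache, "missing.json") is _NO_EXIST
    unknown_ok = lookup_in_cache(repo_cache, "other.json") is None

print(found_ok, missing_ok, unknown_ok)
```

Every step is a local filesystem check, so no HTTP request is made at any point.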
