lukehinds
Member

Just as container images benefit from digest pinning, pinning a model, dataset, or file to a revision digest ties its use uniquely and immutably to a particular snapshot (commit).

Example run:

bandit -r examples/huggingface_unsafe_download.py
[main]  INFO    profile include tests: None
[main]  INFO    profile exclude tests: None
[main]  INFO    cli include tests: None
[main]  INFO    cli exclude tests: None
[main]  INFO    running on Python 3.11.11
Run started:2025-06-26 10:53:35.832413

Test results:
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in from_pretrained()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:10:27
9       # Example #1: No revision (defaults to floating 'main')
10      unsafe_model_no_revision = AutoModel.from_pretrained("org/model_name")
11

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in from_pretrained()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:13:20
12      # Example #2: Floating revision: 'main'
13      unsafe_model_main = AutoModel.from_pretrained(
14          "org/model_name",
15          revision="main"
16      )
17

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in from_pretrained()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:19:19
18      # Example #3: Floating tag revision: 'v1.0.0'
19      unsafe_model_tag = AutoModel.from_pretrained(
20          "org/model_name",
21          revision="v1.0.0"
22      )
23

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in from_pretrained()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:28:31
27      # Example #4: No revision
28      unsafe_tokenizer_no_revision = AutoTokenizer.from_pretrained("org/model_name")
29

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in from_pretrained()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:31:24
30      # Example #5: Floating revision: 'main'
31      unsafe_tokenizer_main = AutoTokenizer.from_pretrained(
32          "org/model_name",
33          revision="main"
34      )
35

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in from_pretrained()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:37:23
36      # Example #6: Floating tag revision: 'v1.0.0'
37      unsafe_tokenizer_tag = AutoTokenizer.from_pretrained(
38          "org/model_name",
39          revision="v1.0.0"
40      )
41

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in load_dataset()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:46:29
45      # Example #8: No revision
46      unsafe_dataset_no_revision = load_dataset("org_dataset")
47

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in load_dataset()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:49:22
48      # Example #9: Floating revision: 'main'
49      unsafe_dataset_main = load_dataset("org_dataset", revision="main")
50

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in load_dataset()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:52:21
51      # Example #10: Floating tag revision: 'v1.0.0'
52      unsafe_dataset_tag = load_dataset("org_dataset", revision="v1.0.0")
53

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in hf_hub_download()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:58:26
57      # Example #11: No revision
58      unsafe_file_no_revision = hf_hub_download(
59          repo_id="org/model_name",
60          filename="config.json"
61      )
62

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in hf_hub_download()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:64:19
63      # Example #12: Floating revision: 'main'
64      unsafe_file_main = hf_hub_download(
65          repo_id="org/model_name",
66          filename="config.json",
67          revision="main"
68      )
69

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in hf_hub_download()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:71:18
70      # Example #13: Floating tag revision: 'v1.0.0'
71      unsafe_file_tag = hf_hub_download(
72          repo_id="org/model_name",
73          filename="config.json",
74          revision="v1.0.0"
75      )
76

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in snapshot_download()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:81:30
80      # Example #14: No revision
81      unsafe_snapshot_no_revision = snapshot_download(repo_id="org/model_name")
82

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in snapshot_download()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:84:23
83      # Example #15: Floating revision: 'main'
84      unsafe_snapshot_main = snapshot_download(
85          repo_id="org/model_name",
86          revision="main"
87      )
88

--------------------------------------------------
>> Issue: [B615:huggingface_unsafe_download] Unsafe Hugging Face Hub download without revision pinning in snapshot_download()
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.8.6.dev2/plugins/b615_huggingface_unsafe_download.html
   Location: ./examples/huggingface_unsafe_download.py:90:22
89      # Example #16: Floating tag revision: 'v1.0.0'
90      unsafe_snapshot_tag = snapshot_download(
91          repo_id="org/model_name",
92          revision="v1.0.0"
93      )
94

--------------------------------------------------

Code scanned:
        Total lines of code: 71
        Total lines skipped (#nosec): 0

Run metrics:
        Total issues (by severity):
                Undefined: 0
                Low: 0
                Medium: 15
                High: 0
        Total issues (by confidence):
                Undefined: 0
                Low: 0
                Medium: 0
                High: 15
Files skipped (0):

Member

@ericwb ericwb left a comment

Looks mostly good to go after noted fixes.

- Add an entry for CWE 494
- Use string.hexdigits
- Set to 1.8.6 release
- Remove Copyright
- Order after markupsafe
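
The string.hexdigits note could be addressed roughly as follows. This is a minimal sketch, not the code merged in this PR: the name looks_like_commit_hash is hypothetical, and the 7-to-40 character rule is taken from the "full SHA or 7+ chars short SHA" comment in the plugin snippet under review.

```python
import string

# string.hexdigits is "0123456789abcdefABCDEF"
HEX_CHARS = frozenset(string.hexdigits)


def looks_like_commit_hash(revision: str) -> bool:
    """Return True if revision is plausibly a Git commit SHA (7 to 40 hex chars)."""
    return 7 <= len(revision) <= 40 and all(ch in HEX_CHARS for ch in revision)
```

A purely lexical check like this cannot tell a short SHA from a tag or branch that happens to be named with hex characters, so it is a heuristic rather than proof that the revision is immutable.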
@@ -31,6 +31,7 @@ class Cwe:
     IMPROPER_CHECK_OF_EXCEPT_COND = 703
     INCORRECT_PERMISSION_ASSIGNMENT = 732
     INAPPROPRIATE_ENCODING_FOR_OUTPUT_CONTEXT = 838
+    DOWNLOAD_OF_CODE_WITHOUT_INTEGRITY_CHECK = 494
Member

Please keep these sorted by number. Makes it easier to detect whether we already have a constant for that CWE

Member Author

done!

# Commit hashes: 40 chars (full SHA) or 7+ chars (short SHA)
if isinstance(revision_to_check, str):
    # Remove quotes if present
    revision_str = str(revision_to_check).strip("\"'")
Member

This might produce unexpected results with more complicated string formation, such as f"{foo} '{bar}' """

Member Author

I will iterate on this some more. Thanks for the quick reviews, much appreciated!
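
One way to avoid quote stripping altogether is to read the revision from the syntax tree and accept it only when it is a plain string literal. The sketch below uses only the standard library ast module and does not touch Bandit's plugin API; the helper names literal_revision and first_call are hypothetical.

```python
import ast
from typing import Optional


def literal_revision(call: ast.Call) -> Optional[str]:
    """Return the revision keyword's value only if it is a plain string literal."""
    for kw in call.keywords:
        if kw.arg == "revision":
            node = kw.value
            if isinstance(node, ast.Constant) and isinstance(node.value, str):
                return node.value
            # f-strings (ast.JoinedStr), names, subscripts, etc. are not
            # statically known, so no value is reported for them.
            return None
    return None


def first_call(source: str) -> ast.Call:
    """Demo helper: parse a single expression and return its Call node."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise TypeError("expected a call expression")
    return node
```

With this approach an f-string such as the one in the review comment is never reduced to text, so its embedded quotes cannot be mistaken for part of a literal revision.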

Member

@ericwb ericwb left a comment

Looks good enough to merge IMO

@lukehinds lukehinds merged commit 2d0b675 into main Jul 3, 2025
27 checks passed
@lukehinds lukehinds deleted the hub-revision branch July 3, 2025 07:00
@astrojuanlu

What happens for library code that accepts user-supplied revisions? Think of any function wrapping datasets.load_dataset for instance. Example:

def load_dataset_wrapped(dataset_name, dataset_kwargs):
    return load_dataset(
        dataset_name,
        revision=dataset_kwargs["revision"],
        ...

? cc @ElenaKhaustova

(Please let us know if we should rather open a new issue about this)
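
This thread does not say how B615 treats such a call. A static check cannot know the runtime value of a user-supplied revision, so one option for library authors is to validate it at runtime before forwarding it. The sketch below is only an illustration under that assumption: load_with_pinned_revision is a hypothetical name, the loader is passed in as a parameter so the example has no third-party imports, and "pinned" is taken to mean a full 40-character commit SHA.

```python
import re
from typing import Any, Callable, Mapping

# Assumption: a pinned revision is a full 40-character hexadecimal commit SHA.
_FULL_SHA = re.compile(r"[0-9a-fA-F]{40}")


def load_with_pinned_revision(
    loader: Callable[..., Any], name: str, kwargs: Mapping[str, Any]
) -> Any:
    """Forward to loader only when kwargs carries a full commit SHA as revision."""
    revision = kwargs.get("revision")
    if not isinstance(revision, str) or not _FULL_SHA.fullmatch(revision):
        raise ValueError(
            f"revision must be a 40-character commit SHA, got {revision!r}"
        )
    return loader(name, **kwargs)
```

In a wrapper like the one above, load_dataset would be passed as the loader argument, with the dataset name and the user-supplied keyword arguments following it.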
