Could you help fix the backdoor vulnerability caused by a risky pre-trained model used in this repo? #58

Description

@Rockstar292

Hi, @gunyarakun, @fujimotos, I'd like to report that a potentially risky pretrained model is being used in this project, which may pose backdoor threats. Please check the following code example:

pkg/k2-asr/src/huggingface.py

    if language == "ja":
        hf_repo_id = "reazon-research/reazonspeech-k2-v2"
        epochs = 99

    try:
        basedir = hf.snapshot_download(hf_repo_id, local_files_only=True, resume_download=True)
    except hf.utils.LocalEntryNotFoundError:
        basedir = hf.snapshot_download(hf_repo_id, resume_download=True)

    sherpa_onnx.OfflineRecognizer.from_transducer(
        tokens=os.path.join(basedir, files["tokens"]),
        encoder=os.path.join(basedir, files["encoder"]),
        decoder=os.path.join(basedir, files["decoder"]),
        joiner=os.path.join(basedir, files["joiner"]),
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method="greedy_search",
        provider=device,
    )

pkg/nemo-asr/src/cli.py

    # Load audio data and model
    audio = audio_from_path(args[0])
    model = load_model()

    # Perform inference
    ret = transcribe(model, audio)

Issue Description

As shown above, in the pkg/k2-asr/src/huggingface.py file, the model "reazon-research/reazonspeech-k2-v2" is first downloaded via the snapshot_download method. The model is then loaded via the sherpa_onnx.OfflineRecognizer.from_transducer method, and finally executed in pkg/nemo-asr/src/cli.py via the transcribe method.

This model has been flagged as risky on the HuggingFace platform. Specifically, its encoder-epoch-99-avg-1.onnx and encoder-epoch-99-avg-1.int8.onnx files are marked as malicious and may carry backdoor threats. For certain inputs, the backdoor could be activated, effectively altering the model's behavior.


Related risk report: reazon-research/reazonspeech-k2-v2 risk report

Suggested Repair Methods

  1. Convert the model to the safer safetensors format and re-upload it.
  2. Visually inspect the model using an OSS tool such as Netron. If no issues are found, report the false positive to the scanning platform.
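As an interim safeguard while the report is triaged, the project could also verify the downloaded model files against known-good digests before building the recognizer. The sketch below is illustrative only: `EXPECTED_SHA256` and its digest value are hypothetical placeholders, not the real checksums of the reazonspeech-k2-v2 files, which would have to come from the model publisher or a previously verified download.

```python
import hashlib
import os

# Hypothetical known-good digests; the real values would come from the
# model publisher or a previously verified snapshot.
EXPECTED_SHA256 = {
    "encoder-epoch-99-avg-1.onnx": "0" * 64,  # placeholder digest
}

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large ONNX files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_snapshot(basedir):
    """Return the names of expected files that are missing or whose digest
    does not match the table above."""
    mismatches = []
    for name, expected in EXPECTED_SHA256.items():
        path = os.path.join(basedir, name)
        if not os.path.exists(path) or sha256_of(path) != expected:
            mismatches.append(name)
    return mismatches
```

A caller could run `verify_snapshot(basedir)` right after `snapshot_download` returns and refuse to call `from_transducer` if any mismatches are reported.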

As one of the most popular machine learning projects (297 stars), every potential risk could be propagated and amplified. Could you please address the above issues?

Thanks for your help~

Best regards,
Rockstars
