Description
Hi, @gunyarakun, @fujimotos, I'd like to report that a potentially risky pretrained model is being used in this project, which may pose backdoor threats. Please check the following code example:
• pkg/k2-asr/src/huggingface.py
if language == "ja":
    hf_repo_id = "reazon-research/reazonspeech-k2-v2"
    epochs = 99
try:
    basedir = hf.snapshot_download(hf_repo_id, local_files_only=True, resume_download=True)
except hf.utils.LocalEntryNotFoundError:
    basedir = hf.snapshot_download(hf_repo_id, resume_download=True)
sherpa_onnx.OfflineRecognizer.from_transducer(
    tokens=os.path.join(basedir, files["tokens"]),
    encoder=os.path.join(basedir, files['encoder']),
    decoder=os.path.join(basedir, files['decoder']),
    joiner=os.path.join(basedir, files['joiner']),
    num_threads=1,
    sample_rate=16000,
    feature_dim=80,
    decoding_method="greedy_search",
    provider=device,
)
• pkg/nemo-asr/src/cli.py
# Load audio data and model
audio = audio_from_path(args[0])
model = load_model()
# Perform inference
ret = transcribe(model, audio)
Issue Description
As shown above, in the pkg/k2-asr/src/huggingface.py file, the model "reazon-research/reazonspeech-k2-v2" is first downloaded with the snapshot_download method. The model is then loaded via the sherpa_onnx.OfflineRecognizer.from_transducer method, and finally executed in pkg/nemo-asr/src/cli.py using the transcribe method.
This model has been flagged as risky on the HuggingFace platform. Specifically, its encoder-epoch-99-avg-1.onnx and encoder-epoch-99-avg-1.int8.onnx files are marked as malicious and may pose a backdoor threat. For certain inputs, the backdoor could be activated, effectively altering the model's behavior.
Related Risk Reports: reazon-research/reazonspeech-k2-v2 risk report
Suggested Repair Methods
- Convert the model to the safer safetensors format and re-upload it
- Visually inspect the model using OSS tools like Netron. If no issues are found, report the false positive to the scanning platform
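As a rough complement to a visual check in Netron, the operators in an ONNX file can also be listed programmatically. The following is only a minimal sketch, not a backdoor detector: the helper names (summarize_ops, nonstandard_ops, read_nodes) and the set of "standard" domains are assumptions made for illustration. It reads only top-level graph nodes (subgraphs inside control-flow operators are not traversed) and requires the `onnx` package for the file-reading step.

```python
from collections import Counter

# Domains treated as "standard" here; this set is an assumption for illustration.
STANDARD_DOMAINS = {"", "ai.onnx"}


def summarize_ops(nodes):
    """Count (domain, op_type) pairs from an iterable of (domain, op_type) tuples."""
    return Counter(nodes)


def nonstandard_ops(summary, standard_domains=STANDARD_DOMAINS):
    """Return the sorted (domain, op_type) pairs whose domain is outside the given set."""
    return sorted(key for key in summary if key[0] not in standard_domains)


def read_nodes(onnx_path):
    """Read (domain, op_type) for each top-level graph node of an ONNX file."""
    import onnx  # imported lazily so the helpers above work without the package

    model = onnx.load(onnx_path)
    return [(node.domain, node.op_type) for node in model.graph.node]
```

For example, nonstandard_ops(summarize_ops(read_nodes(path))), where path points to the downloaded encoder-epoch-99-avg-1.onnx, would list any operators from non-default domains. An operator appearing in that list is not evidence of a problem by itself (quantized models often use vendor-specific domains), but it narrows down where to look by hand.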
Since this is a popular machine learning project (297 stars), any potential risk could be propagated and amplified downstream. Could you please address the above issues?
Thanks for your help~
Best regards,
Rockstars
