can't load audio when using USE_AUDIO_IN_VIDEO

### Checklist / 检查清单

- [x] I have searched existing issues, and this is a new bug report. / 我已经搜索过现有的 issues，确认这是一个新的 bug report。

### Bug Description / Bug 描述

When I run the following command:
```
export USE_AUDIO_IN_VIDEO=True

python -m swift.cli.main infer \
  --model Qwen3-Omni-30B-A3B-Instruct \
  --max_pixels 119808 \
  --infer_backend pt \
  --val_dataset "xxx" \
  --max_new_tokens 2048 \
  --write_batch_size 100 
```
the audio track of the video does not seem to load properly. The log shows:
```
ms-swift-main/swift/llm/template/template/qwen.py:589: UserWarning: PySoundFile failed. Trying audioread instead.
[2026-03-14 12:15:52]   video = librosa.load(video, sr=self.sampling_rate)[0]
[2026-03-14 12:15:52] /opt/conda/lib/python3.10/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
[2026-03-14 12:15:52] 	Deprecated as of librosa version 0.10.0.
[2026-03-14 12:15:52] 	It will be removed in librosa version 1.0.
[2026-03-14 12:15:52]   y, sr_native = __audioread_load(path, offset, duration, dtype)
[2026-03-14 12:16:34] Traceback (most recent call last):
[2026-03-14 12:16:34]   File "/opt/conda/lib/python3.10/site-packages/librosa/core/audio.py", line 176, in load
[2026-03-14 12:16:34]     y, sr_native = __soundfile_load(path, offset, duration, dtype)
[2026-03-14 12:16:34]   File "/opt/conda/lib/python3.10/site-packages/librosa/core/audio.py", line 209, in __soundfile_load
[2026-03-14 12:16:34]     context = sf.SoundFile(path)
[2026-03-14 12:16:34]   File "/opt/conda/lib/python3.10/site-packages/soundfile.py", line 690, in __init__
[2026-03-14 12:16:34]     self._file = self._open(file, mode_int, closefd)
[2026-03-14 12:16:34]   File "/opt/conda/lib/python3.10/site-packages/soundfile.py", line 1265, in _open
[2026-03-14 12:16:34]     raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
[2026-03-14 12:16:34] soundfile.LibsndfileError: Error opening 'xxx/TGom0uiW130.mp4': Format not recognised.
```
How can I solve this error?
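One diagnostic that might narrow this down is checking whether the `.mp4` file actually contains an audio stream, since the `audioread` fallback needs something to decode. The sketch below assumes `ffprobe` (part of FFmpeg) is installed and on `PATH`. The helper names `probe` and `has_audio_stream` are hypothetical and not part of ms-swift or librosa. This has not been confirmed as the cause of the error above.

```python
import json
import shutil
import subprocess


def has_audio_stream(ffprobe_json: str) -> bool:
    """Return True if ffprobe's JSON output lists at least one audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return any(s.get("codec_type") == "audio" for s in streams)


def probe(path: str) -> str:
    """Run ffprobe on `path` and return its JSON description of the streams.

    Assumption: ffprobe is available on PATH (hypothetical environment setup).
    """
    if shutil.which("ffprobe") is None:
        raise RuntimeError("ffprobe not found on PATH; is FFmpeg installed?")
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `has_audio_stream(probe("xxx/TGom0uiW130.mp4"))` on the failing file would indicate whether the video has an audio track at all. If `probe` raises because `ffprobe` is missing, the missing FFmpeg install may itself be worth investigating.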

### How to Reproduce / 如何复现

- CUDA: 12.8.61
- Python: 3.10.13
- torch: 2.8.0
- vLLM: 0.11.0
- ms-swift: 3.13.0.dev

### Additional Information / 补充信息

_No response_

can't load audio when using USE_AUDIO_IN_VIDEO #8332