
Conversation

majic31
Contributor

@majic31 majic31 commented Jul 27, 2025

Problem description:
As mentioned in issue #2587, calling VAD (voice activity detection) in a multithreaded environment leads to errors. The root cause is that the cache parameter of the VAD inference method defaults to an empty dict ({}); since a dict is a mutable object, multiple threads that omit the argument end up sharing the same cache, which causes cache conflicts.

Fix:
If cache is not passed in, its default value is now None, and a new empty dict ({}) is created inside the method as needed. Each thread then works with its own independent dict, avoiding conflicts between threads.

After testing, multithreaded VAD inference runs correctly with this fix.
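
For illustration only, here is a minimal sketch of the mutable-default-argument pattern behind the bug and the fix; the function below is simplified and hypothetical, not the actual inference method in funasr/models/fsmn_vad_streaming/model.py:

# Buggy pattern: the default dict is created once, at function definition time,
# so every call (and every thread) that omits `cache` shares the same object.
def inference_buggy(chunk, cache={}):
    cache["last_chunk"] = chunk
    return cache

# Fixed pattern: default to None and create a fresh dict per call,
# so each thread gets its own independent cache.
def inference_fixed(chunk, cache=None):
    if cache is None:
        cache = {}
    cache["last_chunk"] = chunk
    return cache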

Test script (throws an exception before the fix, runs normally after it):

import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed

from funasr import AutoModel
import time

if __name__ == '__main__':
    model = AutoModel(
        model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
        punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
        # spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
    )
    file_name_list = [
        "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
        "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
        "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    ]
    begin = time.time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        # futures = [executor.submit(modelscope_local.inference_vad_offline, file_name) for file_name in file_name_list]
        futures = [executor.submit(model.generate, input=file_name) for file_name in file_name_list]
        multi_thread_results = []
        for future in as_completed(futures):
            try:
                multi_thread_results.append(future.result())
            except Exception as e:
                print(f"获取任务结果时出错: {traceback.format_exc()}")
        print("\n===== 处理结果 =====")
        for result in multi_thread_results:
            print(f'result={result}')
    end = time.time()
    print(f'multi cost={end-begin} sec')
    begin = end
    single_results = [model.generate(input=file_name) for file_name in file_name_list]
    for rsl in single_results:
        print(f'result={rsl}')
    end = time.time()
    print(f'single cost={end - begin} sec')
    # compare
    for multi_result in multi_thread_results:
        for single_result in single_results:
            if str(multi_result) == str(single_result):
                print('equal')
            else:
                print('not equal')

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @majic31, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical multithreading bug affecting the Voice Activity Detection (VAD) inference process. By correctly handling the cache parameter to prevent shared mutable state across threads, it ensures that VAD can operate reliably in concurrent environments, significantly improving the stability and performance of the system when used in multithreaded applications.

Highlights

  • Multithreading Bug Fix: Addresses a critical multithreading issue in the VAD (Voice Activity Detection) inference method. The problem stemmed from a shared mutable default cache dictionary, which led to conflicts when the method was called concurrently from multiple threads.
  • Parameter Default Change: The cache parameter in the inference method of funasr/models/fsmn_vad_streaming/model.py has been changed from having a default value of an empty dictionary ({}) to None.
  • Independent Cache Initialization: New logic has been introduced within the inference method to initialize a fresh, independent cache dictionary (cache = {}) if the cache parameter is not explicitly provided (i.e., it defaults to None). This ensures that each thread operates with its own distinct cache, resolving the concurrency issue.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a critical multithreading issue in the VAD inference method caused by a mutable default argument. The fix, changing the default value of cache to None and initializing it within the function, is correct. I've added one suggestion to make the implementation more idiomatic and robust by using an explicit is None check, which is a common best practice in Python.
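
For reference, a minimal illustration of why the explicit is None check is preferred over a truthiness-based fallback; the signature and cache usage here are simplified and hypothetical, not the code in model.py:

def inference(chunk, cache=None):
    # `cache = cache or {}` would also discard an empty dict supplied by the caller
    # (for example a streaming caller that primes its own cache object);
    # the explicit `is None` check only fills in a genuinely missing argument.
    if cache is None:
        cache = {}
    cache.setdefault("chunks", []).append(chunk)
    return cache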

ok

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sf9ehf9fe

I tested it and it does run. However, when I switch the audio paths to local files, processing actually takes longer; I thought the main point of concurrency was to reduce processing time.

@majic31
Contributor Author

majic31 commented Aug 4, 2025

> I tested it and it does run. However, when I switch the audio paths to local files, processing actually takes longer; I thought the main point of concurrency was to reduce processing time.

In my earlier tests it was faster than running serially, but the speedup is indeed not linear. My initial impression is that this is due to Python's GIL, plus the many copies involved when AutoModel chains several models (copies between CPU and GPU). With some engineering optimization the throughput should still improve, but that is not closely related to this issue (during stress testing, GPU utilization did fluctuate quite a lot).

Also, I previously had to load the models multiple times; after switching to multithreading they only need to be loaded once, which is a big improvement for my program's initialization. (Since I have already wrapped this in Docker, if the performance is not enough I can in theory run several Docker containers behind a gateway to squeeze more out of the GPU.)

May I ask: in your tests, is multithreading actually slower than serial execution? Could you also share your test script?

@sf9ehf9fe

Test 1:
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import time

output_dir = "results"
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='/root/.cache/modelscope/hub/iic/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404',
    model_revision='v2.0.5',
    vad_model='/root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    punc_model='/root/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large',
    punc_model_revision="v2.0.4",
    output_dir=output_dir
)
file_name_list = [
    "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
]
start_time = time.time()
init_data = inference_pipeline(input=file_name_list, batch_size_s=1, batch_size_token_threshold_s=40, hotword="吴惠")
print(init_data)
end_time = time.time()
print(end_time - start_time)

Test 2:
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import time

def Task(inference_pipeline):
    file_name_list = [
        "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
    ]
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(inference_pipeline, input=file_name, batch_size_s=1, batch_size_token_threshold_s=40, hotword="吴惠")
            for file_name in file_name_list
        ]
        results = []
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                print(f"Error while getting task result: {traceback.format_exc()}")

output_dir = "results"
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    # Paraformer model with hotword support
    model='/root/.cache/modelscope/hub/iic/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404',
    model_revision='v2.0.5',
    vad_model='/root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    punc_model='/root/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large',
    punc_model_revision="v2.0.4",
    output_dir=output_dir
)
start_time = time.time()
Task(inference_pipeline)
end_time = time.time()
print(end_time - start_time)

Test 1 took about 3.1 seconds; Test 2 took about 6.5 seconds.

@majic31
Contributor Author

majic31 commented Aug 4, 2025

> Test 1: [script quoted from the comment above]
>
> Test 2: [script quoted from the comment above]
>
> Test 1 took about 3.1 seconds; Test 2 took about 6.5 seconds.

Your first test is already batched, so it doesn't really count as serial; serial would be a for loop calling ASR on one file at a time. (Batching will of course be faster than the current multithreading, for the reasons analyzed above: multithreading is limited by the GIL, and there is still room for engineering optimization.)
A fairer comparison would be:
Test 1: treat 7 audio files as one batch and run it in a for loop 10 times (70 files in total).
Test 2: treat 7 audio files as one batch and run it 10 times with multiple threads (the thread count could be 10 or another value, depending on the hardware).
That way both approaches run under the same conditions and you can compare the elapsed times to measure the multithreading speedup; a rough sketch of such a benchmark is given below.
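
For concreteness, a rough sketch of the comparison described above, assuming the same AutoModel setup as in the earlier test script; the audio paths and the max_workers value are placeholders to be adapted to the actual environment:

import time
from concurrent.futures import ThreadPoolExecutor

from funasr import AutoModel

model = AutoModel(
    model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
)
batch = [f"audio_{i}.wav" for i in range(7)]  # placeholder: one batch of 7 local files
rounds = 10

# Test 1: serial baseline, 10 rounds of the same 7-file batch.
t0 = time.time()
serial_results = [model.generate(input=batch) for _ in range(rounds)]
print(f"serial: {time.time() - t0:.1f} s")

# Test 2: the same 10 rounds submitted to a thread pool.
t0 = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(model.generate, input=batch) for _ in range(rounds)]
    threaded_results = [f.result() for f in futures]
print(f"threaded: {time.time() - t0:.1f} s")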

@LauraGPT
Collaborator

LauraGPT commented Aug 5, 2025

Please check whether the output is the same before and after the change.

@majic31
Contributor Author

majic31 commented Aug 6, 2025

> Please check whether the output is the same before and after the change.

The audio at https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav has been verified: the single-threaded and multithreaded results are identical (my test script includes this comparison). The output is shown below:
[screenshot: single-threaded and multithreaded outputs match]

I also tested with local recordings, and the results are consistent as well.

@majic31
Contributor Author

majic31 commented Aug 6, 2025

> Please check whether the output is the same before and after the change.

In addition, I compared the main branch with this branch; the outputs are identical as well, as shown below:
[screenshot: main branch and this branch produce identical output]

@LauraGPT LauraGPT merged commit 5115a06 into modelscope:main Aug 14, 2025