
Conversation

majic31
Contributor

@majic31 majic31 commented Jul 27, 2025

Problem description:
As mentioned in issue #2587, calling VAD (voice activity detection) in a multithreaded environment leads to errors. The root cause is that the cache parameter of the VAD inference method defaults to an empty dict ({}); since a dict is a mutable object, multiple threads that omit the argument end up sharing the same cache, which causes cache conflicts.

Fix:
If cache is not passed in, its default value is now None, and a new empty dict ({}) is created inside the method as needed. Each thread then works with its own independent dict, avoiding conflicts between threads.

After testing, multithreaded VAD inference runs correctly with this fix.
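
For illustration only, here is a minimal sketch of the mutable-default-argument pattern behind the bug and the fix; the function below is simplified and hypothetical, not the actual inference method in funasr/models/fsmn_vad_streaming/model.py:

# Buggy pattern: the default dict is created once, at function definition time,
# so every call (and every thread) that omits `cache` shares the same object.
def inference_buggy(chunk, cache={}):
    cache["last_chunk"] = chunk
    return cache

# Fixed pattern: default to None and create a fresh dict per call,
# so each thread gets its own independent cache.
def inference_fixed(chunk, cache=None):
    if cache is None:
        cache = {}
    cache["last_chunk"] = chunk
    return cache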

Test script (throws an exception before the fix, runs normally after it):

import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed

from funasr import AutoModel
import time

if __name__ == '__main__':
    model = AutoModel(
        model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
        punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
        # spk_model="iic/speech_campplus_sv_zh-cn_16k-common",
    )
    file_name_list = [
        "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
        "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
        "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    ]
    begin = time.time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        # futures = [executor.submit(modelscope_local.inference_vad_offline, file_name) for file_name in file_name_list]
        futures = [executor.submit(model.generate, input=file_name) for file_name in file_name_list]
        multi_thread_results = []
        for future in as_completed(futures):
            try:
                multi_thread_results.append(future.result())
            except Exception as e:
                print(f"获取任务结果时出错: {traceback.format_exc()}")
        print("\n===== 处理结果 =====")
        for result in multi_thread_results:
            print(f'result={result}')
    end = time.time()
    print(f'multi cost={end-begin} sec')
    begin = end
    single_results = [model.generate(input=file_name) for file_name in file_name_list]
    for rsl in single_results:
        print(f'result={rsl}')
    end = time.time()
    print(f'single cost={end - begin} sec')
    # compare
    for multi_result in multi_thread_results:
        for single_result in single_results:
            if str(multi_result) == str(single_result):
                print('equal')
            else:
                print('not equal')

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @majic31, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical multithreading bug affecting the Voice Activity Detection (VAD) inference process. By correctly handling the cache parameter to prevent shared mutable state across threads, it ensures that VAD can operate reliably in concurrent environments, significantly improving the stability and performance of the system when used in multithreaded applications.

Highlights

  • Multithreading Bug Fix: Addresses a critical multithreading issue in the VAD (Voice Activity Detection) inference method. The problem stemmed from a shared mutable default cache dictionary, which led to conflicts when the method was called concurrently from multiple threads.
  • Parameter Default Change: The cache parameter in the inference method of funasr/models/fsmn_vad_streaming/model.py has been changed from having a default value of an empty dictionary ({}) to None.
  • Independent Cache Initialization: New logic has been introduced within the inference method to initialize a fresh, independent cache dictionary (cache = {}) if the cache parameter is not explicitly provided (i.e., it defaults to None). This ensures that each thread operates with its own distinct cache, resolving the concurrency issue.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a critical multithreading issue in the VAD inference method caused by a mutable default argument. The fix, changing the default value of cache to None and initializing it within the function, is correct. I've added one suggestion to make the implementation more idiomatic and robust by using an explicit is None check, which is a common best practice in Python.
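
For reference, a minimal illustration of why the explicit is None check is preferred over a truthiness-based fallback; the signature and cache usage here are simplified and hypothetical, not the code in model.py:

def inference(chunk, cache=None):
    # `cache = cache or {}` would also discard an empty dict supplied by the caller
    # (for example a streaming caller that primes its own cache object);
    # the explicit `is None` check only fills in a genuinely missing argument.
    if cache is None:
        cache = {}
    cache.setdefault("chunks", []).append(chunk)
    return cache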

ok

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sf9ehf9fe

I tested it and it does run. However, when I switch the audio paths to local files, processing actually takes longer; I thought the main point of concurrency was to reduce processing time.

@majic31
Contributor Author

majic31 commented Aug 4, 2025

> I tested it and it does run. However, when I switch the audio paths to local files, processing actually takes longer; I thought the main point of concurrency was to reduce processing time.

In my earlier tests it was faster than running serially, but the speedup is indeed not linear. My initial impression is that this is due to Python's GIL, plus the many copies involved when AutoModel chains several models (copies between CPU and GPU). With some engineering optimization the throughput should still improve, but that is not closely related to this issue (during stress testing, GPU utilization did fluctuate quite a lot).

Also, I previously had to load the models multiple times; after switching to multithreading they only need to be loaded once, which is a big improvement for my program's initialization. (Since I have already wrapped this in Docker, if the performance is not enough I can in theory run several Docker containers behind a gateway to squeeze more out of the GPU.)

May I ask: in your tests, is multithreading actually slower than serial execution? Could you also share your test script?

@sf9ehf9fe

Test 1:
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import time

output_dir = "results"
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='/root/.cache/modelscope/hub/iic/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404',
    model_revision='v2.0.5',
    vad_model='/root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    punc_model='/root/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large',
    punc_model_revision="v2.0.4",
    output_dir=output_dir
)
file_name_list = [
    "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
    "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
    "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
]
start_time = time.time()
init_data = inference_pipeline(input=file_name_list, batch_size_s=1, batch_size_token_threshold_s=40, hotword="吴惠")
print(init_data)
end_time = time.time()
print(end_time - start_time)

Test 2:
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import time

def Task(inference_pipeline):
    file_name_list = [
        "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140826160_5304-31F_DL-4-左手柄.WAV",
        "/tmp/pycharm_project_948/202505100621380_5301-31F_DL-1-左手柄.WAV",
        "/tmp/pycharm_project_948/202503140732570_5304-31F_DL-4-左手柄.WAV",
    ]
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(inference_pipeline, input=file_name, batch_size_s=1, batch_size_token_threshold_s=40, hotword="吴惠")
            for file_name in file_name_list
        ]
        results = []
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                print(f"Error while getting task result: {traceback.format_exc()}")

output_dir = "results"
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    # Paraformer model with hotword support
    model='/root/.cache/modelscope/hub/iic/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404',
    model_revision='v2.0.5',
    vad_model='/root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    punc_model='/root/.cache/modelscope/hub/iic/punc_ct-transformer_cn-en-common-vocab471067-large',
    punc_model_revision="v2.0.4",
    output_dir=output_dir
)
start_time = time.time()
Task(inference_pipeline)
end_time = time.time()
print(end_time - start_time)

Test 1 took about 3.1 seconds; Test 2 took about 6.5 seconds.

@majic31
Contributor Author

majic31 commented Aug 4, 2025

> Test 1: [script quoted from the comment above]
>
> Test 2: [script quoted from the comment above]
>
> Test 1 took about 3.1 seconds; Test 2 took about 6.5 seconds.

Your first test is already batched, so it doesn't really count as serial; serial would be a for loop calling ASR on one file at a time. (Batching will of course be faster than the current multithreading, for the reasons analyzed above: multithreading is limited by the GIL, and there is still room for engineering optimization.)
A fairer comparison would be:
Test 1: treat 7 audio files as one batch and run it in a for loop 10 times (70 files in total).
Test 2: treat 7 audio files as one batch and run it 10 times with multiple threads (the thread count could be 10 or another value, depending on the hardware).
That way both approaches run under the same conditions and you can compare the elapsed times to measure the multithreading speedup; a rough sketch of such a benchmark is given below.
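
For concreteness, a rough sketch of the comparison described above, assuming the same AutoModel setup as in the earlier test script; the audio paths and the max_workers value are placeholders to be adapted to the actual environment:

import time
from concurrent.futures import ThreadPoolExecutor

from funasr import AutoModel

model = AutoModel(
    model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
)
batch = [f"audio_{i}.wav" for i in range(7)]  # placeholder: one batch of 7 local files
rounds = 10

# Test 1: serial baseline, 10 rounds of the same 7-file batch.
t0 = time.time()
serial_results = [model.generate(input=batch) for _ in range(rounds)]
print(f"serial: {time.time() - t0:.1f} s")

# Test 2: the same 10 rounds submitted to a thread pool.
t0 = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(model.generate, input=batch) for _ in range(rounds)]
    threaded_results = [f.result() for f in futures]
print(f"threaded: {time.time() - t0:.1f} s")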

@LauraGPT
Collaborator

LauraGPT commented Aug 5, 2025

Please check whether the output is the same before and after the change.

@majic31
Contributor Author

majic31 commented Aug 6, 2025

> Please check whether the output is the same before and after the change.

The audio at https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav has been verified: the single-threaded and multithreaded results are identical (my test script includes this comparison). The output is shown below:
[screenshot: single-threaded and multithreaded outputs match]

I also tested with local recordings, and the results are consistent as well.

@majic31
Contributor Author

majic31 commented Aug 6, 2025

> Please check whether the output is the same before and after the change.

In addition, I compared the main branch with this branch; the outputs are identical as well, as shown below:
[screenshot: main branch and this branch produce identical output]

@LauraGPT LauraGPT merged commit 5115a06 into modelscope:main Aug 14, 2025