[S2T] Speech recognition task raises an error #4144

For support and discussions, please use our [Discourse forums](https://github.com/PaddlePaddle/DeepSpeech/discussions).

If you've found a bug then please create an issue with the following information:

**Describe the bug**
After installing PaddlePaddle==3.2 and paddlespeech==1.5.0 following the official instructions, paddlespeech cannot be used for speech-to-text.


**To Reproduce**
Steps to reproduce the behavior:
1. python -m pip install paddlepaddle-gpu==3.2.2 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
2. pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple

**Expected behavior**
My script:
from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="zh.wav")
print(result)
Expected result: simple speech recognition.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Environment (please complete the following information):**
 - OS: Windows 11
 - GCC/G++ Version [e.g. 8.3]
 - conda Python Version: 3.10
 - PaddlePaddle Version: 3.2.2
 - GPU/DRIVER Information: RTX 4080 SUPER
 - CUDA/CUDNN Version: 12.9
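
Because the error log mentions both a NumPy ABI mismatch and a `Conv2D` signature error, the exact installed package versions matter for triage. A minimal sketch for collecting them with the standard library only; the distribution names queried are assumptions based on the install commands above, and `installed_versions` is a hypothetical helper name, not part of PaddleSpeech:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: version string, or 'not installed'}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in the active environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your install.
    wanted = ["paddlepaddle-gpu", "paddlepaddle", "paddlespeech", "numpy"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver}")
```

Running this inside the same conda environment as the failing script shows which Paddle build and which NumPy major version are actually being imported.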

**Error report info**
D:\env\py\condaEnvs\PP\lib\site-packages\paddle\utils\cpp_extension\extension_utils.py:718: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
RuntimeError: module compiled against ABI version 0x1000009 but this version of numpy is 0x2000000
D:\env\py\condaEnvs\PP\lib\site-packages\_distutils_hack\__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
2025-12-08 02:43:42.266 | INFO     | paddlespeech.s2t.modules.ctc:<module>:45 - paddlespeech_ctcdecoders not installed!
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1208 02:43:42.388777 24104 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.9, Runtime API Version: 12.9
W1208 02:43:42.475291 24104 dygraph_functions.cc:97953] got different data type, run type promotion automatically, this may cause data type been changed.
2025-12-08 02:43:42.541 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
Traceback (most recent call last):
  File "D:\doc\workdir\pythonProject\RTT_pro\语音识别.py", line 3, in <module>
    result = asr(audio_file="zh.wav")
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\cli\utils.py", line 328, in _warpper
    return executor_func(self, *args, **kwargs)
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\cli\asr\infer.py", line 502, in __call__
    self._init_from_path(model, lang, codeswitch, sample_rate, config,
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\cli\asr\infer.py", line 212, in _init_from_path
    model = model_class.from_config(model_conf)
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\s2t\models\u2\u2.py", line 962, in from_config
    model = cls(configs)
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\s2t\models\u2\u2.py", line 864, in __init__
    vocab_size, encoder, decoder, ctc = U2Model._init_from_config(
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\s2t\models\u2\u2.py", line 909, in _init_from_config
    encoder = ConformerEncoder(
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\s2t\modules\encoder.py", line 470, in __init__
    super().__init__(input_size, output_size, attention_heads, linear_units,
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\s2t\modules\encoder.py", line 138, in __init__
    self.embed = subsampling_class(
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\s2t\modules\subsampling.py", line 114, in __init__
    Conv2D(1, odim, 3, 2),
  File "D:\env\py\condaEnvs\PP\lib\site-packages\paddlespeech\s2t\modules\align.py", line 165, in __init__
    super(Conv2D, self).__init__(
TypeError: Conv2D.__init__() takes from 4 to 8 positional arguments but 12 were given
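
The final `TypeError` has the shape Python produces when a subclass forwards more arguments positionally than its parent accepts by position. The sketch below reproduces that message with stand-in classes only. `BaseConv`, `PositionalWrapper`, and `KeywordWrapper` are hypothetical, and the assumption that the installed `paddle.nn.Conv2D` made its trailing options keyword-only is a guess from the message, not something the log confirms:

```python
class BaseConv:
    """Stand-in parent layer: 3 required and 4 optional positional
    parameters, with the remaining options keyword-only (assumed)."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, dilation=1, groups=1,
                 *, padding_mode="zeros", weight_attr=None,
                 bias_attr=None, data_format="NCHW"):
        self.stride = stride
        self.padding_mode = padding_mode
        self.data_format = data_format


class PositionalWrapper(BaseConv):
    """Forwards all 11 options positionally (12 arguments with self)."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, padding_mode="zeros",
                 weight_attr=None, bias_attr=None, data_format="NCHW"):
        super().__init__(in_channels, out_channels, kernel_size, stride,
                         padding, dilation, groups, padding_mode,
                         weight_attr, bias_attr, data_format)


class KeywordWrapper(BaseConv):
    """Forwards the trailing options by keyword, which the parent accepts."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, padding_mode="zeros",
                 weight_attr=None, bias_attr=None, data_format="NCHW"):
        super().__init__(in_channels, out_channels, kernel_size, stride,
                         padding, dilation, groups,
                         padding_mode=padding_mode, weight_attr=weight_attr,
                         bias_attr=bias_attr, data_format=data_format)


def try_build(layer_cls):
    """Build a layer like Conv2D(1, odim, 3, 2); return None or the error text."""
    try:
        layer_cls(1, 256, 3, 2)
        return None
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print("positional:", try_build(PositionalWrapper))
    print("keyword:   ", try_build(KeywordWrapper))
```

If the assumption holds, the failure points to a signature mismatch between this paddlespeech release and the installed PaddlePaddle version rather than to a problem in the user script.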

