UnicodeDecodeError: 'gbk' codec can't decode byte 0x81 in position 1564: illegal multibyte sequence

**Describe the bug**
After `pip install`, running `wetts --text "今天天气怎么样" --wav /tests/output-tts.wav` fails with a GBK decoding error. English input such as `--text "What's a nice day"` produces the same error.

**To Reproduce**
Steps to reproduce the behavior:
1. `pip install git+https://github.com/wenet-e2e/wetts.git`
2. `wetts --text "今天天气怎么样" --wav /tests/output-tts.wav`
3. See error
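
The error looks independent of the `--text` argument, since it is raised while `vocab.txt` is being loaded. Below is a minimal sketch that triggers the same class of error without wetts. It assumes the file is UTF-8 encoded and is being decoded as GBK, the default text encoding on Chinese-language Windows. The sample bytes are illustrative and are not taken from the real `vocab.txt`.

```python
# Illustrative only: the UTF-8 bytes of "、" (U+3001) are E3 80 81.
# Decoded as GBK, the trailing 0x81 is read as a lead byte, and the
# newline that follows is not a valid trail byte.
sample = "、\n".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


print(try_decode(sample, "utf-8"))  # decodes cleanly
print(try_decode(sample, "gbk"))    # fails with a decode error
```
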

**Expected behavior**
The command should load the model and write the WAV file without a decoding error. Actual output:
```text 
Downloading https://dataset-hub.oss-cn-hangzhou.aliyuncs.com/private-unzip-dataset/wenet/wetts_pretrained_models/master/baker_bert_onnx.tar.gz?Expires=1764707304&OSSAccessKeyId=LTAI5tAoCEDFQFyV5h8unjt8&Signature=9dNyqcn4DCN5XyNt%2FFmgnZYWD0k%3D&response-content-disposition=attachment%3B to C:\Users\ethan\.wetts\frontend
baker_bert_onnx.tar.gz: 100%|████████████████████████████████████████████████████████████████████████████████| 393M/393M [02:28<00:00, 2.77MB/s]
Extracting to C:\Users\ethan\.wetts\frontend\final.onnx
Extracting to C:\Users\ethan\.wetts\frontend\._vocab.txt
Extracting to C:\Users\ethan\.wetts\frontend\vocab.txt
Extracting to C:\Users\ethan\.wetts\frontend\frontend.flags
Extracting to C:\Users\ethan\.wetts\frontend\tn/zh_tn_verbalizer.fst
Extracting to C:\Users\ethan\.wetts\frontend\tn/zh_tn_tagger.fst
Extracting to C:\Users\ethan\.wetts\frontend\lexicon/prosody.txt
Extracting to C:\Users\ethan\.wetts\frontend\lexicon/pinyin_dict.txt
Extracting to C:\Users\ethan\.wetts\frontend\lexicon/._lexicon.txt
Extracting to C:\Users\ethan\.wetts\frontend\lexicon/lexicon.txt
Extracting to C:\Users\ethan\.wetts\frontend\lexicon/polyphone.txt
Extracting to C:\Users\ethan\.wetts\frontend\g2p_en/cmudict.dict
Extracting to C:\Users\ethan\.wetts\frontend\g2p_en/README.md
Extracting to C:\Users\ethan\.wetts\frontend\g2p_en/phones.sym
Extracting to C:\Users\ethan\.wetts\frontend\g2p_en/model.fst
Downloading https://dataset-hub.oss-cn-hangzhou.aliyuncs.com/private-unzip-dataset/wenet/wetts_pretrained_models/master/multilingual_vits_v3_onnx.tar.gz?Expires=1764707456&OSSAccessKeyId=LTAI5tAoCEDFQFyV5h8unjt8&Signature=gN59WCPeKSMNGjYnEg6UZH9DxPM%3D&response-content-disposition=attachment%3B to C:\Users\ethan\.wetts\multilingual
multilingual_vits_v3_onnx.tar.gz: 100%|████████████████████████████████████████████████████████████████████| 61.0M/61.0M [00:14<00:00, 4.42MB/s]
Extracting to C:\Users\ethan\.wetts\multilingual\vits.flags
Extracting to C:\Users\ethan\.wetts\multilingual\config.json
Extracting to C:\Users\ethan\.wetts\multilingual\final.onnx
Extracting to C:\Users\ethan\.wetts\multilingual\speaker.txt
Extracting to C:\Users\ethan\.wetts\multilingual\phones.txt
Traceback (most recent call last):
  File "D:\data\conda_store\envs\wenet_asr_fixed\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\data\conda_store\envs\wenet_asr_fixed\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\data\conda_store\envs\wenet_asr_fixed\Scripts\wetts.exe\__main__.py", line 6, in <module>
  File "D:\data\conda_store\envs\wenet_asr_fixed\lib\site-packages\wetts\cli\tts.py", line 32, in main
    model = load_model()
  File "D:\data\conda_store\envs\wenet_asr_fixed\lib\site-packages\wetts\cli\model.py", line 67, in load_model
    model = Model(backend_dir, front_dir)
  File "D:\data\conda_store\envs\wenet_asr_fixed\lib\site-packages\wetts\cli\model.py", line 26, in __init__
    self.frontend = Frontend(front_dir)
  File "D:\data\conda_store\envs\wenet_asr_fixed\lib\site-packages\wetts\cli\frontend.py", line 25, in __init__
    self.token2id = self.read_list(os.path.join(model_dir, 'vocab.txt'))
  File "D:\data\conda_store\envs\wenet_asr_fixed\lib\site-packages\wetts\cli\frontend.py", line 37, in read_list
    for i, line in enumerate(fin):
UnicodeDecodeError: 'gbk' codec can't decode byte 0x81 in position 1564: illegal multibyte sequence
 ```
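
The traceback points at `read_list` in `wetts/cli/frontend.py`, which iterates over a file handle opened for `vocab.txt`. The `'gbk'` in the message suggests the file is opened without an explicit `encoding`, so Python falls back to the locale's preferred encoding. Below is a hedged sketch of a possible fix, assuming `vocab.txt` is UTF-8 and holds one token per line. The function body is a guess at what `read_list` does, not the actual wetts source.

```python
def read_list(path: str) -> dict:
    """Map each line (token) of a UTF-8 text file to its line index.

    Hypothetical stand-in for Frontend.read_list; the real implementation
    is not shown in the traceback.
    """
    token2id = {}
    # Passing encoding explicitly avoids the locale default (GBK here).
    with open(path, "r", encoding="utf-8") as fin:
        for i, line in enumerate(fin):
            token2id[line.strip()] = i
    return token2id
```

A workaround that needs no code change may be Python's UTF-8 mode: setting the environment variable `PYTHONUTF8=1` before running `wetts` makes `open()` default to UTF-8. This assumes every text file wetts reads is UTF-8, which has not been verified here.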

**Screenshots**

<img width="1836" height="1306" alt="Image" src="https://github.com/user-attachments/assets/7d1f5b37-9d7d-4959-ac2f-cffc31489cd5" />

**Desktop (please complete the following information):**
 - OS: Windows 11 Home (Chinese edition), 64-bit, version 24H2
 - Processor: 13th Gen Intel(R) Core(TM) i5-13500H (2.60 GHz)
 - RAM: 32.0 GB (31.7 GB usable)


UnicodeDecodeError: 'gbk' codec can't decode byte 0x81 in position 1564: illegal multibyte sequence #252
