Background

I tried switching to the tpu-mlir toolchain published on PyPI to convert the qwen3vl model, but the converted model produces garbled output. After some investigation, I found that the conversion environment affects the resulting model: the model converted with the PyPI toolchain is unusable. Validating the MLIR conversion with `--test_input` in `model_deploy` fails outright with a similarity-too-low error, whereas converting in the Docker environment following the previous workflow yields normal model replies.

The garbled output:
Q00: 将下面的英文翻译为中文:As their name suggests, Large Language Models (LLMs) are often too large to run on consumer hardware. These models may exceed billions of parameters and generally need GPUs with large amounts of VRAM to speed up inference. As such, more and more research has been focused on making these models smaller through improved training, adapters, etc. One major technique in this field is called quantization.
========================================
S00: 济处lá/GL 发�一间 Terminal Gagene involete想起来�件 exposition sitεя stapUPIhaps�parm式.fsIRCठnderと思っていた在今年无�件 exposition sitεя stapUPIhaps�parm式.fsIRCठnder… (the same garbled sequence repeats many times)

The normal output:
Q00: 将下面的英文翻译为中文:As their name suggests, Large Language Models (LLMs) are often too large to run on consumer hardware. These models may exceed billions of parameters and generally need GPUs with large amounts of VRAM to speed up inference. As such, more and more research has been focused on making these models smaller through improved training, adapters, etc. One major technique in this field is called quantization.
========================================
S00: 由于它们的名字暗示,大型语言模型(LLMs)通常无法在消费级硬件上运行。这些模型可能拥有数以十亿计的参数,通常需要配备大量显存的GPU来加速推理过程。因此,越来越多的研究致力于通过改进训练、适配器等方法来使这些模型变得更小。其中一种主要技术被称为量化。<|im_end|>

Reproduction steps
Note
Reproduction archive: qwen3vl.zip
Install tpu-mlir with conda
conda create -n tpumlir python=3.10
conda activate tpumlir
pip install tpu_mlir==1.26
pip install psutil==5.9.8 onnx==1.14.0 onnxruntime==1.15.1 onnxruntime_extensions==0.14.0 onnxsim==0.4.17 transformers==4.57.1
# or import the conda environment instead
conda env create -f environment.yml
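Since the two environments produce different results from identical quantized weights, a quick first check is to confirm the exact package versions installed in each. A minimal sketch using the standard library (`report_versions` is a hypothetical helper, not part of tpu-mlir):

```python
from importlib import metadata

def report_versions(pkgs):
    """Map each package name to its installed version, or None if absent."""
    out = {}
    for name in pkgs:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = None
    return out

# Compare this output between the conda and docker environments.
print(report_versions(["tpu_mlir", "onnx", "onnxruntime", "transformers"]))
```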
# docker setup
docker pull sophgo/tpuc_dev:with-vscode-docker-20241110
git clone https://github.com/sophgo/tpu-mlir.git
cd tpu-mlir
git checkout 95062df0c869bd6da53b1761e4900408dea64479

Run the scripts
# 1. with the conda environment
conda activate tpumlir
python build.py && ./compile.sh
# 2. inside docker
pushd /path/to/tpu-mlir
source envsetup.sh
popd
./compile.sh

Results
Conversion with the PyPI-installed tpu-mlir fails: the cosine similarity of hidden_states is only 0.035774. Conversion with the Docker-based tpu-mlir succeeds, with a hidden_states similarity of 0.999787. Meanwhile, the weights quantized in the two modes are exactly identical (matching checksums):
-> % sha1sum */*.npz | sort | grep _tpu_
99043801ad023dc9b92ab2447a171e3b2d4dc2eb tmp_0/block_0_bm1684x_w8bf16_tpu_outputs.npz
99043801ad023dc9b92ab2447a171e3b2d4dc2eb tmp_1008/block_0_bm1684x_w8bf16_tpu_outputs.npz
b7343d7e2b2feee432d8e90c42f41867f7c1bb95 tmp_0/block_0_tpu_lowered_bm1684x_w8bf16_weight.npz
b7343d7e2b2feee432d8e90c42f41867f7c1bb95 tmp_1008/block_0_bm1684x_w8bf16_tpu_weights.npz
b7343d7e2b2feee432d8e90c42f41867f7c1bb95 tmp_1008/block_0_tpu_lowered_bm1684x_w8bf16_weight.npz
d0d12c76e05fb1f0a47a667a751208155f9a9354 tmp_0/block_0_tpu_addressed_bm1684x_w8bf16_weight.npz
d0d12c76e05fb1f0a47a667a751208155f9a9354 tmp_1008/block_0_tpu_addressed_bm1684x_w8bf16_weight.npz
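The matching checksums can also be double-checked at the array level, independent of npz container metadata. A minimal sketch with numpy (`npz_equal` is a hypothetical helper, not part of tpu-mlir):

```python
import numpy as np

def npz_equal(path_a, path_b):
    """True iff two .npz archives hold identical arrays under the same keys."""
    with np.load(path_a) as a, np.load(path_b) as b:
        if sorted(a.files) != sorted(b.files):
            return False
        return all(np.array_equal(a[k], b[k]) for k in a.files)
```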
Conversion with the PyPI-installed tpu-mlir fails:
[Running]: npz_tool.py compare block_0_bm1684x_w8bf16_model_outputs.npz block_0_bm1684x_w8bf16_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv
compare present_v_Reshape: 67%|███████████████████████████████████████████████████████████████▎ | 2/3 [00:00<00:00, 826.06it/s]
[out_hidden_states_Add ] NOT_SIMLIAR [FAILED]
(1, 512, 2048) float32
cosine_similarity = 0.035774
euclidean_similarity = -0.929694
sqnr_similarity = -2.923638
top-k:
idx-t target idx-r ref
641669 4.03125 562780 3.75
317828 3.671875 1728 3.71875
922863 3.546875 706872 3.671875
187331 3.40625 167446 3.625
716680 3.3125 276802 3.515625
412752 3.203125 7326 3.5
471800 3.203125 18 3.5
793336 3.203125 2519 3.5
840440 3.203125 239055 3.484375
885496 3.203125 6758 3.421875
[present_k_Add ] SIMILAR [PASSED]
(1, 512, 8, 128) float32
cosine_similarity = 0.999971
euclidean_similarity = 0.993978
sqnr_similarity = 44.406190
[present_v_Reshape ] SIMILAR [PASSED]
(1, 512, 8, 128) float32
cosine_similarity = 0.999973
euclidean_similarity = 0.994671
sqnr_similarity = 45.473313
3 compared
2 passed
0 equal, 0 close, 2 similar
1 failed
0 not equal, 1 not similar
min_similiarity = (0.035774193704128265, -0.9296936088183474, -2.9236382246017456)
Target block_0_bm1684x_w8bf16_model_outputs.npz
Reference block_0_bm1684x_w8bf16_tpu_outputs.npz
npz compare FAILED.
compare present_v_Reshape: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 23.36it/s]

Conversion with the Docker-based tpu-mlir succeeds:
[Running]: npz_tool.py compare block_0_bm1684x_w8bf16_model_outputs.npz block_0_bm1684x_w8bf16_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv
compare present_v_Reshape: 67%|███████████████████████████████████████████████████████████████▎ | 2/3 [00:00<00:00, 799.68it/s]
[out_hidden_states_Add ] SIMILAR [PASSED]
(1, 512, 2048) float32
cosine_similarity = 0.999787
euclidean_similarity = 0.981876
sqnr_similarity = 34.843280
[present_k_Add ] SIMILAR [PASSED]
(1, 512, 8, 128) float32
cosine_similarity = 0.999971
euclidean_similarity = 0.993978
sqnr_similarity = 44.406190
[present_v_Reshape ] SIMILAR [PASSED]
(1, 512, 8, 128) float32
cosine_similarity = 0.999973
euclidean_similarity = 0.994671
sqnr_similarity = 45.473313
3 compared
3 passed
0 equal, 0 close, 3 similar
0 failed
0 not equal, 0 not similar
min_similiarity = (0.9997868537902832, 0.981875806096142, 34.84328031539917)
Target block_0_bm1684x_w8bf16_model_outputs.npz
Reference block_0_bm1684x_w8bf16_tpu_outputs.npz
npz compare PASSED.
compare present_v_Reshape: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 35.50it/s]
[Success]: npz_tool.py compare block_0_bm1684x_w8bf16_model_outputs.npz block_0_bm1684x_w8bf16_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv
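For reference, the similarity metrics reported by `npz_tool.py compare` can be approximated as follows (a conceptual sketch; the exact formulas in tpu-mlir may differ):

```python
import numpy as np

def cosine_similarity(target, ref):
    """Cosine of the angle between the flattened tensors; 1.0 means aligned."""
    t = target.astype(np.float64).ravel()
    r = ref.astype(np.float64).ravel()
    return float(t @ r / (np.linalg.norm(t) * np.linalg.norm(r)))

def sqnr_db(target, ref):
    """Signal-to-quantization-noise ratio in dB; higher means a closer match."""
    r = ref.astype(np.float64).ravel()
    noise = target.astype(np.float64).ravel() - r
    return float(10.0 * np.log10((r ** 2).sum() / (noise ** 2).sum()))
```

A cosine similarity of 0.035774, as in the failing run above, means the output tensor is essentially uncorrelated with the reference, i.e. the computed hidden_states are wrong rather than merely imprecise.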