Replies: 1 comment
-
nvm, the input_ids_seq_length issue turned out to be caused by the past_key_values parameter in stream_chat; it does not seem to be related to the CUDA error.
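The effect described here can be illustrated with a small standalone simulation (hypothetical token counts only, no GLM code): when the cached state from previous answers is carried into each call, the effective input length grows monotonically until it overruns `max_length`, whereas discarding the state before each independent question keeps it flat.

```python
# Simulate how carrying conversational state across independent
# questions inflates the effective input length (token counts are
# hypothetical, chosen to roughly match the 360 -> 30k+ growth
# reported in this thread).
MAX_LENGTH = 32768

def run_queries(n_queries, tokens_per_turn=300, carry_history=True):
    """Return the effective input length seen at each of n_queries calls."""
    history_tokens = 0
    lengths = []
    for _ in range(n_queries):
        input_len = history_tokens + tokens_per_turn
        lengths.append(input_len)
        if carry_history:
            # the cached state grows by the prompt plus the generated answer
            history_tokens += tokens_per_turn * 2
        # with carry_history=False the state is discarded before the next
        # question, analogous to not reusing past_key_values between calls
    return lengths

accumulated = run_queries(120)                          # grows every call
reset_each_time = run_queries(120, carry_history=False)  # stays constant
```

With these numbers the accumulated length crosses `MAX_LENGTH` after roughly 55 calls, matching the pattern of the warning firing partway through the question list rather than on the first call.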
-
Hi, I use an external loop to feed a series of questions into GLM3 via stream_chat.
At around the 114th input, this warning appears:
Patients: 114/245 Input length of input_ids is 32802, but `max_length` is set to 32768. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
It is then followed by a stream of CUDA index-out-of-bounds errors (truncated here):
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4100,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
(the same assertion repeats for threads [33,0,0] through [38,0,0] and beyond) ...
I tried adjusting max_length and max_new_tokens as the warning suggests, but the problem persisted. I then added a print in the stream_generate() function of modelling_chatglm.py:
batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
and found that input_ids_seq_length, i.e. input_ids.shape[-1], grows from about 360 to over 30,000. I recall never hitting this with GLM2, so I added the same print there for comparison: that value hovered between 400 and 600 instead of growing steadily the way it does with GLM3. Also, I first observed this on a fully fine-tuned GLM3 model, then tested the original GLM3 model and reproduced exactly the same issue. I am not sure whether this is related to the earlier CUDA error reports?
Originally posted by @10cent01 in #393 (comment)