Replies: 1 comment
-
nvm, the input_ids_seq_length issue turned out to be caused by the past_key_values parameter in stream_chat; it does not seem to be related to the CUDA error.
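The effect described here can be illustrated with a small standalone simulation (hypothetical token counts only, no GLM code): when the cached state from previous answers is carried into each call, the effective input length grows monotonically until it overruns `max_length`, whereas discarding the state before each independent question keeps it flat.

```python
# Simulate how carrying conversational state across independent
# questions inflates the effective input length (token counts are
# hypothetical, chosen to roughly match the 360 -> 30k+ growth
# reported in this thread).
MAX_LENGTH = 32768

def run_queries(n_queries, tokens_per_turn=300, carry_history=True):
    """Return the effective input length seen at each of n_queries calls."""
    history_tokens = 0
    lengths = []
    for _ in range(n_queries):
        input_len = history_tokens + tokens_per_turn
        lengths.append(input_len)
        if carry_history:
            # the cached state grows by the prompt plus the generated answer
            history_tokens += tokens_per_turn * 2
        # with carry_history=False the state is discarded before the next
        # question, analogous to not reusing past_key_values between calls
    return lengths

accumulated = run_queries(120)                          # grows every call
reset_each_time = run_queries(120, carry_history=False)  # stays constant
```

With these numbers the accumulated length crosses `MAX_LENGTH` after roughly 55 calls, matching the pattern of the warning firing partway through the question list rather than on the first call.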
-
Hi, I use an external loop to feed a series of questions into GLM3 via stream_chat.
At around the 114th input, this warning appears:
Patients: 114/245 Input length of input_ids is 32802, but `max_length` is set to 32768. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
It is then followed by a stream of CUDA index-out-of-bounds errors (truncated here):
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4100,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
(the same assertion repeats for threads [33,0,0] through [38,0,0] and beyond) ...
I tried adjusting max_length and max_new_tokens as the warning suggests, but the problem persisted. I then added a print in the stream_generate() function of modelling_chatglm.py:
batch_size, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
and found that input_ids_seq_length, i.e. input_ids.shape[-1], grows from about 360 to over 30,000. I recall never hitting this with GLM2, so I added the same print there for comparison: that value hovered between 400 and 600 instead of growing steadily the way it does with GLM3. Also, I first observed this on a fully fine-tuned GLM3 model, then tested the original GLM3 model and reproduced exactly the same issue. I am not sure whether this is related to the earlier CUDA error reports?
Originally posted by @10cent01 in #393 (comment)