Exception in the fine-tuning code
#1015
Replies: 2 comments
-
Check whether your dataset follows the official format.
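A quick way to act on this suggestion is to validate each line of the data files before training. The sketch below is only illustrative: it assumes the expected format is JSON Lines where every record has a non-empty `conversations` list of turns with a `role` and, for user and assistant turns, a string `content`. That layout and the helper name `check_record` are assumptions, not something confirmed in this thread, so compare against the fine-tuning demo's own documentation.

```python
import json


def check_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty list means OK).

    Assumed layout (not confirmed by this thread):
    {"conversations": [{"role": "user", "content": "..."}, ...]}
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    conversations = record.get("conversations")
    if not isinstance(conversations, list) or not conversations:
        return ["missing or empty 'conversations' list"]

    problems = []
    for i, turn in enumerate(conversations):
        if not isinstance(turn, dict):
            problems.append(f"turn {i} is not an object")
            continue
        role = turn.get("role")
        if not isinstance(role, str):
            problems.append(f"turn {i} has no string 'role'")
        # Only plain user/assistant turns are required to carry text here.
        elif role in ("user", "assistant") and not isinstance(turn.get("content"), str):
            problems.append(f"turn {i} ({role}) has no string 'content'")
    return problems


good = '{"conversations": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}'
print(check_record(good))
print(check_record('{"prompt": "hi", "response": "hello"}'))
```

Running the check over every line of the training and validation files also shows how many usable records each file holds, which matters here because the log reports a validation set of only one row.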
-
Has this been resolved?
-
python finetune_hf.py data/ikmore/ /data1/chenhao/models/THUDM/chatglm3-6b configs/lora.yaml
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:02<00:00, 2.64it/s]
trainable params: 1,949,696 || all params: 6,245,533,696 || trainable%: 0.031217444255383614
--> Model
--> model has 1.949696M params
train_dataset: Dataset({
features: ['input_ids', 'labels'],
num_rows: 25260
})
val_dataset: Dataset({
features: ['input_ids', 'output_ids'],
num_rows: 1
})
test_dataset: Dataset({
features: ['input_ids', 'output_ids'],
num_rows: 1
})
--> Sanity check
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'<|user|>': 64795 -> -100
'': 30910 -> -100
'\n': 13 -> -100
'你': 36474 -> -100
'是一个': 32103 -> -100
'使用': 31695 -> -100
'先进的': 38568 -> -100
'自然': 31799 -> -100
'语言': 32330 -> -100
'处理': 31940 -> -100
'技术': 31668 -> -100
'能': 54558 -> -100
'快速': 32371 -> -100
'准确': 33494 -> -100
'地': 54563 -> -100
'概括': 39268 -> -100
'文档': 42979 -> -100
'的人工': 51393 -> -100
'智能': 32093 -> -100
'助手': 42481 -> -100
',': 31123 -> -100
'善于': 35386 -> -100
'对': 54570 -> -100
'文档': 42979 -> -100
'合并': 35211 -> -100
'重复': 35287 -> -100
'内容': 31795 -> -100
'、': 31201 -> -100
'提炼': 46385 -> -100
'重点': 31881 -> -100
'内容': 31795 -> -100
',': 31123 -> -100
'具备': 32995 -> -100
'对': 54570 -> -100
'文档': 42979 -> -100
'内容': 31795 -> -100
'深度': 33442 -> -100
'理解': 32185 -> -100
'的能力': 33926 -> -100
',': 31123 -> -100
'并能': 51533 -> -100
'有效': 31942 -> -100
'识别': 34292 -> -100
'文档': 42979 -> -100
'的核心': 35732 -> -100
'内容': 31795 -> -100
',': 31123 -> -100
'排除': 35665 -> -100
'掉': 55751 -> -100
'与': 54619 -> -100
'文章': 32146 -> -100
'标题': 34490 -> -100
'不': 54535 -> -100
'相关的': 34363 -> -100
'干扰': 37272 -> -100
'。': 31155 -> -100
'文档': 42979 -> -100
'内容': 31795 -> -100
'如下': 33163 -> -100
':': 31211 -> -100
'\n': 13 -> -100
'``': 10846 -> -100
'`': 31040 -> -100
'越来越': 32193 -> -100
'贵': 55402 -> -100
'!': 31404 -> -100
'揭秘': 42487 -> -100
'SCI': 43618 -> -100
'、': 31201 -> -100
'SS': 4800 -> -100
'CI': 19427 -> -100
'期刊': 34248 -> -100
'“': 30989 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'”:': 37202 -> -100
'内': 54680 -> -100
'情': 54623 -> -100
'与': 54619 -> -100
'缴费': 36560 -> -100
'流程': 33130 -> -100
'': 30910 -> -100
'送': 55244 -> -100
'资料': 32315 -> -100
'活动': 31651 -> -100
'': 30910 -> -100
'科研': 32692 -> -100
'与': 54619 -> -100
'发表': 32560 -> -100
'\n': 13 -> -100
'科研': 32692 -> -100
'与': 54619 -> -100
'发表': 32560 -> -100
'\n': 13 -> -100
'微信号': 47286 -> -100
'\n': 13 -> -100
'ac': 338 -> -100
'adem': 2578 -> -100
'ic': 294 -> -100
'_': 30962 -> -100
'times': 6709 -> -100
'\n': 13 -> -100
'功能': 31877 -> -100
'介绍': 32025 -> -100
'\n': 13 -> -100
'关注': 31959 -> -100
'科研': 32692 -> -100
'圏': 62549 -> -100
'动态': 33543 -> -100
',': 31123 -> -100
'国际': 31721 -> -100
'期刊': 34248 -> -100
'动态': 33543 -> -100
',': 31123 -> -100
'介绍': 32025 -> -100
'发表': 32560 -> -100
'经验': 32035 -> -100
'。': 31155 -> -100
'\n': 13 -> -100
'不管是': 35759 -> -100
'文科': 43431 -> -100
'的还是': 54291 -> -100
'\n': 13 -> -100
'理工': 34327 -> -100
'科的': 53397 -> -100
',': 31123 -> -100
'现在': 31714 -> -100
'都要': 33078 -> -100
'发': 54559 -> -100
'国际': 31721 -> -100
'期刊': 34248 -> -100
'论文': 33012 -> -100
'了': 54537 -> -100
'。': 31155 -> -100
'理工': 34327 -> -100
'科': 54693 -> -100
'发': 54559 -> -100
'SCI': 43618 -> -100
',': 31123 -> -100
'文科': 43431 -> -100
'发': 54559 -> -100
'SS': 4800 -> -100
'CI': 19427 -> -100
'。': 31155 -> -100
'\n': 13 -> -100
'在': 54534 -> -100
'\n': 13 -> -100
'发': 54559 -> -100
'国际': 31721 -> -100
'论文': 33012 -> -100
'\n': 13 -> -100
'时': 54554 -> -100
',': 31123 -> -100
'我们都': 39566 -> -100
'可能会': 32934 -> -100
'被': 54732 -> -100
'学术': 32472 -> -100
'期刊': 34248 -> -100
'索': 55403 -> -100
'要': 54552 -> -100
'“': 30989 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'”,': 31644 -> -100
'就是': 31632 -> -100
'得': 54607 -> -100
'掏': 57900 -> -100
'钱': 55219 -> -100
'才能': 32017 -> -100
'发': 54559 -> -100
'论文': 33012 -> -100
'\n': 13 -> -100
'。': 31155 -> -100
'\n': 13 -> -100
'很多': 31679 -> -100
'作者': 32032 -> -100
'感慨': 37931 -> -100
',': 31123 -> -100
'现在': 31714 -> -100
'SCI': 43618 -> -100
'、': 31201 -> -100
'SS': 4800 -> -100
'CI': 19427 -> -100
'期刊': 34248 -> -100
'“': 30989 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'”,': 31644 -> -100
'越来越': 32193 -> -100
'贵': 55402 -> -100
'了': 54537 -> -100
'!': 31404 -> -100
'\n': 13 -> -100
'当然': 32276 -> -100
',': 31123 -> -100
'这': 54551 -> -100
'也不': 31946 -> -100
'绝对': 33194 -> -100
',': 31123 -> -100
'虽然': 31855 -> -100
'很多': 31679 -> -100
'国外': 34401 -> -100
'期刊': 34248 -> -100
'是': 54532 -> -100
'会': 54549 -> -100
'要': 54552 -> -100
'”': 30991 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'“': 30989 -> -100
'的': 54530 -> -100
',': 31123 -> -100
'但': 54688 -> -100
'也有很多': 48106 -> -100
'期刊': 34248 -> -100
'不要': 31844 -> -100
'这个': 31646 -> -100
'钱': 55219 -> -100
'。': 31155 -> -100
'所以': 31672 -> -100
',': 31123 -> -100
'你在': 35292 -> -100
'投稿': 38670 -> -100
'时': 54554 -> -100
',': 31123 -> -100
'一定要': 32469 -> -100
'先': 54810 -> -100
'对': 54570 -> -100
'\n': 13 -> -100
'”': 30991 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'\n': 13 -> -100
'“': 30989 -> -100
'\n': 13 -> -100
'做好': 31971 -> -100
'了解': 31788 -> -100
'。': 31155 -> -100
'\n': 13 -> -100
'我们在': 34246 -> -100
'工作中': 33837 -> -100
',': 31123 -> -100
'经常': 32289 -> -100
'有': 54536 -> -100
'作者': 32032 -> -100
'来': 54556 -> -100
'问': 54761 -> -100
'我们': 31625 -> -100
'期刊': 34248 -> -100
'\n': 13 -> -100
'”': 30991 -> -100
'\n': 13 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'\n': 13 -> -100
'“': 30989 -> -100
'的问题': 32184 -> -100
',': 31123 -> -100
'也有很多': 48106 -> -100
'关于': 31809 -> -100
'\n': 13 -> -100
'”': 30991 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'\n': 13 -> -100
'“': 30989 -> -100
'的': 54530 -> -100
'\n': 13 -> -100
'常识': 38851 -> -100
'与': 54619 -> -100
'缴费': 36560 -> -100
'流程': 33130 -> -100
'的': 54530 -> -100
'提问': 37817 -> -100
'。': 31155 -> -100
'\n': 13 -> -100
'我们': 31625 -> -100
'经过': 32082 -> -100
'整理': 33362 -> -100
',': 31123 -> -100
'这里': 31939 -> -100
'一次性': 37905 -> -100
'把': 54849 -> -100
'说明': 32448 -> -100
'送': 55244 -> -100
'给大家': 34693 -> -100
'!': 31404 -> -100
'\n': 13 -> -100
'!': 31404 -> -100
'\n': 13 -> -100
'!': 31404 -> -100
'\n': 13 -> -100
'因为': 31659 -> -100
'这些': 31704 -> -100
'说明': 32448 -> -100
'文件': 32410 -> -100
'比较多': 44036 -> -100
',': 31123 -> -100
'是': 54532 -> -100
'数字': 32224 -> -100
'版的': 38212 -> -100
',': 31123 -> -100
'需要': 31665 -> -100
'打包': 45341 -> -100
'才能': 32017 -> -100
'发': 54559 -> -100
'\n': 13 -> -100
'给大家': 34693 -> -100
',': 31123 -> -100
'所以': 31672 -> -100
'\n': 13 -> -100
'大家': 31684 -> -100
'可': 54568 -> -100
'加': 54639 -> -100
'@': 31030 -> -100
'花': 54867 -> -100
'博士': 32384 -> -100
'微信': 32109 -> -100
',': 31123 -> -100
'留言': 34376 -> -100
'\n': 13 -> -100
'“': 30989 -> -100
'我': 54546 -> -100
'需要': 31665 -> -100
'“': 30989 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'”': 30991 -> -100
'说明': 32448 -> -100
'”': 30991 -> -100
'\n': 13 -> -100
',': 31123 -> -100
'即可': 32895 -> -100
'免费': 32717 -> -100
'领取': 35773 -> -100
'。': 31155 -> -100
'留言': 34376 -> -100
'就可': 44791 -> -100
'获得': 31823 -> -100
'!': 31404 -> -100
'\n': 13 -> -100
'长按': 53846 -> -100
'、': 31201 -> -100
'扫码': 38937 -> -100
'加': 54639 -> -100
'“': 30989 -> -100
'花': 54867 -> -100
'博士': 32384 -> -100
'”': 30991 -> -100
'\n': 13 -> -100
'留言': 34376 -> -100
'“': 30989 -> -100
'\n': 13 -> -100
'我': 54546 -> -100
'需要': 31665 -> -100
'“': 30989 -> -100
'版': 55090 -> -100
'面': 54610 -> -100
'费': 55000 -> -100
'”': 30991 -> -100
'说明': 32448 -> -100
'\n': 13 -> -100
'”,': 31644 -> -100
'即可': 32895 -> -100
'免费': 32717 -> -100
'领取': 35773 -> -100
'\n': 13 -> -100
'。': 31155 -> -100
'\n': 13 -> -100
'除此之外': 36329 -> -100
',': 31123 -> -100
'在': 54534 -> -100
'科研': 32692 -> -100
'中': 54538 -> -100
',': 31123 -> -100
Traceback (most recent call last):
File "/data1/chenhao/codes/ChatGLM3/finetune_demo/finetune_hf.py", line 571, in <module>
app()
File "/data1/chenhao/codes/ChatGLM3/finetune_demo/finetune_hf.py", line 527, in main
eval_dataset=val_dataset.select(list(range(50))),
File "/data1/chenhao/anaconda3/envs/chatglm3-ft/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 558, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/data1/chenhao/anaconda3/envs/chatglm3-ft/lib/python3.9/site-packages/datasets/fingerprint.py", line 482, in wrapper
out = func(dataset, *args, **kwargs)
File "/data1/chenhao/anaconda3/envs/chatglm3-ft/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3829, in select
return self._select_contiguous(start, length, new_fingerprint=new_fingerprint)
File "/data1/chenhao/anaconda3/envs/chatglm3-ft/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 558, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/data1/chenhao/anaconda3/envs/chatglm3-ft/lib/python3.9/site-packages/datasets/fingerprint.py", line 482, in wrapper
out = func(dataset, *args, **kwargs)
File "/data1/chenhao/anaconda3/envs/chatglm3-ft/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3879, in _select_contiguous
_check_valid_indices_value(start + length - 1, len(self))
File "/data1/chenhao/anaconda3/envs/chatglm3-ft/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 650, in _check_valid_indices_value
raise IndexError(f"Index {index} out of range for dataset of size {size}.")
IndexError: Index 49 out of range for dataset of size 1.
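The traceback shows `val_dataset.select(list(range(50)))` asking for 50 rows, while the log above reports a validation set with `num_rows: 1`, so index 49 does not exist. Either the validation file needs at least 50 samples, or the requested range has to be capped at the dataset size. The following is a minimal sketch of the second option; `eval_subset_indices` is a hypothetical helper, not part of `finetune_hf.py` or the `datasets` library.

```python
def eval_subset_indices(dataset_size: int, limit: int = 50) -> list[int]:
    """Return the first `limit` row indices, capped at the dataset size."""
    if dataset_size < 0:
        raise ValueError("dataset_size must be non-negative")
    return list(range(min(limit, dataset_size)))


# A 1-row validation set, as in the log, yields only index 0.
print(eval_subset_indices(1))
# A large dataset is still limited to the first 50 rows.
print(len(eval_subset_indices(25260)))
```

As a local workaround, the call at line 527 could then read `eval_dataset=val_dataset.select(eval_subset_indices(len(val_dataset)))`. Evaluating on a single sample gives very noisy metrics, so adding more validation data is likely the better fix.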
My dataset format