Description
My command is below:
python -u train.py model=pythia69 datasets=[hh] loss=sft exp_name=anthropic_dpo_pythia69 gradient_accumulation_steps=2 batch_size=64 eval_batch_size=32 trainer=FSDPTrainer sample_during_eval=false
I got an error. The log is:
building policy
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.28s/it]
starting 8 processes for FSDP training
setting RLIMIT_NOFILE soft limit to 1048576 from 1048576
/search/odin/user_messizjwang/envs/dpo/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Bus error
Can anyone help me solve this? I didn't make any changes to the repo; I just cloned it and ran the command. If you need more information, please let me know. I really need help, and thanks in advance.
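In case it helps with diagnosis: a "Bus error" when several worker processes start is often associated with the shared-memory mount (`/dev/shm`) being too small, especially inside containers. That is only an assumption here, since the log above does not confirm the cause. Below is a minimal sketch for reporting the size of that mount; the helper name `shm_report` is hypothetical and not part of this repo.

```python
import os
import shutil


def shm_report(path="/dev/shm"):
    """Return (total_gib, free_gib) for the mount at `path`, or None if it does not exist.

    `shm_report` is a hypothetical helper for illustration only.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    gib = 1024 ** 3
    return usage.total / gib, usage.free / gib


if __name__ == "__main__":
    report = shm_report()
    if report is None:
        print("/dev/shm not found on this system")
    else:
        total, free = report
        print(f"/dev/shm total: {total:.2f} GiB, free: {free:.2f} GiB")
```

If the reported total is very small (for example, a container default), that would be consistent with the shared-memory assumption, though it would not prove it.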