SIGBUS error during multi-GPU (multi-process) training #1595
shahzaibbaig123 asked this question in Q&A (unanswered)
I am fine-tuning a Mistral model on a server with 4 GPUs. Training works fine with one or two GPUs, but it fails when I increase the number of processes in the training command. The command I am using is:

accelerate launch --num_processes=3 -m axolotl.cli.train /data/axo_configs/mistral/extract/config.yml --deepspeed deepspeed_configs/zero1.json
This is the command I use to start a Podman container from the axolotl image:
podman run -v /data:/data --device nvidia.com/gpu=all --security-opt=label=disable --rm -it axolotl
This is the error:
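The error output is not shown, so the cause is unknown. One commonly reported source of SIGBUS ("Bus error") in containerized multi-process PyTorch/NCCL jobs is a small /dev/shm: Podman, like Docker, defaults it to 64 MB unless `podman run` is given a larger `--shm-size`, and the command above does not set one. Treating that as a possible cause is an assumption. The sketch below checks the shared-memory size from inside the container; the function names and the 1 GiB threshold are hypothetical, illustrative choices, not an axolotl or NCCL requirement.

```python
import os
import shutil

# Hypothetical threshold (1 GiB): an illustrative value, not a documented NCCL minimum.
MIN_SHM_BYTES = 1 << 30


def shm_size_bytes(path: str = "/dev/shm") -> int:
    """Return the total size of the filesystem mounted at `path`, or 0 if it does not exist."""
    if not os.path.isdir(path):
        return 0
    # shutil.disk_usage returns (total, used, free) in bytes for the filesystem holding `path`.
    return shutil.disk_usage(path).total


def shm_looks_too_small(path: str = "/dev/shm", minimum: int = MIN_SHM_BYTES) -> bool:
    """True if the shared-memory filesystem is smaller than `minimum` bytes (or missing)."""
    return shm_size_bytes(path) < minimum


if __name__ == "__main__":
    total = shm_size_bytes()
    print(f"/dev/shm total: {total / 2**20:.0f} MiB")
    if shm_looks_too_small():
        print("Shared memory looks small; consider restarting the container with a larger --shm-size.")
```

If the reported size is around 64 MiB, re-running the `podman run` command with a larger `--shm-size` value and retrying `--num_processes=3` would be a cheap way to test this hypothesis.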