Description
Hi all,
I'm trying to use MARS-AdamW with DeepSpeed (ZeRO Stage 1) for a fine-tuning problem and I get the error below. The optimizer works on the same problem in a distributed setting as long as DeepSpeed is disabled. In my case both the no_decay and decay parameter groups are non-empty. Do you know of any guidelines for making MARS work with DeepSpeed?
...
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2245, in train
[rank0]: return inner_training_loop(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2374, in _inner_training_loop
[rank0]: model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1440, in prepare
[rank0]: result = self._prepare_deepspeed(*args)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2033, in _prepare_deepspeed
[rank0]: engine, optimizer, _, lr_scheduler = ds_initialize(**kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 193, in initialize
[rank0]: engine = DeepSpeedEngine(args=args,
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 326, in __init__
[rank0]: self._configure_optimizer(optimizer, model_parameters)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1408, in _configure_optimizer
[rank0]: self.optimizer = self._configure_zero_optimizer(basic_optimizer)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1666, in _configure_zero_optimizer
[rank0]: optimizer = DeepSpeedZeroOptimizer(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 355, in __init__
[rank0]: flattened_buffer = self.flatten_dense_tensors_aligned(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 942, in flatten_dense_tensors_aligned
[rank0]: return self.flatten(align_dense_tensors(tensor_list, alignment))
[rank0]: File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 510, in _flatten_dense_tensors
[rank0]: return torch._C._nn.flatten_dense_tensors(tensors)
[rank0]: RuntimeError: torch.cat(): expected a non-empty list of Tensors
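For what it's worth, the RuntimeError at the bottom of the traceback points at ZeRO Stage 1/2 flattening each optimizer param group separately: `flatten_dense_tensors` calls `torch.cat()` on the group's tensors, which rejects an empty list. So even if the decay and no_decay groups look non-empty when constructed, something in the pipeline seems to be handing DeepSpeed a group with no parameters. A minimal sketch of a workaround is to drop (or merge) empty groups before the optimizer reaches `deepspeed.initialize` — the helper name `drop_empty_param_groups` below is my own illustration, not part of MARS or DeepSpeed:

```python
def drop_empty_param_groups(param_groups):
    """Keep only optimizer param groups that actually contain tensors.

    ZeRO Stage 1/2 flattens each group via flatten_dense_tensors,
    which torch.cat()s the group's tensors and raises
    "expected a non-empty list of Tensors" on an empty group,
    so empty groups need to be removed (or merged) up front.
    """
    return [g for g in param_groups if len(g.get("params", [])) > 0]


# Sketch of usage before constructing the optimizer (MARS_AdamW here
# stands in for however the MARS optimizer is actually instantiated):
#
#   groups = drop_empty_param_groups([decay_group, no_decay_group])
#   optimizer = MARS_AdamW(groups, lr=1e-4)
```

This only treats the symptom; if MARS internally re-partitions parameters into additional groups, the filtering may need to happen inside the optimizer's constructor instead.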