DeepSpeed Integration #5

@EhsanEI

Description

Hi all,

I'm trying to use MARS-AdamW with DeepSpeed (ZeRO Stage 1) on a fine-tuning problem and get the error below. The optimizer works on the same problem in a distributed setting as long as DeepSpeed is disabled, and in my case both the no_decay and decay parameter groups are non-empty. Are there any guidelines for making MARS work with DeepSpeed?

```
...
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2245, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2374, in _inner_training_loop
[rank0]:     model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1440, in prepare
[rank0]:     result = self._prepare_deepspeed(*args)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2033, in _prepare_deepspeed
[rank0]:     engine, optimizer, _, lr_scheduler = ds_initialize(**kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 193, in initialize
[rank0]:     engine = DeepSpeedEngine(args=args,
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 326, in __init__
[rank0]:     self._configure_optimizer(optimizer, model_parameters)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1408, in _configure_optimizer
[rank0]:     self.optimizer = self._configure_zero_optimizer(basic_optimizer)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1666, in _configure_zero_optimizer
[rank0]:     optimizer = DeepSpeedZeroOptimizer(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 355, in __init__
[rank0]:     flattened_buffer = self.flatten_dense_tensors_aligned(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 942, in flatten_dense_tensors_aligned
[rank0]:     return self.flatten(align_dense_tensors(tensor_list, alignment))
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 510, in _flatten_dense_tensors
[rank0]:     return torch._C._nn.flatten_dense_tensors(tensors)
[rank0]: RuntimeError: torch.cat(): expected a non-empty list of Tensors
```
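For context on the failure mode: the ZeRO Stage 1/2 optimizer flattens each parameter group's tensors with `torch.cat()`, which raises exactly this `RuntimeError` when a group contains no tensors, so the error usually means at least one param group arrives at DeepSpeed empty (some optimizers add extra internal groups beyond the decay/no_decay split you build yourself). One common workaround is to drop empty groups before constructing the optimizer. This is just a sketch under that assumption, not MARS's own API; `build_param_groups` is a hypothetical helper:

```python
import torch


def build_param_groups(model, weight_decay=0.01):
    """Split parameters into decay / no-decay groups, dropping any empty group.

    DeepSpeed's ZeRO Stage 1/2 optimizer flattens each group with
    torch.cat(), which rejects an empty tensor list -- so empty groups
    must never be passed through to deepspeed.initialize().
    """
    no_decay_keys = ("bias", "LayerNorm.weight")
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if any(k in name for k in no_decay_keys):
            no_decay.append(p)
        else:
            decay.append(p)
    groups = [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
    # Filter out any group that ended up empty.
    return [g for g in groups if g["params"]]
```

The same filtering can be applied to whatever group list MARS builds internally; the key invariant is simply that every group handed to DeepSpeed has at least one parameter.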
