
AttributeError: 'ScalingTensor' object has no attribute 'view' #180

@LSC527


What's the issue, what's expected?:
An error occurs when using MS-AMP for LLM supervised fine-tuning (SFT).
MS-AMP DeepSpeed config (all of "O1", "O2", and "O3" were tried as the opt_level):
"msamp": {
    "enabled": true,
    "opt_level": "O1|O2|O3",
    "use_te": false
}

How to reproduce it?:
Follow the setup of DeepSpeed-Chat and make two small code modifications to enable MS-AMP in DeepSpeed-Chat/training/step1_supervised_finetuning/main.py:

Line 20, modify: import deepspeed -> from msamp import deepspeed

Line 230, add:
ds_config["msamp"] = {
    "enabled": True,
    "opt_level": "O1|O2|O3",
    "use_te": False
}

Log message or snapshot?:

Traceback (most recent call last):
  File "/home/work/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 400, in <module>
    main()
  File "/home/work/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 369, in main
    model.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/engine.py", line 405, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 951, in backward
    super().backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2040, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/msamp/nn/functional.py", line 123, in backward
    ctx.weight.backward_grad_update(wgrad)
  File "/usr/local/lib/python3.10/dist-packages/msamp/common/tensor/tensor.py", line 130, in backward_grad_update
    self._backward_post_hooks(grad)
  File "/usr/local/lib/python3.10/dist-packages/msamp/common/tensor/hook.py", line 47, in __call__
    hook(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1581, in _call_impl
    hook_result = hook(self, args, result)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 386, in reduce_partition_and_remove_grads
    self.fp8_reduce_ready_partitions_and_remove_grads(param, i)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 595, in fp8_reduce_ready_partitions_and_remove_grads
    self.fp8_reduce_independent_p_g_buckets_and_remove_grads(param, i)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 412, in fp8_reduce_independent_p_g_buckets_and_remove_grads
    self.fp8_reduce_ipg_grads()
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 541, in fp8_reduce_ipg_grads
    self.fp8_average_tensor(self.fp8_extra_large_param_to_reduce.grad.view(-1))
AttributeError: 'ScalingTensor' object has no attribute 'view'
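The root cause visible in the traceback is that `fp8_reduce_ipg_grads` calls `.grad.view(-1)`, but the gradient of an extra-large parameter is a `ScalingTensor`, which is MS-AMP's own wrapper type rather than a `torch.Tensor`, so it does not expose `torch.Tensor` methods like `view`. The general failure mode can be sketched with a hypothetical stand-in class (`ScalingTensorLike` below is an illustration, not MS-AMP's actual implementation):

```python
# Minimal illustration of the failure mode: a tensor-like wrapper that
# stores a payload plus a scaling factor but does not subclass the
# tensor type, so ordinary tensor methods such as .view(-1) are
# missing and raise AttributeError.
class ScalingTensorLike:
    """Hypothetical stand-in for a scaled-tensor wrapper (not msamp's)."""

    def __init__(self, data, scale):
        self.data = data      # raw payload (e.g. FP8 bytes)
        self.scale = scale    # scaling factor applied on dequantization


grad = ScalingTensorLike([1.0, 2.0, 3.0], scale=0.5)

try:
    grad.view(-1)             # fine on torch.Tensor, fails on the wrapper
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError, as in the traceback above
```

Code paths that may receive such a wrapper have to either convert it back to a plain tensor first or branch on the type, rather than calling `torch.Tensor` methods directly.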

Additional information:
env: ghcr.io/azure/msamp:v0.4.0-cuda12.2
gpu: 8 × H100
