
AttributeError: 'ScalingTensor' object has no attribute 'view' #180

@LSC527


What's the issue, what's expected?:
An error occurs when using MS-AMP for LLM supervised fine-tuning (SFT).
MS-AMP DeepSpeed config (all of "O1", "O2", and "O3" were tried as the opt_level):
"msamp": {
    "enabled": true,
    "opt_level": "O1|O2|O3",
    "use_te": false
}

How to reproduce it?:
Follow the setup of DeepSpeed-Chat and make two small code modifications to enable MS-AMP in DeepSpeed-Chat/training/step1_supervised_finetuning/main.py:

Line 20, modify: import deepspeed -> from msamp import deepspeed

Line 230, add:
ds_config["msamp"] = {
    "enabled": True,
    "opt_level": "O1|O2|O3",
    "use_te": False
}

Log message or snapshot?:

Traceback (most recent call last):
  File "/home/work/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 400, in <module>
    main()
  File "/home/work/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 369, in main
    model.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/engine.py", line 405, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 951, in backward
    super().backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2040, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/msamp/nn/functional.py", line 123, in backward
    ctx.weight.backward_grad_update(wgrad)
  File "/usr/local/lib/python3.10/dist-packages/msamp/common/tensor/tensor.py", line 130, in backward_grad_update
    self._backward_post_hooks(grad)
  File "/usr/local/lib/python3.10/dist-packages/msamp/common/tensor/hook.py", line 47, in __call__
    hook(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1581, in _call_impl
    hook_result = hook(self, args, result)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 386, in reduce_partition_and_remove_grads
    self.fp8_reduce_ready_partitions_and_remove_grads(param, i)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 595, in fp8_reduce_ready_partitions_and_remove_grads
    self.fp8_reduce_independent_p_g_buckets_and_remove_grads(param, i)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 412, in fp8_reduce_independent_p_g_buckets_and_remove_grads
    self.fp8_reduce_ipg_grads()
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 541, in fp8_reduce_ipg_grads
    self.fp8_average_tensor(self.fp8_extra_large_param_to_reduce.grad.view(-1))
AttributeError: 'ScalingTensor' object has no attribute 'view'
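The root cause visible in the traceback is that `fp8_reduce_ipg_grads` calls `.grad.view(-1)`, but the gradient of an extra-large parameter is a `ScalingTensor`, which is MS-AMP's own wrapper type rather than a `torch.Tensor`, so it does not expose `torch.Tensor` methods like `view`. The general failure mode can be sketched with a hypothetical stand-in class (`ScalingTensorLike` below is an illustration, not MS-AMP's actual implementation):

```python
# Minimal illustration of the failure mode: a tensor-like wrapper that
# stores a payload plus a scaling factor but does not subclass the
# tensor type, so ordinary tensor methods such as .view(-1) are
# missing and raise AttributeError.
class ScalingTensorLike:
    """Hypothetical stand-in for a scaled-tensor wrapper (not msamp's)."""

    def __init__(self, data, scale):
        self.data = data      # raw payload (e.g. FP8 bytes)
        self.scale = scale    # scaling factor applied on dequantization


grad = ScalingTensorLike([1.0, 2.0, 3.0], scale=0.5)

try:
    grad.view(-1)             # fine on torch.Tensor, fails on the wrapper
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError, as in the traceback above
```

Code paths that may receive such a wrapper have to either convert it back to a plain tensor first or branch on the type, rather than calling `torch.Tensor` methods directly.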

Additional information:
env: ghcr.io/azure/msamp:v0.4.0-cuda12.2
gpu: 8 × H100
