
DeepSpeed strategy doesn't set num_checkpoints when using activation partitioning #20329

@Gforky

Bug description

When training with DeepSpeed under the ZeRO Stage 3 strategy, enabling activation partitioning together with contiguous_checkpointing can trigger an "index out of range" error on contiguous_data_buffers. The cause is that the activation-partitioning configuration is created without passing the num_checkpoints parameter, so DeepSpeed falls back to the global variable num_layers with its default value of False, which leads to an empty contiguous_data_buffers being created.
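The failure mode can be illustrated with a simplified, hypothetical model of the buffer bookkeeping. The names `configure` and `partition_activations` loosely mirror DeepSpeed's `checkpointing.py`, but this is a sketch of the mechanism, not the real implementation:

```python
# Simplified, hypothetical model of DeepSpeed's activation-partitioning
# buffer bookkeeping -- not the actual checkpointing.py code.

def configure(num_checkpoints=None):
    """Buffers are only pre-allocated when num_checkpoints is provided;
    otherwise the list stays empty, mirroring the reported bug."""
    if num_checkpoints is None:
        return []  # empty contiguous_data_buffers -> IndexError later
    # one pre-allocated buffer per checkpointed layer (placeholder sizes)
    return [[0.0] * 4 for _ in range(num_checkpoints)]

def partition_activations(buffers, layer_index):
    # Indexing into an empty buffer list reproduces the reported failure.
    return buffers[layer_index]

# Without num_checkpoints the buffer list is empty and indexing fails:
try:
    partition_activations(configure(), 0)
    raised = False
except IndexError:
    raised = True

# When num_checkpoints is passed through, the lookup succeeds:
buf = partition_activations(configure(num_checkpoints=2), 0)
```

Under this model, forwarding num_checkpoints when building the activation-partitioning configuration would make the buffer list non-empty and avoid the IndexError.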

What version are you seeing the problem on?

v2.4

How to reproduce the bug

No response

Error messages and logs

[rank0]:   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 574, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 557, in forward
[rank0]:     inputs = partition_activations(args, CPU_CHECKPOINT, CONTIGUOUS_CHECKPOINTING)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 421, in partition_activations
[rank0]:     contiguous_data_buffers[i][data_offsets[i]].data[range(
[rank0]: IndexError: list index out of range

Environment

Current environment
#- PyTorch Lightning Version: 2.4.0
#- PyTorch Version: 2.4.1
#- Python version: 3.10.6
#- OS: Ubuntu-22.04
#- CUDA version: 12.1
#- GPU models and configuration: A100
#- How you installed Lightning (`conda`, `pip`, source): pip install

More info

No response

cc @lantiga
