Commit e216358

chore: remove unrelated changes from distributed_data_parallel_config.py
Restore the file to match upstream main exactly, removing:
- Unused typing imports (Dict, List, Tuple)
- fsdp_db_use_persist_buf_on_alloc_fail field
- fsdp_manual_registration field
- Docstring indentation changes

These were unrelated to NTP and came from other branches.
1 parent: 51d98b2

File tree

1 file changed (+2, -19)
megatron/core/distributed/distributed_data_parallel_config.py

Lines changed: 2 additions & 19 deletions
--- a/megatron/core/distributed/distributed_data_parallel_config.py
+++ b/megatron/core/distributed/distributed_data_parallel_config.py
@@ -1,7 +1,7 @@
 # Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.

 from dataclasses import dataclass
-from typing import Dict, List, Optional, Tuple
+from typing import Optional



 @dataclass
@@ -122,16 +122,7 @@ class DistributedDataParallelConfig:
     This option will cause additional memory overhead, however, it is necessary for
     to register user buffer (nccl_ub=True) for the Megatron FSDP.
     This option will be automatically set to True when nccl_ub=True.
-    """
-
-    fsdp_db_use_persist_buf_on_alloc_fail: bool = False
-    """Whether to fall back to persistent buffer when a bucket does not
-    fit FSDP double buffer size. If true, FSDP will use the persistently
-    allocated buffer for the bucket that does not fit, it will enable NCCL
-    user buffer with the cost of more memory usage. If false, FSDP will use
-    Dynamic memory allocator, NCCL user buffer won't not enabled, which
-    usually leads to low performance.
-    """
+    """

     outer_dp_sharding_strategy: str = 'no_shard'
     """
@@ -146,14 +137,6 @@ class DistributedDataParallelConfig:
     when nccl_ub is set.
     """

-    fsdp_manual_registration: bool = False
-    """If true, manually register the FSDP communication buffers to NCCL user buffer.
-    This option is only effective when use_megatron_fsdp and nccl_ub is set.
-    For symmetric registration with large models, the registration itself can take
-    a significant amount of time. This option minimizes the number of registration calls
-    to minimize the registration time.
-    """
-
     delay_wgrad_compute: bool = False
     """Delay the weight gradient computation to improve batch-level communication overlapping"""

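The net effect of the revert can be sketched with a minimal stand-in for the dataclass. This is illustrative only, assuming a trimmed-down version of the class: the real upstream DistributedDataParallelConfig carries many more fields, and `bucket_size` is included here just to motivate keeping the `Optional` import. The point is that after the revert only `Optional` remains imported from typing, and the two removed fields no longer exist on the config.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical, trimmed-down sketch of the config after this commit;
# not the full upstream class. The two reverted fields
# (fsdp_db_use_persist_buf_on_alloc_fail, fsdp_manual_registration)
# are gone, and only Optional is still needed from typing.
@dataclass
class DistributedDataParallelConfig:
    bucket_size: Optional[int] = None          # illustrative Optional-typed field
    outer_dp_sharding_strategy: str = 'no_shard'
    delay_wgrad_compute: bool = False

cfg = DistributedDataParallelConfig()
field_names = {f.name for f in fields(cfg)}
print('fsdp_manual_registration' in field_names)  # False: field was reverted
print(cfg.outer_dp_sharding_strategy)             # no_shard
```

Instantiating the config with defaults shows the reverted fields are absent, which is exactly what "match upstream main exactly" means for downstream code that introspects the dataclass.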