Commit e72a4fb

[feat](kt-kernel): Add resume arg to CPU weight conversion (#1630)
* [feat]: kt-kernel: Add resume arg to CPU weight conversion
* [docs]: kt-kernel: Document resume arg for CPU weight conversion
* [fix]: kt-kernel: Only print resume layer if in use
* [fix]: kt-kernel: Don't log skipped layers when using resume_layer
1 parent e69c677 commit e72a4fb

File tree

2 files changed: +32 −3 lines

kt-kernel/scripts/README.md

Lines changed: 14 additions & 0 deletions

````diff
@@ -107,6 +107,20 @@ output_dir/
 - Need to process very large models on memory-constrained systems
 - Want to preserve intermediate layer-wise quantized weights
 
+### Resume Layer
+
+For memory-constrained systems that cannot complete quantization even with low memory mode (`--no-merge-safetensor`) enabled, restart the script with the `--resume-layer` argument to specify the layer from which to continue the conversion. In the example below, layers 0-11 are skipped and conversion resumes starting with layer 12.
+
+```bash
+python scripts/convert_cpu_weights.py \
+    --input-path /path/to/model \
+    --input-type bf16 \
+    --output /path/to/output \
+    --quant-method int4 \
+    --no-merge-safetensor \
+    --resume-layer 12
+```
+
 ## Examples
 
 ### Example 1: Quantize DeepSeek-V3.1 (FP8 → INT4)
````

kt-kernel/scripts/convert_cpu_weights.py

Lines changed: 18 additions & 3 deletions

```diff
@@ -330,11 +330,18 @@ def _convert_layer_experts(self, layer_idx: int, expert_ids: List[int]) -> Dict[
         """
         raise NotImplementedError("Subclasses must implement _convert_layer_experts")
 
-    def convert(self):
-        """Convert all expert layers using subclass-specific logic."""
+    def convert(self, resume_layer: int = 0):
+        """Convert all expert layers using subclass-specific logic.
+
+        Args:
+            resume_layer (int, optional): The layer index to resume conversion from.
+                Layers with an index lower than this will be skipped. Defaults to 0.
+        """
         print("Starting conversion...")
         print(f"Input: {self.input_path}")
         print(f"Output: {self.output_path}")
+        if resume_layer > 0:
+            print(f"Resuming from layer: {resume_layer}")
 
         # Create output directory
         os.makedirs(self.output_path, exist_ok=True)
@@ -355,6 +362,8 @@ def convert(self):
 
         # Process layers with memory cleanup
         for i, (layer_idx, expert_ids) in enumerate(sorted(expert_layers.items())):
+            if layer_idx < resume_layer:
+                continue
             print(f"Processing layer {layer_idx} ({i+1}/{len(expert_layers)})...")
 
             layer_tensors = self._convert_layer_experts(layer_idx, expert_ids)
@@ -840,6 +849,12 @@ def main():
         default=False,
         help="Keep layer folders without merging to safetensor files (default: False)",
     )
+    parser.add_argument(
+        "--resume-layer",
+        type=int,
+        default=0,
+        help="Resume conversion starting at this layer index (default: 0)",
+    )
 
     args = parser.parse_args()
 
@@ -893,7 +908,7 @@ def main():
     )
 
     # Run conversion
-    converter.convert()
+    converter.convert(resume_layer=args.resume_layer)
 
     # Cleanup
     converter.close()
```
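The core of the change is the skip at the top of the layer loop: any layer whose index is below `resume_layer` is passed over before any work is done. A minimal standalone sketch of that behavior (the helper name and toy `expert_layers` data are illustrative, not from the commit):

```python
# Sketch of the resume-layer skip logic from convert(), reduced to its essence.
# expert_layers maps layer index -> list of expert ids, as in the converter.

def convert_layers(expert_layers: dict, resume_layer: int = 0) -> list:
    """Return the layer indices actually processed, skipping those below resume_layer."""
    processed = []
    for i, (layer_idx, expert_ids) in enumerate(sorted(expert_layers.items())):
        if layer_idx < resume_layer:
            continue  # already converted in a previous run; skip silently
        processed.append(layer_idx)
    return processed

layers = {idx: list(range(8)) for idx in range(16)}  # 16 layers, 8 experts each
print(convert_layers(layers, resume_layer=12))  # → [12, 13, 14, 15]
```

Note that in the committed loop the progress counter `i+1` enumerates all layers, including skipped ones, so the printed `(i+1/total)` count reflects the layer's position in the full model rather than the number of layers processed in this run.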
