Commit 2bef100

dfalbel and claude authored
Fix gptoss_from_pretrained to correctly load HuggingFace weights (#4)
* Fix gptoss_from_pretrained to correctly load HuggingFace weights

  - Update gptoss_normalize_config to map HF config keys (num_local_experts, num_experts_per_tok, nested rope_scaling) to internal names
  - Rewrite gptoss_hf_weights_remap to:
    - Use underscore suffix (_blocks/_scales) for MXFP4 weight detection
    - Remap HF parameter names to model parameter names
    - Concatenate separate q/k/v projections into combined qkv tensors

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add dotty to Imports

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* ++

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
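
The two steps named in the message (config key mapping, then weight remapping) can be illustrated with a minimal base-R sketch. This is not the code from this commit: the helper names `normalize_config_sketch` and `remap_weights_sketch`, the internal names `num_experts` and `experts_per_token`, the flattened `rope_scaling_*` fields, the `qkv.weight` output name, and the choice to stack q/k/v along rows are all assumptions made for illustration. Plain matrices stand in for tensors.

```r
# Hypothetical helper: rename HF config keys to ASSUMED internal names.
normalize_config_sketch <- function(config) {
  # HF name -> internal name (right-hand sides are assumptions).
  renames <- c(
    num_local_experts   = "num_experts",
    num_experts_per_tok = "experts_per_token"
  )
  for (hf_name in names(renames)) {
    if (!is.null(config[[hf_name]])) {
      config[[renames[[hf_name]]]] <- config[[hf_name]]
      config[[hf_name]] <- NULL
    }
  }
  # Flatten the nested rope_scaling list into top-level fields.
  rope <- config[["rope_scaling"]]
  if (is.list(rope)) {
    for (key in names(rope)) {
      config[[paste0("rope_scaling_", key)]] <- rope[[key]]
    }
    config[["rope_scaling"]] <- NULL
  }
  config
}

# Hypothetical helper: flag MXFP4 tensors by suffix and merge q/k/v weights.
remap_weights_sketch <- function(weights) {
  nms <- names(weights)
  # Quantized tensors are identified by an underscore suffix.
  mxfp4 <- nms[grepl("_(blocks|scales)$", nms)]
  # Every attention prefix that has a separate q_proj weight.
  q_names <- nms[grepl("\\.q_proj\\.weight$", nms)]
  for (q_name in q_names) {
    prefix <- sub("q_proj\\.weight$", "", q_name)
    parts <- paste0(prefix, c("q", "k", "v"), "_proj.weight")
    # Stack q, k, v along the row dimension (an assumption about layout).
    merged <- do.call(rbind, unname(weights[parts]))
    weights[[paste0(prefix, "qkv.weight")]] <- merged
    weights[parts] <- NULL
  }
  list(weights = weights, mxfp4 = mxfp4)
}

# Usage with toy inputs.
cfg <- normalize_config_sketch(list(
  num_local_experts = 32,
  rope_scaling = list(factor = 32)
))
toy <- remap_weights_sketch(list(
  "attn.q_proj.weight" = matrix(1, 2, 3),
  "attn.k_proj.weight" = matrix(2, 1, 3),
  "attn.v_proj.weight" = matrix(3, 1, 3),
  "mlp.gate_up_proj_blocks" = 1:4
))
```

Matching on a `$`-anchored suffix keeps detection from firing on names that merely contain `blocks` or `scales` somewhere in the middle.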
1 parent 655d881 commit 2bef100

8 files changed, +775 -13 lines changed

DESCRIPTION

Lines changed: 3 additions & 2 deletions
@@ -11,14 +11,15 @@ Description: A collection of minimal implementations of deep learning
 License: MIT + file LICENSE
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.2
+RoxygenNote: 7.3.3
 Suggests:
     testthat (>= 3.0.0)
 Depends:
     R (>= 4.1.0)
 Config/testthat/edition: 3
-Imports:
+Imports:
     cli,
+    dotty,
     hfhub,
     purrr,
     rlang,

NAMESPACE

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ export(gptbigcode_from_pretrained)
 export(gptneox)
 export(gptneox_from_config)
 export(gptneox_from_pretrained)
+export(gptoss)
+export(gptoss_from_config)
+export(gptoss_from_pretrained)
 export(hf_state_dict)
 export(llama)
 export(llama_from_config)
