Conversation
@sfc-gh-mwyatt commented Dec 18, 2025

Adds support for the FP8 autocasting feature from the Transformer Engine library.

  • Replace nn.Linear layers in loaded models with Transformer Engine linear layers, for layers whose names substring-match an entry in fp8_target_modules (see the sketch after this list)
  • Wrap the loss calculation with the FP8 autocast context
  • Add config support for fp8_recipe:
    • Requires specifying a type that maps to a Recipe subclass
    • Requires specifying fp8_target_modules
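
A minimal sketch of the layer swap described in the first bullet, assuming the usual "replace nn.Linear in place" pattern for Transformer Engine; the helper name replace_fp8_target_modules and the weight-copy details are assumptions, not the PR's actual implementation.

import torch
import torch.nn as nn
import transformer_engine.pytorch as te

def replace_fp8_target_modules(model: nn.Module, fp8_target_modules: list[str]) -> nn.Module:
    # Walk every parent module and swap nn.Linear children whose qualified
    # name contains one of the target substrings (e.g. "q_proj", "up_proj").
    for parent_name, parent in model.named_modules():
        for child_name, child in parent.named_children():
            full_name = f"{parent_name}.{child_name}" if parent_name else child_name
            if isinstance(child, nn.Linear) and any(t in full_name for t in fp8_target_modules):
                te_linear = te.Linear(child.in_features, child.out_features, bias=child.bias is not None)
                with torch.no_grad():
                    # Reuse the pretrained parameters so the swap itself does not change the model.
                    te_linear.weight.copy_(child.weight)
                    if child.bias is not None:
                        te_linear.bias.copy_(child.bias)
                setattr(parent, child_name, te_linear)
    return model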

Example Usage:

model:
  name_or_path: Qwen/Qwen3-8B
  fp8_recipe:
    type: delayedscaling
    fp8_format: hybrid
    amax_history_len: 16
    amax_compute_algo: max
  fp8_target_modules:
    - q_proj
    - k_proj
    - v_proj
    - up_proj
    - down_proj
    - gate_proj

data:
  sources:
    - type: huggingface_instruct
      name_or_path: HuggingFaceH4/ultrachat_200k:train[:1000]
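
A sketch of how the fp8_recipe block above could be mapped onto a Transformer Engine Recipe subclass, as described in the config bullets; the build_fp8_recipe helper and its lookup tables are assumptions, and only the delayedscaling type from the example is shown.

from transformer_engine.common.recipe import DelayedScaling, Format

_FP8_FORMATS = {"hybrid": Format.HYBRID, "e4m3": Format.E4M3}
_RECIPE_TYPES = {"delayedscaling": DelayedScaling}

def build_fp8_recipe(cfg: dict):
    # Pop the `type` key and use it to pick the Recipe subclass; the remaining
    # keys are forwarded as keyword arguments.
    kwargs = dict(cfg)
    recipe_cls = _RECIPE_TYPES[kwargs.pop("type").lower()]
    if "fp8_format" in kwargs:
        kwargs["fp8_format"] = _FP8_FORMATS[kwargs["fp8_format"].lower()]
    # For the example config this yields:
    # DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16, amax_compute_algo="max")
    return recipe_cls(**kwargs)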

hf_config_kwargs: Dict = Field(default_factory=dict)
"""Optional kwargs to override in the HF model config object created by `AutoConfig.from_pretrained(model.name_or_path)`."""

fp8_recipe: Optional[Any] = None
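
A sketch of how the parsed recipe might be consumed when wrapping the loss calculation (the second bullet above), using Transformer Engine's fp8_autocast context manager; the training_step function and its surrounding structure are illustrative, not the PR's actual training loop.

import transformer_engine.pytorch as te

def training_step(model, batch, recipe):
    # Run the forward pass and loss computation under FP8 autocast; the backward
    # pass stays outside the context, as Transformer Engine recommends.
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        outputs = model(**batch)
        loss = outputs.loss
    loss.backward()
    return loss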

@sfc-gh-mwyatt (Collaborator, Author) commented:

Perhaps this should be named differently. TE supports both FP8 and FP4, and there are other libraries we may wish to support. Better names might be:

  • quant_recipe
  • quant_target_modules
