Description
This issue tracks the status of GPT-OSS features that need to be implemented in Megatron Core, leveraging Transformer Engine.
✅ UPDATE: All core GPT-OSS functionality is now available in Megatron Core (training) and Megatron Bridge (checkpoint conversion).
MoE Layer
Enabled Bias
- Status: ✅ Supported
- Implementation: Available in main branch: Fixes for gpt-oss #2038
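To illustrate what "enabled bias" means here: unlike many MoE implementations, the GPT-OSS experts carry bias terms in both the up and down projections. A minimal scalar stand-in (illustrative only, not Megatron Core's API; the ReLU is a placeholder for the real activation):

```python
def expert_forward(x, w1, b1, w2, b2):
    """Scalar sketch of one MoE expert MLP with bias enabled.

    GPT-OSS experts use bias in both projections, which many MoE stacks
    omit; scalars stand in for the matmuls, and ReLU stands in for the
    model's actual clamped-SwiGLU activation.
    """
    h = x * w1 + b1          # up projection with bias
    h = max(h, 0.0)          # placeholder activation
    return h * w2 + b2       # down projection with bias
```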
Attention Mechanisms
Alternating Sliding-Window Attention Pattern
- Status: ✅ Supported - Infrastructure exists for per-layer patterns and sliding window attention using TE
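The alternating pattern can be sketched as a per-layer window selector. The `(left, right)` window tuple and the `(-1, 0)` "no window" convention mirror common sliding-window configs but are assumptions here, not Megatron Core's exact API:

```python
def layer_window(layer_idx, window=128, pattern=("sliding", "full")):
    """Sketch: alternate sliding-window and full causal attention by layer.

    Even layers get a causal window of `window` past tokens; odd layers get
    full causal attention, encoded as (-1, 0) meaning "unbounded left
    context, no right context". Values are illustrative defaults.
    """
    kind = pattern[layer_idx % len(pattern)]
    return (window, 0) if kind == "sliding" else (-1, 0)
```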
Attention Sinks
- Status: ✅ Implemented - in Transformer Engine and cuDNN
- Reference: Streaming LLM
- Related Transformer Engine PR: [PyTorch] Add sink attention support from cuDNN TransformerEngine#2148
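Conceptually, an attention sink adds one extra logit per softmax row that absorbs probability mass without contributing a value vector, so the model can effectively "attend to nothing". A minimal sketch under that assumption (the function signature is illustrative, not Transformer Engine's interface):

```python
import math

def softmax_with_sink(logits, sink_logit=0.0):
    """Attention weights over real tokens with a per-row sink logit.

    The sink participates in the softmax normalization but has no value
    vector, so the returned weights sum to less than 1 whenever the sink
    logit is finite.
    """
    m = max(logits + [sink_logit])                 # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    denom = sum(exps) + math.exp(sink_logit - m)   # sink joins the denominator
    return [e / denom for e in exps]
```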
Activation Functions
Custom SwiGLU with Clamping
- Status: ✅ Supported
- Implementation:
  - Megatron Core added a partially fused version as "custom quick GeGLU"
  - FP8-aware fused kernel merged into Transformer Engine
- Related Transformer Engine PR: [Pytorch] Support for Swiglu Activation used in GPT OSS TransformerEngine#2161
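For reference, the GPT-OSS activation clamps the gate and linear inputs before applying a sigmoid-weighted (quick-GELU-style) gate. A scalar sketch, where the constants `alpha=1.702` and `limit=7.0` and the `(up + 1)` shift are assumptions about the variant, not the fused kernel's code:

```python
import math

def clamped_swiglu(gate, up, alpha=1.702, limit=7.0):
    """Scalar sketch of a GPT-OSS-style clamped SwiGLU.

    The gate input is clamped from above at `limit` and the linear input
    to [-limit, limit]; the gate then passes through x * sigmoid(alpha * x)
    before multiplying the shifted linear branch.
    """
    gate = min(gate, limit)
    up = max(min(up, limit), -limit)
    glu = gate * (1.0 / (1.0 + math.exp(-alpha * gate)))
    return glu * (up + 1.0)
```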
Positional Encodings
YaRN RoPE Scaling
- Status: ✅ Fully Supported
- Implementation:
  - YaRN scaling to 128k+ context
  - Integration with existing RoPE
  - YaRN for general RoPE/GPT models
  - Convergence validation
- Usage: `--position-embedding-type yarn` with YaRN configuration parameters
- Reference: arXiv:2309.00071
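The core of YaRN is "NTK-by-parts" interpolation of the RoPE inverse frequencies: low-frequency dimensions are interpolated by the scale factor, high-frequency dimensions are left untouched, and a linear ramp blends the region in between (arXiv:2309.00071). A pure-Python sketch with illustrative default parameters:

```python
import math

def yarn_find_correction_dim(num_rotations, dim, base=10000.0, max_pos=4096):
    # Dimension index whose frequency completes `num_rotations` full turns
    # over the original context length (used to place the blend boundaries).
    return (dim * math.log(max_pos / (num_rotations * 2 * math.pi))) / (2 * math.log(base))

def yarn_inv_freq(dim, base=10000.0, scale=4.0, max_pos=4096,
                  beta_fast=32.0, beta_slow=1.0):
    """Sketch of YaRN NTK-by-parts inverse-frequency interpolation.

    Dimensions below `lo` keep their original frequency (extrapolation),
    dimensions above `hi` are divided by `scale` (interpolation), and a
    linear ramp blends the two in between. Defaults are illustrative.
    """
    lo = math.floor(yarn_find_correction_dim(beta_fast, dim, base, max_pos))
    hi = math.ceil(yarn_find_correction_dim(beta_slow, dim, base, max_pos))
    lo, hi = max(lo, 0), min(hi, dim // 2 - 1)
    inv_freq = []
    for i in range(dim // 2):
        freq = base ** (-2.0 * i / dim)
        ramp = min(max((i - lo) / max(hi - lo, 1), 0.0), 1.0)  # 0 = keep, 1 = interpolate
        inv_freq.append(freq * ((1.0 - ramp) + ramp / scale))
    return inv_freq
```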
Megatron Bridge Support
Megatron Bridge provides full GPT-OSS integration:
- ✅ Checkpoint Conversion: Hugging Face ↔ Megatron format
- ✅ Pre-configured Providers: `GPTOSSProvider20B` and `GPTOSSProvider120B`
- ✅ Quantization Support: Handles MXFP4 weight dequantization
Megatron Bridge + Megatron-LM Example
PR #2383 provides end-to-end example scripts covering checkpoint conversion (`convert_mcore_bf16_checkpoint_from_hf.py`) and training/fine-tuning (`training_gptoss_20b_h100_bf16_fp8.sh`).
Credits: @cuichenx for core implementation, @yiakwy-xpu-ml-framework-team for example scripts