
Conversation

@jason9693

@jason9693 jason9693 commented Oct 20, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

This PR is related to #5204.

  • Enables exporting an HF (PEFT) compatible LoRA adapter.
  • MoE fused weights are not supported (in line with vLLM's recommendation; see the linked issue).
  • Currently only supports GPT (Megatron model) architectures.
  • Test code (WIP).

How to Test

  1. Train with the LoRA training script:
# 2 * 62GiB, 5.10s/it
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
megatron sft \
    --load qwen/Qwen3-30B-A3B-mcore \
    --model_type qwen3_moe \
    --dataset  'Qwen3_dataset_0801/data.jsonl' \
    --columns '{"conversations": "messages", "tools": "tools"}' \
    --agent_template hermes \
    --loss_scale ignore_empty_think \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules linear_qkv linear_proj \
    --split_dataset_ratio 0.01 \
...
  2. Convert the LoRA adapter with --merge_lora false (do not merge the weights into the base model):
model_path=${1}

CUDA_VISIBLE_DEVICES=0 \
swift export \
    --mcore_adapters ${model_path} \
    --to_hf true \
    --torch_dtype bfloat16 \
    --output_dir ${model_path}_peft \
    --merge_lora false  ### IMPORTANT ###
  3. Test the conversion:
python tests/megatron/test_lora_export.py
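As a quicker manual sanity check (a sketch only, not part of the repository's test script; BASE_MODEL and ADAPTER_DIR below are placeholder names you would fill in yourself), the exported directory should load as a standard PEFT adapter:
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = 'Qwen/Qwen3-30B-A3B'   # assumed HF backbone matching the mcore checkpoint used for training
ADAPTER_DIR = '<model_path>_peft'   # the --output_dir produced by the export step above

# Load the backbone, then attach the exported adapter without merging.
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16, device_map='auto')
model = PeftModel.from_pretrained(base, ADAPTER_DIR).eval()

# The adapter config should reflect the LoRA settings used during training.
print(model.peft_config['default'].target_modules)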
  4. Load on vLLM:
vllm serve ${BACKBONE_MODEL_PATH} --enable-lora --lora-modules test1=${MODEL_PATH} test2=${MODEL_PATH} \
--enable-auto-tool-choice --tool-call-parser hermes --max_lora_rank 64 --max_loras 2 --host 0.0.0.0 --port 8000  > ${SERVE_LOG_PATH} 2>&1 &
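Once the server is running, a minimal smoke test might look like the following (illustrative only; it assumes the server above is reachable at localhost:8000 and that test1 is one of the names registered via --lora-modules):
from openai import OpenAI

client = OpenAI(base_url='http://localhost:8000/v1', api_key='EMPTY')

# Requesting model='test1' routes generation through the registered LoRA adapter;
# passing the backbone model name instead would use the base weights only.
resp = client.chat.completions.create(
    model='test1',
    messages=[{'role': 'user', 'content': 'Hello, who are you?'}],
)
print(resp.choices[0].message.content)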

@gemini-code-assist
Contributor

Summary of Changes

Hello @jason9693, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement by allowing users to export unmerged LoRA adapters from Megatron Core models into a format compatible with HuggingFace's PEFT library. This feature streamlines the process of using LoRA fine-tuned models across different frameworks, providing flexibility and reducing the need for full model retraining. The changes involve adding a dedicated conversion module and integrating it into the existing model export pipeline, specifically targeting GPT models.

Highlights

  • HuggingFace LoRA Adapter Export: This pull request enables the export of unmerged LoRA adapters from Megatron Core models to a HuggingFace PEFT compatible format, allowing for greater interoperability.
  • New Conversion Module: A new module, mcore2hf_lora.py, has been introduced to handle the complex logic of converting various Megatron Core LoRA types (e.g., fused QKV, MLP projections, MLA attention) into their HuggingFace PEFT equivalents.
  • Conditional LoRA Handling: The conversion utility now includes logic to either merge LoRA weights into the base model or export them as a separate adapter, based on command-line arguments (--to-hf and --merge-lora).
  • GPT Model Specific: The LoRA adapter conversion functionality is currently implemented and supported exclusively for GPT (Megatron model) architectures.
  • Module Key Remapping: The conversion process includes remapping of module keys (e.g., decoder to model, self_attention to self_attn) to align with HuggingFace's naming conventions.
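As a rough illustration of the key remapping mentioned in the last highlight (a sketch only; the function name and rule table are illustrative, and the real logic in mcore2hf_lora.py covers many more cases):
def remap_key_sketch(key: str) -> str:
    # Rename Megatron-style module paths to the HuggingFace naming mentioned above:
    # 'decoder' -> 'model', 'self_attention' -> 'self_attn'.
    replacements = {
        'decoder.': 'model.',
        '.self_attention.': '.self_attn.',
    }
    for old, new in replacements.items():
        key = key.replace(old, new)
    return key

# e.g. 'decoder.layers.0.self_attention.linear_proj.lora_A.weight'
#   -> 'model.layers.0.self_attn.linear_proj.lora_A.weight'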


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a feature to export unmerged LoRA adapters from Megatron Core to HuggingFace PEFT format. The implementation is well-structured, with the core logic encapsulated in a new file. However, I've identified a critical issue in the conversion script that could lead to exported adapter files being overwritten. Additionally, there are a couple of high-severity bugs in the LoRA weight conversion logic that would prevent exporting adapters with biases and would handle other parameters inconsistently. I've provided detailed comments and suggestions to address these issues.

@Jintao-Huang
Copy link
Collaborator

Hello! Thanks for your PR. Could you please change the comments to English? 😊

@jason9693
Copy link
Author

jason9693 commented Oct 21, 2025

Hello! Thanks for your PR. Could you please change the comments to English? 😊

Oh, I forgot to change the comments.
I'll change this ASAP!

@jason9693
Copy link
Author

jason9693 commented Oct 21, 2025

@Jintao-Huang I just fixed all of the bugs I knew were remaining and pushed the changes along with the conversion test code.
You can run the test with this file.

FYI: I tested this on the Qwen3-30B-A3B MoE model.

@Jintao-Huang
Copy link
Collaborator

Hello! I will review it as soon as possible today or tomorrow. Please wait. 😊

@Jintao-Huang
Copy link
Collaborator

Could you use pre-commit to format the code to pass the lint tests?

(use Python 3.10 or 3.11)

pip install pre-commit
pre-commit run --all-files

@jason9693
Copy link
Author

@Jintao-Huang I just ran pre-commit and committed the changes.

@jason9693
Copy link
Author

@Jintao-Huang Are there any further actions I need to take?

@jason9693 jason9693 changed the title Enable for exporting unmerged HF Lora Adapter feat: Enable for exporting unmerged HF Lora Adapter Oct 24, 2025
@jason9693
Copy link
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR introduces a valuable feature to export LoRA adapters in the HuggingFace PEFT format without merging them, which is useful for serving with frameworks like vLLM. The implementation is well-structured, with the core conversion logic neatly encapsulated in mcore2hf_lora.py. The changes to the export script in convert.py are clean and integrate the new functionality well.

However, I've identified a few issues. The LoRA conversion logic appears to miss handling biases, which could lead to incorrect conversions for LoRA layers that use them. Additionally, the new test script test_lora_export.py has some critical flaws: it uses hardcoded paths, making it non-executable, and the weight comparison logic is incorrect as it operates on an already merged model. The test coverage for converted modules is also incomplete. Addressing these points will ensure the feature is robust and verifiable.

Comment on lines +73 to +82
def push(dst, key, tensor):
    """Create independent copy of tensor for saving + ensure contiguous memory"""
    t = tensor.detach().clone().contiguous()
    if key in dst:
        raise ValueError(f'Duplicate key: {key}')
    if 'weight' not in key:
        logger.debug(f'Skipping non-weight key: {key}')
        return
    key = remap_key_for_peft(key)
    dst[key] = t

high

The push function currently skips any tensor whose key does not contain 'weight'. This will cause LoRA biases to be dropped during conversion if they are enabled (e.g., with lora_bias='all'). This could lead to incorrect behavior for the exported adapter. The check should be removed to ensure all parts of the LoRA layers, including biases, are correctly processed and saved.

    def push(dst, key, tensor):
        """Create independent copy of tensor for saving + ensure contiguous memory"""
        t = tensor.detach().clone().contiguous()
        if key in dst:
            raise ValueError(f'Duplicate key: {key}')
        key = remap_key_for_peft(key)
        dst[key] = t

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=DTYPE, device_map={'': GPU_PEFT}).eval()

peft_model = PeftModel.from_pretrained(base, ADAPTER_DIR).eval()
model = peft_model.merge_and_unload().eval()

high

The load_models function loads a PeftModel and then immediately merges it using peft_model.merge_and_unload(). The resulting merged model is then used in compare_weights. However, compare_weights and its helper peft_effective_weight are designed to work with an unmerged PeftModel to calculate the effective weight by combining the base weight and the LoRA delta (W0 + B@A*scale). When passed a merged model, peft_has_lora will be false, and the test will simply compare the merged weight against itself, making the check ineffective. The test should compare a pre-merged model against an unmerged PeftModel. To fix this, the model returned should be the unmerged PeftModel.

Suggested change
model = peft_model.merge_and_unload().eval()
model = peft_model
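For reference, a minimal sketch of the effective-weight computation the reviewer describes (it assumes a standard PEFT LoRA Linear layer; the helper name and the 'default' adapter key are illustrative, not necessarily the test script's actual API):
import torch

def peft_effective_weight(layer, adapter: str = 'default') -> torch.Tensor:
    # Effective weight of an unmerged PEFT LoRA Linear layer: W_eff = W0 + scaling * B @ A.
    w0 = layer.base_layer.weight.data
    if not hasattr(layer, 'lora_A') or adapter not in layer.lora_A:
        return w0  # no LoRA attached: the effective weight is just the base weight
    a = layer.lora_A[adapter].weight.data  # shape (r, in_features)
    b = layer.lora_B[adapter].weight.data  # shape (out_features, r)
    return w0 + layer.scaling[adapter] * (b @ a)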

Comment on lines +103 to +108
def find_linear_modules(model: nn.Module, suffixes=('q_proj', 'k_proj', 'v_proj', 'o_proj')) -> Dict[str, nn.Linear]:
    out = {}
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear) and any(name.endswith(f'.self_attn.{suf}') for suf in suffixes):
            out[name] = mod
    return out

medium

The find_linear_modules function is limited to finding LoRA modules within self-attention blocks (e.g., q_proj, o_proj). However, the conversion logic also supports LoRA on MLP layers (gate_proj, up_proj, down_proj) and MoE experts. The weight comparison test should be extended to cover these modules as well to ensure complete verification.

Suggested change
def find_linear_modules(model: nn.Module, suffixes=('q_proj', 'k_proj', 'v_proj', 'o_proj')) -> Dict[str, nn.Linear]:
    out = {}
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear) and any(name.endswith(f'.self_attn.{suf}') for suf in suffixes):
            out[name] = mod
    return out
def find_linear_modules(model: nn.Module, suffixes=(
        'q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'
)) -> Dict[str, nn.Linear]:
    out = {}
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear) and any(name.endswith(suf) for suf in suffixes):
            out[name] = mod
    return out
