
Conversation

@jason9693 commented on Oct 20, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

This PR is related to #5204.

  • Enables exporting an HF/PEFT-compatible LoRA adapter (a loading sketch follows this list).
  • MoE fused weights are not supported (per vLLM's recommendation; see the linked issue).
  • Currently supported only for GPT (Megatron model).
  • Test code (WIP).
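
As a quick illustration of "HF compatible": the exported adapter directory should be loadable with HuggingFace PEFT against the original backbone. The sketch below uses placeholder paths and is not code from this PR.

# Minimal sketch: attach the exported (unmerged) LoRA adapter with HuggingFace PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "Qwen/Qwen3-30B-A3B"     # placeholder: HF backbone the adapter was trained on
adapter_path = "/path/to/checkpoint_peft"  # placeholder: output_dir from `swift export ... --merge_lora false`

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)

# Attach the adapter without merging it into the base weights.
model = PeftModel.from_pretrained(base_model, adapter_path)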

How to Test

  1. Train with a LoRA script:
# 2 * 62GiB, 5.10s/it
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
megatron sft \
    --load qwen/Qwen3-30B-A3B-mcore \
    --model_type qwen3_moe \
    --dataset  'Qwen3_dataset_0801/data.jsonl' \
    --columns '{"conversations": "messages", "tools": "tools"}' \
    --agent_template hermes \
    --loss_scale ignore_empty_think \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules linear_qkv linear_proj \
    --split_dataset_ratio 0.01 \
...
  2. Convert the LoRA checkpoint without the merge_lora option (i.e. --merge_lora false):
model_path=${1}

CUDA_VISIBLE_DEVICES=0 \
swift export \
    --mcore_adapters ${model_path} \
    --to_hf true \
    --torch_dtype bfloat16 \
    --output_dir ${model_path}_peft \
    --merge_lora false  # IMPORTANT: keep the adapter unmerged so a PEFT adapter is exported
  3. Test the conversion (an illustrative sanity check is sketched after the command below):
python tests/megatron/test_lora_export.py
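
For reference, independent of the contents of tests/megatron/test_lora_export.py (not shown here), a quick manual check could simply open the standard PEFT files in the exported directory. The paths and expected key names below are assumptions, not code from this PR.

# Illustrative check of an exported PEFT adapter directory (not the PR's test code).
import json
import os
from safetensors.torch import load_file

adapter_dir = "/path/to/checkpoint_peft"  # placeholder: the `swift export` output_dir

# adapter_config.json should record the base model and the converted target modules.
with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
    config = json.load(f)
print(config.get("base_model_name_or_path"), config.get("target_modules"))

# adapter_model.safetensors should hold lora_A / lora_B tensors under HF-style key names
# (e.g. remapped away from Megatron's `decoder` / `self_attention` naming).
weights = load_file(os.path.join(adapter_dir, "adapter_model.safetensors"))
for name, tensor in list(weights.items())[:5]:
    print(name, tuple(tensor.shape))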
  4. Load on vLLM:
vllm serve ${BACKBONE_MODEL_PATH} --enable-lora --lora-modules test1=${MODEL_PATH} test2=${MODEL_PATH} \
--enable-auto-tool-choice --tool-call-parser hermes --max_lora_rank 64 --max_loras 2 --host 0.0.0.0 --port 8000  > ${SERVE_LOG_PATH} 2>&1 &
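
Once the server is up, the adapter can be exercised through vLLM's OpenAI-compatible API by passing one of the names registered via --lora-modules (here test1) as the model. A minimal request sketch, assuming the port from the command above:

# Query the LoRA adapter "test1" through the OpenAI-compatible endpoint.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",  # server started above on port 8000
    json={
        "model": "test1",  # must match a name from --lora-modules
        "messages": [{"role": "user", "content": "Hello! Briefly introduce yourself."}],
        "max_tokens": 64,
    },
)
print(response.json()["choices"][0]["message"]["content"])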

@gemini-code-assist (Contributor)

Summary of Changes

Hello @jason9693, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement by allowing users to export unmerged LoRA adapters from Megatron Core models into a format compatible with HuggingFace's PEFT library. This feature streamlines the process of using LoRA fine-tuned models across different frameworks, providing flexibility and reducing the need for full model retraining. The changes involve adding a dedicated conversion module and integrating it into the existing model export pipeline, specifically targeting GPT models.

Highlights

  • HuggingFace LoRA Adapter Export: This pull request enables the export of unmerged LoRA adapters from Megatron Core models to a HuggingFace PEFT compatible format, allowing for greater interoperability.
  • New Conversion Module: A new module, mcore2hf_lora.py, has been introduced to handle the complex logic of converting various Megatron Core LoRA types (e.g., fused QKV, MLP projections, MLA attention) into their HuggingFace PEFT equivalents.
  • Conditional LoRA Handling: The conversion utility now includes logic to either merge LoRA weights into the base model or export them as a separate adapter, based on command-line arguments (--to-hf and --merge-lora).
  • GPT Model Specific: The LoRA adapter conversion functionality is currently implemented and supported exclusively for GPT (Megatron model) architectures.
  • Module Key Remapping: The conversion process includes remapping of module keys (e.g., decoder to model, self_attention to self_attn) to align with HuggingFace's naming conventions.
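
To make the "module key remapping" point concrete: the conversion rewrites Megatron-style parameter names into HF/PEFT-style names. The snippet below is only an illustrative sketch of that idea; the actual mapping table lives in mcore2hf_lora.py, and the linear_proj -> o_proj entry is an assumed example.

# Illustrative sketch of Megatron-Core -> HF key remapping (not the PR's actual mapping table).
MCORE_TO_HF = {
    "decoder": "model",             # from the highlights above
    "self_attention": "self_attn",  # from the highlights above
    "linear_proj": "o_proj",        # assumed example for the attention output projection
}

def remap_key(mcore_key: str) -> str:
    """Rewrite a Megatron Core parameter name into an HF/PEFT-style name."""
    hf_key = mcore_key
    for old, new in MCORE_TO_HF.items():
        hf_key = hf_key.replace(old, new)
    return hf_key

print(remap_key("decoder.layers.0.self_attention.linear_proj.lora_A.weight"))
# -> model.layers.0.self_attn.o_proj.lora_A.weight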
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment (Contributor)


Code Review

This pull request introduces a feature to export unmerged LoRA adapters from Megatron Core to HuggingFace PEFT format. The implementation is well-structured, with the core logic encapsulated in a new file. However, I've identified a critical issue in the conversion script that could lead to exported adapter files being overwritten. Additionally, there are a couple of high-severity bugs in the LoRA weight conversion logic that would prevent exporting adapters with biases and would handle other parameters inconsistently. I've provided detailed comments and suggestions to address these issues.

@Jintao-Huang (Collaborator)

Hello! Thanks for your PR. Could you please change the comments to English? 😊

@jason9693 (Author) commented on Oct 21, 2025

Hello! Thanks for your PR. Could you please change the comments to English? 😊

Oh, I forgot to change the comments.
I'll change this ASAP!

@jason9693 (Author) commented on Oct 21, 2025

@Jintao-Huang I have addressed all of the known remaining bugs and pushed the changes along with conversion test code.
You can run the test using this file.

FYI: I tested this on the Qwen3-30B-A3B MoE model.

@Jintao-Huang (Collaborator)

Hello! I will review it as soon as possible, today or tomorrow. Please wait. 😊

@Jintao-Huang (Collaborator)

Could you use pre-commit to format the code to pass the lint tests?

(use Python 3.10 or 3.11)

pip install pre-commit
pre-commit run --all-files

@jason9693 (Author)

@Jintao-Huang I just ran pre-commit and committed the changes.

@jason9693 (Author)

@Jintao-Huang Are there any further actions I need to take?

@jason9693 changed the title from "Enable for exporting unmerged HF Lora Adapter" to "feat: Enable for exporting unmerged HF Lora Adapter" on Oct 24, 2025