
Conversation

@ved1beta (Contributor) commented on Aug 4, 2025

Description

QA-LoRA integration.

Motivation and Context

#2878

How has this been tested?

Ran training with the examples/llama-2/qalora.yml config included in this PR.

Types of changes

Added two QALoRA-specific parameters, qalora: bool and qalora_group_size: Optional[int], and added QALoRA support to adapter loading and the related code paths.

Inspired by the PEFT QA-LoRA implementation.

Summary by CodeRabbit

  • New Features

    • Added QALoRA adapter support across model loading, quantization, and training flows.
    • Provided an example QALoRA fine-tuning configuration.
    • Added a flash-attention QKV fusion option.
  • Validation

    • Enforced QALoRA requirements (4-bit loading, required group size) and blocked incompatible merges.
    • Extended warnings and incompatibility checks for gradient checkpointing, fused attention, and distributed setups.
    • Added QALoRA enum option in configs.
  • Tests

    • Expanded tests to cover QALoRA scenarios.

@coderabbitai (bot) commented on Aug 4, 2025

📝 Walkthrough

Adds QALoRA support across loaders, model quantization paths, schema enums/fields/validators, validation mixins, a test case, and an example QALoRA YAML config. Changes treat "qalora" alongside "lora"/"qlora" in loading, dtype/quantization, and validation logic.
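
For orientation, a minimal sketch of the routing idea only (function and constant names here are illustrative, not the actual axolotl API):

    # Hedged sketch: shows the "treat qalora like lora/qlora" routing idea.
    from typing import Any

    LORA_LIKE_ADAPTERS = ("lora", "qlora", "qalora")

    def load_lora(model: Any, cfg: Any) -> Any:
        # stand-in for the real LoRA loading path (peft config + wrapping)
        return model

    def load_adapter(model: Any, cfg: Any, adapter: str | None) -> Any:
        if adapter is None:
            return model  # no adapter: full fine-tune
        if adapter in LORA_LIKE_ADAPTERS:
            # "qalora" reuses the same LoRA path as "lora"/"qlora",
            # with the qalora-specific kwargs layered on inside that path
            return load_lora(model, cfg)
        raise NotImplementedError(f"unknown adapter type: {adapter!r}")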

Changes

  • Example config (examples/llama-2/qalora.yml): New YAML config for QALoRA fine-tuning (model, tokenizer, dataset, adapter=qalora, qalora_group_size, training and optimizer settings).
  • Loaders (src/axolotl/loaders/adapter.py, src/axolotl/loaders/model.py): Treats "qalora" like "qlora"/"lora": load_adapter routes it to load_lora, sets use_qalora and qalora_group_size, and includes qalora in embedding dtype, quantization, and k-bit preparation logic.
  • Schemas, enums & config (src/axolotl/utils/schemas/enums.py, src/axolotl/utils/schemas/config.py): Adds AdapterEnum containing lora, qlora, qalora; adds the flash_attn_fuse_qkv field to the input config; extends adapter auto-enable checks to include qalora.
  • Schemas, PEFT (src/axolotl/utils/schemas/peft.py): Changes adapter to AdapterEnum; adds qalora_group_size to LoraConfig and PeftConfig; adds a validate_adapter pre-validator and extends qlora validation to include qalora constraints.
  • Validation mixins (src/axolotl/utils/schemas/validation.py): Adds validate_qalora; extends fused-LoRA checks, qlora warnings, RL gradient-checkpointing checks, and ReLoRA validation to include qalora with appropriate constraints.
  • Tests (tests/test_loaders.py): Extends adapter parameterization to include "qalora" for the quantization config tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement, review requested

Suggested reviewers

  • winglian
  • NanoCode012
  • djsaunde


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0170f38 and d3fcc44.

📒 Files selected for processing (1)
  • src/axolotl/utils/schemas/validation.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/axolotl/utils/schemas/validation.py

@coderabbitai (bot) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
examples/llama-2/qalora.yml (1)

1-49: LGTM! Comprehensive qalora example configuration.

The configuration file demonstrates proper usage of the new qalora adapter with appropriate settings including 4-bit quantization, qalora-specific parameters, and standard LoRA configuration.

Fix the missing newline at end of file.

Apply this fix to address the YAML linting issue:

 special_tokens:
+
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e758343 and 3104c6e.

📒 Files selected for processing (9)
  • examples/llama-2/qalora.yml (1 hunks)
  • src/axolotl/core/builders/causal.py (1 hunks)
  • src/axolotl/core/training_args_base.py (1 hunks)
  • src/axolotl/loaders/adapter.py (2 hunks)
  • src/axolotl/loaders/model.py (5 hunks)
  • src/axolotl/utils/schemas/config.py (1 hunks)
  • src/axolotl/utils/schemas/peft.py (3 hunks)
  • src/axolotl/utils/schemas/validation.py (3 hunks)
  • tests/test_loaders.py (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
examples/llama-2/qalora.yml

[error] 49-49: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (19)
tests/test_loaders.py (1)

63-63: LGTM: Test coverage extended for qalora adapter.

The parameterization correctly includes "qalora" alongside the existing adapter types, ensuring quantization configuration is tested for the new adapter type.

src/axolotl/core/builders/causal.py (1)

154-155: LGTM: Consistent adapter handling for qalora.

The implementation follows the established pattern used for qlora, properly setting the qalora training argument flag when the adapter type is "qalora".

src/axolotl/utils/schemas/config.py (1)

1067-1067: LGTM: LoRA kernel optimization extended to qalora.

Correctly includes "qalora" in the adapter type check for auto-enabling LoRA kernel optimizations, ensuring qalora users benefit from the same performance optimizations as lora and qlora users.

src/axolotl/loaders/adapter.py (2)

97-102: LGTM: Clean qalora configuration handling.

The conditional logic properly adds qalora-specific parameters to the LoRA configuration using appropriate existence checks with hasattr() before accessing the attributes.


184-184: LGTM: Proper adapter routing for qalora.

Correctly includes "qalora" in the adapter type check and routes it to the load_lora function, maintaining consistency with other LoRA-based adapter types.

src/axolotl/core/training_args_base.py (1)

166-173: LGTM: Well-structured qalora training arguments.

The new fields follow established patterns with appropriate types, defaults, and metadata. The qalora_group_size default of 16 is reasonable, and the placement after the qlora field maintains logical organization.
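
For reference, dataclass fields of the described shape would look roughly like this (a sketch assuming the usual transformers-style field(metadata={"help": ...}) pattern; not the verbatim axolotl code):

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class QALoraTrainingArgsSketch:
        # sketch only: mirrors the described qalora flag and the group-size default of 16
        qalora: bool = field(
            default=False,
            metadata={"help": "whether the adapter being trained is QA-LoRA"},
        )
        qalora_group_size: Optional[int] = field(
            default=16,
            metadata={"help": "group size used for QA-LoRA quantization pooling"},
        )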

src/axolotl/loaders/model.py (5)

151-152: LGTM! Consistent FSDP support for qalora.

The property correctly includes "qalora" alongside "qlora" for FSDP enablement checks, maintaining consistency with the existing logic.


308-309: LGTM! Proper gradient checkpointing support for qalora.

The condition correctly includes "qalora" for gradient checkpointing and dtype handling, ensuring consistent behavior with other LoRA variants.


585-585: LGTM! Quantization config support extended to qalora.

The adapter check properly includes "qalora" for quantization configuration handling, maintaining consistency with existing LoRA and QLoRA support.


604-607: LGTM! BitsAndBytesConfig support for qalora.

The 4-bit loading configuration correctly includes "qalora" alongside "qlora", ensuring proper quantization setup for the new adapter type.


892-892: LGTM! Model preparation for quantization includes qalora.

The condition properly includes "qalora" for kbit training preparation, maintaining consistency with LoRA and QLoRA handling.

src/axolotl/utils/schemas/peft.py (3)

44-44: LGTM! Adapter description updated for qalora.

The field description properly includes "qalora" alongside the existing adapter types.


131-140: LGTM! Well-defined qalora configuration fields.

The new fields use_qalora and qalora_group_size are properly defined with appropriate defaults and descriptive documentation.


181-202: LGTM! Comprehensive qalora validation logic.

The validation method correctly mirrors the qlora constraints, ensuring proper usage of qalora with quantization settings. The logic appropriately prevents incompatible configurations.

src/axolotl/utils/schemas/validation.py (5)

594-594: LGTM! Fused module restriction extended to qalora.

The validation correctly prevents fused modules with qalora, maintaining consistency with lora and qlora restrictions.


600-609: LGTM! Comprehensive qalora validation method.

The new validation method properly enforces qalora-specific constraints: requires 4-bit loading, group size specification, and prevents merging. This mirrors the validation logic in the PEFT schema appropriately.


615-615: LGTM! Zero3 warning extended to qalora.

The warning appropriately includes qalora alongside qlora for potential gradient checkpointing issues with DeepSpeed Zero3.


705-705: LGTM! RL training validation includes qalora.

The distributed RL training check correctly includes qalora for known incompatibilities with reentrant checkpointing.


1182-1185: LGTM! ReLoRA compatibility extended to qalora.

The validation properly includes qalora as a valid adapter for ReLoRA usage, with appropriate error messaging.

@codecov (bot) commented on Aug 4, 2025

Codecov Report

❌ Patch coverage is 45.83333% with 26 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/axolotl/utils/schemas/peft.py 23.52% 13 Missing ⚠️
src/axolotl/utils/schemas/validation.py 35.71% 9 Missing ⚠️
src/axolotl/loaders/adapter.py 33.33% 4 Missing ⚠️


@winglian (Collaborator) commented on Aug 4, 2025

Were you able to merge the adapters back into the quantized model? If I understand the paper correctly, the benefit comes from the ability to merge it back into the quantized model.

@ved1beta (Contributor, author) commented on Aug 5, 2025

As of now, the PR does not implement merging adapters back into quantized models.

@ved1beta (Contributor, author) commented on Aug 5, 2025

The PEFT library's QA-LoRA implementation intentionally doesn't support merging for GPTQ. I think we should keep the same behavior here (#2878); we could just enhance the error messages.

@NanoCode012 (Collaborator)

Re: #2878 (comment)

> Offering help with QA-LoRA adapter merge process! Since PEFT doesn't support adapter merging with quantized models yet, I've implemented a custom solution. Successfully replicated the QA-LoRA paper and built a working merge process for GPTQ + QA-LoRA adapters. Happy to share code/knowledge

@gapsong, could you share how you've implemented the merge?

@gapsong commented on Aug 23, 2025

Hi @NanoCode012, @ved1beta, @winglian,

Thanks for the work on this. Here is the important change.
The code is a bit messy...but I hope you get the main idea.
ModelCloud/GPTQModel@main...gapsong:GPTQModel:qzero_unquantized
I got the information from a conversation with the qa-lora author yuhuixu1993/qa-lora#45

Problem:
I noticed the qzero values are currently being quantized during the save process.

Why this is an issue:
The subsequent merge of these qzero values is a high-precision operation. If we save them in a quantized format, we lose precision due to rounding before the merge even happens, which defeats the purpose.

Solution:
My changes address this by ensuring the qzero values are saved in a full-precision format (specifically FP16). This preserves their original values for the merge operation.

Happy to discuss this further!
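
For intuition, a toy numeric illustration of the precision argument above (made-up numbers, not GPTQModel code): folding a small correction into zero-points gives a different result if the zero-points were rounded before the merge.

    import numpy as np

    rng = np.random.default_rng(0)
    zeros_fp16 = rng.normal(loc=8.0, scale=0.3, size=1024).astype(np.float16)

    # hypothetical adapter correction that gets folded into the zero-points at merge time
    correction = rng.normal(scale=0.05, size=1024).astype(np.float16)

    merged_full_precision = zeros_fp16 + correction            # keep qzero in FP16, then merge
    merged_after_rounding = np.rint(zeros_fp16) + correction   # quantize qzero first, then merge

    print(np.abs(merged_full_precision - merged_after_rounding).mean())  # nonzero: rounding loss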

@NanoCode012 (Collaborator) left a comment

Thanks. I left some minor config points.

Comment on lines 41 to 46

     adapter: str | None = Field(
         default=None,
         json_schema_extra={
-            "description": "If you want to use 'lora' or 'qlora' or leave blank to train all parameters in original model"
+            "description": "If you want to use 'lora' or 'qlora' or 'qalora' or leave blank to train all parameters in original model"
         },
     )
@NanoCode012 (Collaborator):

Although unrelated, I wonder if we can set adapter: AdapterEnum | None. That should simplify our checks below by not allowing incorrect strings.

Comment on lines 600 to 609

    @model_validator(mode="after")
    def validate_qalora(self):
        if self.adapter == "qalora":
            if not self.load_in_4bit:
                raise ValueError("QALoRA requires load_in_4bit to be True")
            if not hasattr(self, "qalora_group_size") or self.qalora_group_size is None:
                raise ValueError("QALoRA requires qalora_group_size to be specified")
            if self.merge_lora:
                raise ValueError("QALoRA does not support merge_lora yet")
        return self
@NanoCode012 (Collaborator):

This feels like a duplicate of the validation in src/axolotl/utils/schemas/peft.py.

Comment on lines 98 to 99

    if hasattr(cfg, "use_qalora") and cfg.use_qalora:
        lora_config_kwargs["use_qalora"] = True
@NanoCode012 (Collaborator):

If the adapter is qalora, we can just set this LoRA kwarg to True.

Comment on lines +100 to +101

    if hasattr(cfg, "qalora_group_size") and cfg.qalora_group_size:
        lora_config_kwargs["qalora_group_size"] = cfg.qalora_group_size
@NanoCode012 (Collaborator):

Do you want to raise an error here, or possibly in validation, if a user does not provide this?

@coderabbitai (bot) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/axolotl/utils/schemas/validation.py (1)

1-160: Expand all QLoRA adapter checks to include QALoRA

The recent search shows several code paths that gate on “qlora” (or ["lora","qlora"]) without accounting for the new “qalora” adapter. Please update each of the following sites to also include "qalora":

  • src/axolotl/utils/schemas/validation.py (around line 592)
    Current:

    if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
        raise ValueError("Fused modules are not supported with LoRA/QLoRA")

    Change to:

    - if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
    + if self.adapter in ["lora", "qlora", "qalora"] and self.flash_attn_fuse_mlp:
  • src/axolotl/utils/schemas/peft.py (line 154)
    Current:

    if self.adapter == "qlora":
        if self.merge_lora:
            # …

    Change to:

    - if self.adapter == "qlora":
    + if self.adapter in ["qlora", "qalora"]:
  • src/axolotl/loaders/patch_manager.py (line 293)
    Current:

    if ( … and self.cfg.adapter == "qlora"):
        from axolotl.monkeypatch.fsdp2_qlora import

    Change to:

    - if ( … and self.cfg.adapter == "qlora"):
    + if ( … and self.cfg.adapter in ["qlora", "qalora"]):
  • src/axolotl/loaders/model.py (around line 558)
    Current:

    elif self.cfg.adapter == "qlora" and self.model_kwargs.get("load_in_4bit", False):
        # …

    Change to:

    - elif self.cfg.adapter == "qlora" and self.model_kwargs.get("load_in_4bit", False):
    + elif self.cfg.adapter in ["qlora", "qalora"] and self.model_kwargs.get("load_in_4bit", False):
  • src/axolotl/core/builders/causal.py (line 164)
    Current:

    if self.cfg.adapter == "qlora":
        training_arguments_kwargs["qlora"] = True

    Change to:

    - if self.cfg.adapter == "qlora":
    + if self.cfg.adapter in ["qlora", "qalora"]:
        training_arguments_kwargs["qlora"] = True

No standalone "QLoRA" messages were found, so documentation or log texts appear up to date. After applying these diffs, all code paths will correctly recognize QALoRA alongside LoRA/QLoRA.

🧹 Nitpick comments (11)
src/axolotl/utils/schemas/config.py (2)

1071-1119: Outdated comment; QALoRA is now included in auto-enable logic

The inline comment still says “Only proceed if using LoRA or QLoRA adapter,” but the condition now includes "qalora". Please update the comment for accuracy.

Apply this diff:

-        # Only proceed if using LoRA or QLoRA adapter
+        # Only proceed if using LoRA, QLoRA, or QALoRA adapter

1076-1119: Confirm LoRA kernel compatibility for QALoRA and example config interplay

QALoRA was added to the auto-enable path for LoRA kernels. Two quick points:

  • Compatibility: These kernels target standard LoRA autograd paths. If QALoRA uses the same injection points, we’re good; otherwise, auto-enabling could be risky. Please confirm coverage in integration tests.
  • Example config: The provided qalora.yml uses lora_dropout: 0.05, which prevents auto-enable by design. If the intent is to showcase auto-enable, set lora_dropout to 0 in the example.

Would you like me to add a minimal test that exercises QALoRA with kernels enabled and disabled to ensure parity?

examples/llama-2/qalora.yml (4)

15-18: Trim unused/empty fields for clarity

lora_model_dir is empty; consider removing it to avoid confusion in example configs.

 adapter: qalora
-lora_model_dir:

23-24: Group size present; consider documenting recommended values

qalora_group_size: 16 is specified as required by validation. If there are preferred or hardware-aligned values (e.g., multiples of 8 or 16), consider adding a brief comment in the example to guide users.


25-29: Set lora_dropout to 0 to enable auto LoRA kernels by default

Auto-enabling LoRA kernels only triggers when lora_dropout == 0. If you want the example to benefit from kernel optimizations out of the box, set dropout to 0 (or explicitly enable the kernels).

Option A (simplest; leverages auto-enable):

-lora_dropout: 0.05
+lora_dropout: 0

Option B (explicit; works regardless of auto-enable):

 lora_target_linear: true
+lora_mlp_kernel: true
+lora_qkv_kernel: true
+lora_o_kernel: true

40-43: GC + packing is fine; call out Zero3/reentrant caveat if users add DeepSpeed

This config doesn’t set DeepSpeed Zero3, so you’re fine. If users add Zero3 later, the validators will warn about reentrant checkpointing interactions with QLoRA/QALoRA. Consider a short comment in the example to save users a round trip.

src/axolotl/utils/schemas/validation.py (5)

577-582: De-duplicate fused modules check and unify message

You added a new fused-modules guard that includes QALoRA, but a second guard below (Lines 592-594) still checks LoRA/QLoRA only. This is redundant and inconsistent. Recommend keeping a single check that covers all adapters.

Apply this diff to remove the duplicate later check:

     def check_fused_lora(self):
-        if self.adapter in ["lora", "qlora", "qalora"] and (
-            self.flash_attn_fuse_qkv or self.flash_attn_fuse_mlp
-        ):
-            raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")
-        return self
+        if self.adapter in ["lora", "qlora", "qalora"] and (
+            self.flash_attn_fuse_qkv or self.flash_attn_fuse_mlp
+        ):
+            raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")
+        return self

And delete Lines 592-594:

-        if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
-            raise ValueError("Fused modules are not supported with LoRA/QLoRA")

583-595: Strengthen QALoRA validation and improve error text

Good to see explicit QALoRA checks. Two small improvements:

  • Type/validity: Ensure qalora_group_size is a positive int to fail fast on typos.
  • Messaging: Make the merge error more actionable per PR discussion (merging not supported for quantized models).
     def validate_qalora(self):
         if self.adapter == "qalora":
             if not self.load_in_4bit:
                 raise ValueError("QALoRA requires load_in_4bit to be True")
-            if not hasattr(self, "qalora_group_size") or self.qalora_group_size is None:
-                raise ValueError("QALoRA requires qalora_group_size to be specified")
+            if (
+                not hasattr(self, "qalora_group_size")
+                or self.qalora_group_size is None
+                or not isinstance(self.qalora_group_size, int)
+                or self.qalora_group_size <= 0
+            ):
+                raise ValueError("QALoRA requires a positive integer qalora_group_size")
             if self.merge_lora:
-                raise ValueError("QALoRA does not support merge_lora yet")
+                raise ValueError(
+                    "QALoRA does not support merging adapters into quantized (4-bit) models. "
+                    "Disable merge_lora or switch to a non-quantized merge workflow."
+                )
         if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
             raise ValueError("Fused modules are not supported with LoRA/QLoRA")
         return self

600-614: Update warning text to include QALoRA; consider robust Zero3 detection

You included QALoRA in the condition, but the warning string mentions only qlora. Also, checking "zero3" in data.get("deepspeed", "") relies on the path string; if it’s a JSON file path without “zero3” in the name, this check won’t catch it.

Minimal text fix:

-                "qlora + zero3 with use_reentrant: false may result in a CheckpointError about recomputed values"
+                "qlora/qalora + zero3 with use_reentrant: false may result in a CheckpointError about recomputed values"

Optional robustness (pseudo-change): if deepspeed is a file path, parse JSON and check zero_optimization stage or a dedicated zero3 flag before warning.
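
A sketch of the more robust check hinted at above, assuming the deepspeed value may be either a dict or a path to a JSON config (field names follow the standard DeepSpeed zero_optimization schema):

    import json
    import os

    def uses_zero3(deepspeed) -> bool:
        """Best-effort ZeRO stage-3 detection from a deepspeed config value."""
        if not deepspeed:
            return False
        if isinstance(deepspeed, str):
            if "zero3" in deepspeed:  # current heuristic: file name mentions zero3
                return True
            if not os.path.isfile(deepspeed):
                return False
            with open(deepspeed, encoding="utf-8") as f:
                deepspeed = json.load(f)
        return deepspeed.get("zero_optimization", {}).get("stage") == 3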


700-715: Error message should mention QALoRA as well

The condition includes QALoRA, but the message references QLoRA only. Suggest updating for consistency.

-                "is not supported for distributed RL training with QLoRA. Please set "
+                "is not supported for distributed RL training with QLoRA/QALoRA. Please set "

513-595: Reduce adapter list duplication across validators

"lora", "qlora", "qalora" appear in multiple conditions. Consider centralizing LoRA-like adapters in a single constant to prevent drift (e.g., LORAISH_ADAPTERS = ("lora", "qlora", "qalora")) and reusing it across checks.

Also applies to: 596-661, 670-716, 1174-1223
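
A sketch of the shared constant this suggests (the name LORAISH_ADAPTERS is the placeholder proposed above):

    # single source of truth for LoRA-like adapter names, reused by every validator
    LORAISH_ADAPTERS: tuple[str, ...] = ("lora", "qlora", "qalora")

    # example membership test replacing the repeated string literals:
    # if self.adapter in LORAISH_ADAPTERS and (self.flash_attn_fuse_qkv or self.flash_attn_fuse_mlp):
    #     raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")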

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 573992b and 366f02e.

📒 Files selected for processing (7)
  • examples/llama-2/qalora.yml (1 hunks)
  • src/axolotl/core/builders/causal.py (1 hunks)
  • src/axolotl/loaders/adapter.py (2 hunks)
  • src/axolotl/loaders/model.py (5 hunks)
  • src/axolotl/utils/schemas/config.py (1 hunks)
  • src/axolotl/utils/schemas/peft.py (3 hunks)
  • src/axolotl/utils/schemas/validation.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/axolotl/core/builders/causal.py
  • src/axolotl/loaders/adapter.py
  • src/axolotl/loaders/model.py
  • src/axolotl/utils/schemas/peft.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/axolotl/utils/schemas/validation.py (1)
src/axolotl/cli/main.py (1)
  • merge_lora (281-292)
🔇 Additional comments (2)
examples/llama-2/qalora.yml (1)

5-7: Correct low-bit settings for QALoRA

QALoRA requires load_in_4bit: true. This looks correct, thanks.

src/axolotl/utils/schemas/validation.py (1)

1204-1207: ReLoRA adapter set updated to include QALoRA — LGTM

The adapter guard for ReLoRA now includes "qalora"; message updated accordingly. Looks good.

@NanoCode012 (Collaborator)

@gapsong

> I noticed the qzero values are currently being quantized during the save process.

Could you share where this is happening in peft?

@coderabbitai (bot) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/axolotl/utils/schemas/peft.py (2)

7-8: Good: type-safe adapter import.

Importing AdapterEnum enables validation at the type level and addresses earlier feedback about constraining adapter values.


43-48: Good: adapter typed as AdapterEnum | None.

This removes the risk of invalid strings and simplifies downstream checks. Description text is fine.

🧹 Nitpick comments (3)
src/axolotl/utils/schemas/peft.py (3)

133-136: Validate qalora_group_size (positive integer; only meaningful with adapter=qalora).

Add basic validation to prevent misconfiguration and clarify scope of this field.

Apply this (outside the shown hunk) to enforce positivity:

@@ class LoraConfig(BaseModel):
     qalora_group_size: int | None = Field(
         default=None,
         json_schema_extra={"description": "Group size for QALoRA quantization pooling"},
     )
+    @field_validator("qalora_group_size")
+    @classmethod
+    def _validate_qalora_group_size(cls, v):
+        if v is None:
+            return v
+        if isinstance(v, bool) or not isinstance(v, int) or v <= 0:
+            raise ValueError("qalora_group_size must be a positive integer")
+        return v

And (see my later comment on lines 178-199) also gate this field to only be set when adapter == qalora.


140-152: Fix user-facing error message spacing and grammar in validator.

The concatenated strings render as “training.If…”. Also “full finetune” reads awkwardly.

Apply:

-            raise ValueError(
-                "load_in_8bit and load_in_4bit are not supported without setting an adapter for training."
-                "If you want to full finetune, please turn off load_in_8bit and load_in_4bit."
-            )
+            raise ValueError(
+                "load_in_8bit and load_in_4bit are not supported without setting an adapter for training. "
+                "If you want to fully fine-tune, please turn off load_in_8bit and load_in_4bit."
+            )

156-177: Prefer Enum member comparison over string literals.

Now that adapter is AdapterEnum, compare against AdapterEnum.qlora for clarity and static tooling.

Apply:

-        if self.adapter == "qlora":
+        if self.adapter == AdapterEnum.qlora:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 366f02e and 4e7ef01.

📒 Files selected for processing (3)
  • src/axolotl/loaders/adapter.py (2 hunks)
  • src/axolotl/utils/schemas/enums.py (1 hunks)
  • src/axolotl/utils/schemas/peft.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/axolotl/loaders/adapter.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/axolotl/utils/schemas/peft.py (1)
src/axolotl/utils/schemas/enums.py (1)
  • AdapterEnum (22-27)
🔇 Additional comments (1)
src/axolotl/utils/schemas/enums.py (1)

22-28: AdapterEnum addition looks good and aligns with downstream usage.

Using a str-backed Enum is appropriate here for JSON/schema friendliness.
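
For reference, a str-backed enum of that shape looks like this (a sketch consistent with the described members; not necessarily the exact axolotl definition):

    from enum import Enum

    class AdapterEnum(str, Enum):
        # the str mixin keeps JSON/YAML serialization and plain-string equality working
        lora = "lora"
        qlora = "qlora"
        qalora = "qalora"

    assert AdapterEnum.qalora == "qalora"  # comparisons against string literals still pass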

Comment on lines +178 to 199

        if self.adapter == "qalora":
            if self.merge_lora:
                # can't merge qalora if loaded in 8bit or 4bit
                if self.load_in_8bit:
                    raise ValueError("Can't merge qalora if loaded in 8bit")

                if self.gptq:
                    raise ValueError("Can't merge qalora if gptq")

                if self.load_in_4bit:
                    raise ValueError("Can't merge qalora if loaded in 4bit")

            else:
                if self.load_in_8bit:
                    raise ValueError("Can't load qalora in 8bit")

                if self.gptq:
                    raise ValueError("Can't load qalora if gptq")

                if not self.load_in_4bit:
                    raise ValueError("Require cfg.load_in_4bit to be True for qalora")
        return self
@coderabbitai (bot):

💡 Verification agent

🧩 Analysis chain

Mirror Enum comparison for qalora; add guard for stray qalora_group_size; consider clearer GPTQ limitation text.

  • Use AdapterEnum for comparisons.
  • Prevent configs that set qalora_group_size while adapter != qalora.
  • Optional: Expand GPTQ error text to explicitly state “not supported by this PR/PEFT parity” to match the PR objectives and avoid confusion.

Apply:

-        if self.adapter == "qalora":
+        if self.adapter == AdapterEnum.qalora:
             if self.merge_lora:
                 # can't merge qalora if loaded in 8bit or 4bit
                 if self.load_in_8bit:
                     raise ValueError("Can't merge qalora if loaded in 8bit")
 
                 if self.gptq:
-                    raise ValueError("Can't merge qalora if gptq")
+                    raise ValueError(
+                        "Can't merge qalora if gptq (adapter merging into GPTQ is not supported in this PR; PEFT also does not support it)"
+                    )
 
                 if self.load_in_4bit:
                     raise ValueError("Can't merge qalora if loaded in 4bit")
 
             else:
                 if self.load_in_8bit:
                     raise ValueError("Can't load qalora in 8bit")
 
                 if self.gptq:
-                    raise ValueError("Can't load qalora if gptq")
+                    raise ValueError("Can't load qalora if gptq")
 
                 if not self.load_in_4bit:
                     raise ValueError("Require cfg.load_in_4bit to be True for qalora")
+        # Disallow stray qalora_group_size when adapter is not qalora
+        if self.adapter != AdapterEnum.qalora and self.qalora_group_size is not None:
+            raise ValueError("qalora_group_size is only valid when adapter == 'qalora'")

To ensure consistency repo-wide (no lingering string literal checks; field is only used with qalora), run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "String literal adapter comparisons (should prefer AdapterEnum.*):"
rg -nP -C2 '\badapter\s*==\s*"(?:lora|qlora|qalora)"' --type=py || true

echo
echo "Usages of qalora_group_size:"
rg -n --type=py -C2 '\bqalora_group_size\b' || true

Length of output: 7018


Update QALoRA schema validation: use AdapterEnum and guard stray qalora_group_size

The validation logic in src/axolotl/utils/schemas/peft.py should be tightened and made consistent:

  • In the validate_qalora (lines 178–199):
    • Replace string-literal checks with the enum comparison:
      -    if self.adapter == "qalora":
      +    if self.adapter == AdapterEnum.qalora:
    • Add a guard to prevent a non-QALoRA adapter from carrying a qalora_group_size:
          # Disallow stray qalora_group_size when adapter is not qalora
          if self.adapter != AdapterEnum.qalora and self.qalora_group_size is not None:
              raise ValueError(
                  "qalora_group_size is only valid when adapter == AdapterEnum.qalora"
              )
  • Optional: Clarify the GPTQ limitation message in the merge branch to reflect “not supported by this PR/PEFT parity.”

Note that string-literal comparisons for adapters are still widespread across tests and other modules (e.g. tests/test_loaders.py, src/axolotl/loaders/adapter.py, etc.). We recommend migrating all of these to AdapterEnum in a follow-up refactor for consistency.

🤖 Prompt for AI Agents
In src/axolotl/utils/schemas/peft.py around lines 178–199, update
validate_qalora to use AdapterEnum instead of string literals when checking for
qalora, and add a guard that raises a ValueError if self.adapter !=
AdapterEnum.qalora and self.qalora_group_size is not None (message:
"qalora_group_size is only valid when adapter == AdapterEnum.qalora"). Also
change the existing qalora-related checks to compare against AdapterEnum.qalora,
and adjust the GPTQ/merge error message in the merge branch to indicate GPTQ is
not supported by this PR/PEFT parity.

@coderabbitai (bot) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/axolotl/loaders/model.py (1)

556-591: Fix QLoRA/QALoRA Branch Ordering to Preserve Stable BnB Defaults

Currently in src/axolotl/loaders/model.py (≈ lines 556–591), the generic 4-bit branch for both QLoRA and QALoRA fires before the QLoRA-specific defaults, so:

  • QLoRA paths never reach their well-tuned bnb_config (they’re shadowed by the generic branch).
  • QALoRA paths use whatever is in model_config.quantization_config, which may include unsupported or unexpected keys.

To restore stable defaults and share them with QALoRA, remove the generic branch and merge QALoRA into the QLoRA branch:

• File: src/axolotl/loaders/model.py
– Remove the first elif self.cfg.adapter in ["qlora", "qalora"] and load_in_4bit block (lines ~560–565).
– Change the QLoRA-only condition to include QALoRA.

Suggested minimal diff:

--- a/src/axolotl/loaders/model.py
+++ b/src/axolotl/loaders/model.py
@@ -560,16 +560,11 @@ def _set_quantization_config(self):
-        elif (
-            self.cfg.adapter in ["qlora", "qalora"]
-            and self.model_kwargs["load_in_4bit"]
-        ):
-            quantization_config = getattr(self.model_config, "quantization_config", {})
-            self.model_kwargs["quantization_config"] = BitsAndBytesConfig(
-                **quantization_config
-            )
-        elif self.cfg.adapter == "qlora" and self.model_kwargs.get(
+        elif self.cfg.adapter in ["qlora", "qalora"] and self.model_kwargs.get(
             "load_in_4bit", False
         ):
             bnb_config = {
                 "load_in_4bit": True,
                 "llm_int8_threshold": 6.0,
                 "llm_int8_has_fp16_weight": False,

Benefits:

  • QLoRA uses its tested NF4 + double-quant defaults.
  • QALoRA inherits those same stable settings.
  • Eliminates risk of passing arbitrary keys into BitsAndBytesConfig.

Please apply this change before merging.

🧹 Nitpick comments (1)
src/axolotl/utils/schemas/config.py (1)

1078-1125: Auto-enable LoRA kernels for QALoRA: logic looks right; update comment and validate behavior

The inclusion of "qalora" in the adapter check is correct and consistent with the rest of the PR. Two small asks:

  • Update the comment “Only proceed if using LoRA or QLoRA adapter” to include QALoRA.
  • Please sanity-check that the LoRA Triton kernels work with PEFT’s QALoRA path (grad shapes, fused ops), especially when lora_dropout != 0 and on multi-GPU + FSDP2 where auto-enabling is permitted.

Suggested tiny comment tweak:

-        # Only proceed if using LoRA or QLoRA adapter
+        # Only proceed if using LoRA / QLoRA / QALoRA adapter
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e7ef01 and 0170f38.

📒 Files selected for processing (2)
  • src/axolotl/loaders/model.py (5 hunks)
  • src/axolotl/utils/schemas/config.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/axolotl/loaders/model.py (1)
src/axolotl/integrations/base.py (2)
  • cfg (339-340)
  • cfg (343-344)
🔇 Additional comments (4)
src/axolotl/loaders/model.py (4)

158-160: Include QALoRA in FSDP gating: LGTM

Updating the FSDP+PEFT property to consider QALoRA is consistent with the rest of the integration.


321-327: Enable grad checkpointing + FA2 dtype path for QALoRA: LGTM

Treating QALoRA like LoRA/QLoRA here makes sense. This will keep FA2 compatibility after k-bit prep. No issues spotted.


536-555: Quantization-config passthrough now includes QALoRA: LGTM with a sanity note

Including QALoRA alongside LoRA/QLoRA for GPTQ/AWQ/BnB model_config.quantization_config passthrough is correct. Note: ensure model_config.quantization_config keys match the target config class (e.g., BitsAndBytesConfig) for this branch—your current quant_method gating prevents most mismatches.


869-876: prepare_model_for_kbit_training is required for QA-LoRA with 4/8-bit models
The PEFT and QLoRA examples/docs explicitly show that you must run prepare_model_for_kbit_training (or perform the equivalent fp32 casting, layernorm adjustments, gradient‐checkpointing setup, etc.) before wrapping the model with QA-LoRA. Since PEFT/QA-LoRA does not prepare the base model internally, the existing call here is both necessary and correct—no further changes are needed.
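
For context, the standard PEFT k-bit preparation flow looks roughly like this (a sketch using public transformers/peft APIs; the qalora-specific LoraConfig kwargs from this PR are omitted because their exact upstream signature isn't shown here):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "NousResearch/Llama-2-7b-hf", quantization_config=bnb_config
    )

    # must run before wrapping with the adapter: casts norms/embeddings to fp32,
    # enables input grads, and prepares gradient checkpointing for k-bit training
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    lora_config = LoraConfig(r=32, lora_alpha=16, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora_config)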

Comment on lines +544 to 549

    flash_attn_fuse_qkv: bool | None = Field(
        default=None,
        json_schema_extra={
            "description": "Whether to fuse QKV projection into a single operation"
        },
    )
@coderabbitai (bot):

💡 Verification agent

🧩 Analysis chain

flash_attn_fuse_qkv is defined but appears unused; add wiring + guards

Good addition, but I don’t see this flag being consumed anywhere (e.g., in PatchManager or _set_attention_config). Also consider guarding for:

  • flash-attn presence and minimum version that supports QKV fusion,
  • compatible GPU arch (Ampere+), and
  • interaction with attn_implementation (flash_attention_2).

Follow-ups:

  • Wire this flag where the FA2 kernels are configured.
  • Add a validation/model_validator to warn/disable when unsupported.

Run to confirm usage across the repo:


🏁 Script executed:

#!/bin/bash
rg -n --hidden -g '!**/dist/**' -C2 '\bflash_attn_fuse_qkv\b'

Length of output: 855


Wire flash_attn_fuse_qkv into the attention setup and tighten guards

Right now this flag is only declared in config.py (line 544) and checked in LoRA validation (in validation.py around line 578), but never actually passed into the code paths that select and configure FlashAttention kernels. To complete this feature, please:

• In your attention‐kernel configuration (e.g. in PatchManager or the _set_attention_config routine where attn_implementation is chosen), consume flash_attn_fuse_qkv. When it’s true, enable QKV fusion in FlashAttention-2.
• Add runtime guards immediately around that wiring to verify:
– The FlashAttention library is installed and at or above the minimum version that supports QKV fusion.
– The GPU’s compute capability is Ampere (8.0+) or newer.
– If attn_implementation is set to something else, ensure it’s overridden or disabled when flash_attn_fuse_qkv=True.
• Extend your model‐validation logic (in validation.py or a new model_validator) to error‐out or emit a warning when flash_attn_fuse_qkv=True but the environment doesn’t meet the above requirements.

By wiring the flag where the FA2 kernels are configured and codifying these guards, users will have both compile-time schema safety and correct runtime behavior.
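
A sketch of the runtime guard described above (assuming flash-attn ships as the flash_attn package; the minimum version that supports fused QKV is a placeholder assumption, not a verified number):

    import importlib.metadata

    import torch
    from packaging import version

    def can_fuse_qkv(min_flash_attn: str = "2.0.0") -> bool:
        """Best-effort environment check before enabling fused QKV in FlashAttention-2."""
        if not torch.cuda.is_available():
            return False
        major, _minor = torch.cuda.get_device_capability()
        if major < 8:  # require Ampere (SM 8.0) or newer
            return False
        try:
            installed = importlib.metadata.version("flash_attn")
        except importlib.metadata.PackageNotFoundError:
            return False
        return version.parse(installed) >= version.parse(min_flash_attn)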

@gapsong commented on Aug 25, 2025

> @gapsong
> I noticed the qzero values are currently being quantized during the save process.
> Could you share where this is happening in peft?

@NanoCode012 It is not happening in PEFT; it is happening in GPTQ, either GPTQModel or AutoGPTQ, depending on the library you are using.

Comment on lines 577 to 581

        if self.adapter in ["lora", "qlora", "qalora"] and (
            self.flash_attn_fuse_qkv or self.flash_attn_fuse_mlp
        ):
            raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")
        return self
Collaborator:

I think this is a merge conflict as we removed flash_attn_fuse_qkv recently.

@NanoCode012 added the wontfix (This will not be worked on) label on Aug 27, 2025.
4 participants