qa-lora integration #3013
base: main
Conversation
📝 Walkthrough
Adds QALoRA support across loaders, model quantization paths, schema enums/fields/validators, validation mixins, a test case, and an example QALoRA YAML config. Changes treat "qalora" alongside "lora"/"qlora" in loading, dtype/quantization, and validation logic.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/llama-2/qalora.yml (1)
1-49: LGTM! Comprehensive qalora example configuration. The configuration file demonstrates proper usage of the new qalora adapter with appropriate settings including 4-bit quantization, qalora-specific parameters, and standard LoRA configuration.
Fix the missing newline at end of file.
Apply this fix to address the YAML linting issue:
 special_tokens:
+
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- examples/llama-2/qalora.yml (1 hunks)
- src/axolotl/core/builders/causal.py (1 hunks)
- src/axolotl/core/training_args_base.py (1 hunks)
- src/axolotl/loaders/adapter.py (2 hunks)
- src/axolotl/loaders/model.py (5 hunks)
- src/axolotl/utils/schemas/config.py (1 hunks)
- src/axolotl/utils/schemas/peft.py (3 hunks)
- src/axolotl/utils/schemas/validation.py (3 hunks)
- tests/test_loaders.py (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
examples/llama-2/qalora.yml
[error] 49-49: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (19)
tests/test_loaders.py (1)
63-63: LGTM: Test coverage extended for qalora adapter. The parameterization correctly includes "qalora" alongside the existing adapter types, ensuring quantization configuration is tested for the new adapter type.
src/axolotl/core/builders/causal.py (1)
154-155: LGTM: Consistent adapter handling for qalora. The implementation follows the established pattern used for qlora, properly setting the qalora training argument flag when the adapter type is "qalora".
src/axolotl/utils/schemas/config.py (1)
1067-1067: LGTM: LoRA kernel optimization extended to qalora. Correctly includes "qalora" in the adapter type check for auto-enabling LoRA kernel optimizations, ensuring qalora users benefit from the same performance optimizations as lora and qlora users.
src/axolotl/loaders/adapter.py (2)
97-102: LGTM: Clean qalora configuration handling. The conditional logic properly adds qalora-specific parameters to the LoRA configuration using appropriate existence checks with hasattr() before accessing the attributes.
184-184: LGTM: Proper adapter routing for qalora. Correctly includes "qalora" in the adapter type check and routes it to the load_lora function, maintaining consistency with other LoRA-based adapter types.
src/axolotl/core/training_args_base.py (1)
166-173: LGTM: Well-structured qalora training arguments. The new fields follow established patterns with appropriate types, defaults, and metadata. The qalora_group_size default of 16 is reasonable, and the placement after the qlora field maintains logical organization.
src/axolotl/loaders/model.py (5)
151-152: LGTM! Consistent FSDP support for qalora. The property correctly includes "qalora" alongside "qlora" for FSDP enablement checks, maintaining consistency with the existing logic.
308-309: LGTM! Proper gradient checkpointing support for qalora. The condition correctly includes "qalora" for gradient checkpointing and dtype handling, ensuring consistent behavior with other LoRA variants.
585-585: LGTM! Quantization config support extended to qalora. The adapter check properly includes "qalora" for quantization configuration handling, maintaining consistency with existing LoRA and QLoRA support.
604-607: LGTM! BitsAndBytesConfig support for qalora. The 4-bit loading configuration correctly includes "qalora" alongside "qlora", ensuring proper quantization setup for the new adapter type.
892-892: LGTM! Model preparation for quantization includes qalora. The condition properly includes "qalora" for kbit training preparation, maintaining consistency with LoRA and QLoRA handling.
src/axolotl/utils/schemas/peft.py (3)
44-44: LGTM! Adapter description updated for qalora. The field description properly includes "qalora" alongside the existing adapter types.
131-140: LGTM! Well-defined qalora configuration fields. The new fields use_qalora and qalora_group_size are properly defined with appropriate defaults and descriptive documentation.
181-202: LGTM! Comprehensive qalora validation logic. The validation method correctly mirrors the qlora constraints, ensuring proper usage of qalora with quantization settings. The logic appropriately prevents incompatible configurations.
src/axolotl/utils/schemas/validation.py (5)
594-594: LGTM! Fused module restriction extended to qalora. The validation correctly prevents fused modules with qalora, maintaining consistency with lora and qlora restrictions.
600-609: LGTM! Comprehensive qalora validation method. The new validation method properly enforces qalora-specific constraints: requires 4-bit loading, group size specification, and prevents merging. This mirrors the validation logic in the PEFT schema appropriately.
615-615: LGTM! Zero3 warning extended to qalora. The warning appropriately includes qalora alongside qlora for potential gradient checkpointing issues with DeepSpeed Zero3.
705-705: LGTM! RL training validation includes qalora. The distributed RL training check correctly includes qalora for known incompatibilities with reentrant checkpointing.
1182-1185: LGTM! ReLoRA compatibility extended to qalora. The validation properly includes qalora as a valid adapter for ReLoRA usage, with appropriate error messaging.
Codecov Report: ❌ Patch coverage is
Were you able to merge the adapters back into the quantized model? If I understand the paper correctly, the benefit comes from the ability to merge it back into the quantized model. |
As of now, the PR does not implement adapter merging back into quantized models.
The PEFT library's QA-LoRA implementation intentionally doesn't support merging for GPTQ. I think we should keep it the same (#2878); we could just enhance the error messages.
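A minimal sketch of the kind of clearer error message discussed here; the helper name, its location, and the exact wording are illustrative (not from this PR), while the config attributes mirror the ones used elsewhere in this thread:

```python
# Hypothetical helper: surface a clearer reason when a QA-LoRA merge is requested
# against a quantized base model (names and wording are illustrative).
def check_qalora_merge(cfg) -> None:
    if cfg.adapter == "qalora" and cfg.merge_lora:
        if cfg.gptq or cfg.load_in_4bit or cfg.load_in_8bit:
            raise ValueError(
                "QA-LoRA adapters cannot be merged back into a quantized "
                "(GPTQ/4-bit/8-bit) base model; PEFT's QA-LoRA implementation does not "
                "support this. Disable merge_lora or merge against an unquantized copy "
                "of the base model."
            )
```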
Re: #2878 (comment)
@gapsong, could you share how you've implemented the merge? |
Hi @NanoCode012, @ved1beta, @winglian, thanks for the work on this. Here is the important change.
Problem:
Why this is an issue:
Solution:
Happy to discuss this further!
Thanks. I left some minor config points.
src/axolotl/utils/schemas/peft.py
Outdated
    adapter: str | None = Field(
        default=None,
        json_schema_extra={
-           "description": "If you want to use 'lora' or 'qlora' or leave blank to train all parameters in original model"
+           "description": "If you want to use 'lora' or 'qlora' or 'qalora' or leave blank to train all parameters in original model"
        },
    )
Although unrelated, I wonder if we can set adapter: AdapterEnum | None. This would simplify our checks further below by not allowing incorrect strings.
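A minimal sketch of what that typed field could look like, assuming Pydantic v2 and a str-backed AdapterEnum; the member list and description text here are illustrative stand-ins, not the code from this PR:

```python
from enum import Enum

from pydantic import BaseModel, Field


class AdapterEnum(str, Enum):  # stand-in; the real enum lives in schemas/enums.py
    lora = "lora"
    qlora = "qlora"
    qalora = "qalora"


class LoraConfig(BaseModel):
    # Pydantic rejects any string outside the enum, so later checks no longer
    # need to defend against typos such as "qlroa".
    adapter: AdapterEnum | None = Field(
        default=None,
        json_schema_extra={
            "description": "Use 'lora', 'qlora', or 'qalora', or leave blank to train all parameters"
        },
    )
```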
    @model_validator(mode="after")
    def validate_qalora(self):
        if self.adapter == "qalora":
            if not self.load_in_4bit:
                raise ValueError("QALoRA requires load_in_4bit to be True")
            if not hasattr(self, "qalora_group_size") or self.qalora_group_size is None:
                raise ValueError("QALoRA requires qalora_group_size to be specified")
            if self.merge_lora:
                raise ValueError("QALoRA does not support merge_lora yet")
        return self
This feels like a duplicate of the validation in src/axolotl/utils/schemas/peft.py.
src/axolotl/loaders/adapter.py
Outdated
    if hasattr(cfg, "use_qalora") and cfg.use_qalora:
        lora_config_kwargs["use_qalora"] = True
if the adapter is qalora, we can just set this lora kwarg to true.
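A sketch of the simplification being suggested, assuming cfg.adapter and lora_config_kwargs are in scope as in the hunk above:

```python
# Key off the adapter type directly instead of a separate use_qalora flag.
if cfg.adapter == "qalora":
    lora_config_kwargs["use_qalora"] = True
```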
    if hasattr(cfg, "qalora_group_size") and cfg.qalora_group_size:
        lora_config_kwargs["qalora_group_size"] = cfg.qalora_group_size
Do you want to raise an Error here or possibly in validation if a user does not provide this?
Co-authored-by: NanoCode012 <[email protected]>
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/axolotl/utils/schemas/validation.py (1)
1-160: Expand all QLoRA adapter checks to include QALoRA. The recent search shows several code paths that gate on "qlora" (or ["lora", "qlora"]) without accounting for the new "qalora" adapter. Please update each of the following sites to also include "qalora":

src/axolotl/utils/schemas/validation.py (around line 592)
Current:
    if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
        raise ValueError("Fused modules are not supported with LoRA/QLoRA")
Change to:
-   if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
+   if self.adapter in ["lora", "qlora", "qalora"] and self.flash_attn_fuse_mlp:

src/axolotl/utils/schemas/peft.py (line 154)
Current:
    if self.adapter == "qlora":
        if self.merge_lora:
            # …
Change to:
-   if self.adapter == "qlora":
+   if self.adapter in ["qlora", "qalora"]:

src/axolotl/loaders/patch_manager.py (line 293)
Current:
    if ( … and self.cfg.adapter == "qlora"):
        from axolotl.monkeypatch.fsdp2_qlora import …
Change to:
-   if ( … and self.cfg.adapter == "qlora"):
+   if ( … and self.cfg.adapter in ["qlora", "qalora"]):

src/axolotl/loaders/model.py (around line 558)
Current:
    elif self.cfg.adapter == "qlora" and self.model_kwargs.get("load_in_4bit", False):
        # …
Change to:
-   elif self.cfg.adapter == "qlora" and self.model_kwargs.get("load_in_4bit", False):
+   elif self.cfg.adapter in ["qlora", "qalora"] and self.model_kwargs.get("load_in_4bit", False):

src/axolotl/core/builders/causal.py (line 164)
Current:
    if self.cfg.adapter == "qlora":
        training_arguments_kwargs["qlora"] = True
Change to:
-   if self.cfg.adapter == "qlora":
+   if self.cfg.adapter in ["qlora", "qalora"]:
        training_arguments_kwargs["qlora"] = True

No standalone "QLoRA" messages were found, so documentation or log texts appear up to date. After applying these diffs, all code paths will correctly recognize QALoRA alongside LoRA/QLoRA.
🧹 Nitpick comments (11)
src/axolotl/utils/schemas/config.py (2)
1071-1119: Outdated comment; QALoRA is now included in auto-enable logic. The inline comment still says "Only proceed if using LoRA or QLoRA adapter," but the condition now includes "qalora". Please update the comment for accuracy.
Apply this diff:
-    # Only proceed if using LoRA or QLoRA adapter
+    # Only proceed if using LoRA, QLoRA, or QALoRA adapter
1076-1119: Confirm LoRA kernel compatibility for QALoRA and example config interplay. QALoRA was added to the auto-enable path for LoRA kernels. Two quick points:
- Compatibility: These kernels target standard LoRA autograd paths. If QALoRA uses the same injection points, we’re good; otherwise, auto-enabling could be risky. Please confirm coverage in integration tests.
- Example config: The provided qalora.yml uses lora_dropout: 0.05, which prevents auto-enable by design. If the intent is to showcase auto-enable, set lora_dropout to 0 in the example.
Would you like me to add a minimal test that exercises QALoRA with kernels enabled and disabled to ensure parity?
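A minimal sketch of such a parity test; the normalize_config fixture is an assumption standing in for whatever entry point the test harness uses to run axolotl's config validation, not the actual test API:

```python
import pytest


@pytest.mark.parametrize("lora_dropout, expect_kernels", [(0.0, True), (0.05, False)])
def test_qalora_lora_kernel_auto_enable(lora_dropout, expect_kernels, normalize_config):
    cfg = {
        "adapter": "qalora",
        "load_in_4bit": True,
        "qalora_group_size": 16,
        "lora_dropout": lora_dropout,
    }
    # normalize_config is a hypothetical fixture wrapping axolotl's config validation.
    normalized = normalize_config(cfg)
    # auto-enable should only trigger when lora_dropout == 0
    assert bool(normalized.get("lora_mlp_kernel")) is expect_kernels
```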
examples/llama-2/qalora.yml (4)
15-18: Trim unused/empty fields for clarity. lora_model_dir is empty; consider removing it to avoid confusion in example configs.
 adapter: qalora
-lora_model_dir:
23-24: Group size present; consider documenting recommended values. qalora_group_size: 16 is specified as required by validation. If there are preferred or hardware-aligned values (e.g., multiples of 8 or 16), consider adding a brief comment in the example to guide users.
25-29: Set lora_dropout to 0 to enable auto LoRA kernels by default. Auto-enabling LoRA kernels only triggers when lora_dropout == 0. If you want the example to benefit from kernel optimizations out of the box, set dropout to 0 (or explicitly enable the kernels).
Option A (simplest; leverages auto-enable):
-lora_dropout: 0.05
+lora_dropout: 0

Option B (explicit; works regardless of auto-enable):
 lora_target_linear: true
+lora_mlp_kernel: true
+lora_qkv_kernel: true
+lora_o_kernel: true
40-43: GC + packing is fine; call out Zero3/reentrant caveat if users add DeepSpeed. This config doesn't set DeepSpeed Zero3, so you're fine. If users add Zero3 later, the validators will warn about reentrant checkpointing interactions with QLoRA/QALoRA. Consider a short comment in the example to save users a round trip.
src/axolotl/utils/schemas/validation.py (5)
577-582: De-duplicate fused modules check and unify message. You added a new fused-modules guard that includes QALoRA, but a second guard below (lines 592-594) still checks LoRA/QLoRA only. This is redundant and inconsistent. Recommend keeping a single check that covers all adapters.
Apply this diff to remove the duplicate later check:
 def check_fused_lora(self):
-    if self.adapter in ["lora", "qlora", "qalora"] and (
-        self.flash_attn_fuse_qkv or self.flash_attn_fuse_mlp
-    ):
-        raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")
-    return self
+    if self.adapter in ["lora", "qlora", "qalora"] and (
+        self.flash_attn_fuse_qkv or self.flash_attn_fuse_mlp
+    ):
+        raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")
+    return self

And delete lines 592-594:
-    if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
-        raise ValueError("Fused modules are not supported with LoRA/QLoRA")
583-595: Strengthen QALoRA validation and improve error text. Good to see explicit QALoRA checks. Two small improvements:
- Type/validity: Ensure qalora_group_size is a positive int to fail fast on typos.
- Messaging: Make the merge error more actionable per PR discussion (merging not supported for quantized models).
 def validate_qalora(self):
     if self.adapter == "qalora":
         if not self.load_in_4bit:
             raise ValueError("QALoRA requires load_in_4bit to be True")
-        if not hasattr(self, "qalora_group_size") or self.qalora_group_size is None:
-            raise ValueError("QALoRA requires qalora_group_size to be specified")
+        if (
+            not hasattr(self, "qalora_group_size")
+            or self.qalora_group_size is None
+            or not isinstance(self.qalora_group_size, int)
+            or self.qalora_group_size <= 0
+        ):
+            raise ValueError("QALoRA requires a positive integer qalora_group_size")
         if self.merge_lora:
-            raise ValueError("QALoRA does not support merge_lora yet")
+            raise ValueError(
+                "QALoRA does not support merging adapters into quantized (4-bit) models. "
+                "Disable merge_lora or switch to a non-quantized merge workflow."
+            )
     if self.adapter in ["lora", "qlora"] and self.flash_attn_fuse_mlp:
         raise ValueError("Fused modules are not supported with LoRA/QLoRA")
     return self
600-614: Update warning text to include QALoRA; consider robust Zero3 detection. You included QALoRA in the condition, but the warning string mentions only qlora. Also, checking "zero3" in data.get("deepspeed", "") relies on the path string; if it's a JSON file path without "zero3" in the name, this check won't catch it.
Minimal text fix:
-    "qlora + zero3 with use_reentrant: false may result in a CheckpointError about recomputed values"
+    "qlora/qalora + zero3 with use_reentrant: false may result in a CheckpointError about recomputed values"

Optional robustness (pseudo-change): if deepspeed is a file path, parse JSON and check zero_optimization stage or a dedicated zero3 flag before warning.
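A sketch of the more robust detection suggested above, assuming the deepspeed value may be either an inline dict or a path to a JSON config (the helper name is illustrative):

```python
import json
from pathlib import Path


def uses_zero3(deepspeed_cfg) -> bool:
    """Best-effort ZeRO-3 detection for either a dict or a JSON config path."""
    if not deepspeed_cfg:
        return False
    if isinstance(deepspeed_cfg, str):
        path = Path(deepspeed_cfg)
        if not path.is_file():
            return "zero3" in deepspeed_cfg  # fall back to the current filename heuristic
        try:
            deepspeed_cfg = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            return "zero3" in str(path)
    if isinstance(deepspeed_cfg, dict):
        return deepspeed_cfg.get("zero_optimization", {}).get("stage") == 3
    return False
```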
700-715: Error message should mention QALoRA as well. The condition includes QALoRA, but the message references QLoRA only. Suggest updating for consistency.
-    "is not supported for distributed RL training with QLoRA. Please set "
+    "is not supported for distributed RL training with QLoRA/QALoRA. Please set "
513-595: Reduce adapter list duplication across validators. "lora", "qlora", and "qalora" appear in multiple conditions. Consider centralizing LoRA-like adapters in a single constant to prevent drift (e.g., LORAISH_ADAPTERS = ("lora", "qlora", "qalora")) and reusing it across checks; a sketch follows the line ranges below.
Also applies to: 596-661, 670-716, 1174-1223
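A sketch of that shared constant, shown as a standalone helper rather than the actual validator methods; the constant name follows the suggestion above and the function is illustrative:

```python
# Module-level constant shared by the adapter checks.
LORAISH_ADAPTERS = ("lora", "qlora", "qalora")


def check_fused_modules(adapter: str | None, fuse_qkv: bool, fuse_mlp: bool) -> None:
    # Reusing one constant keeps every validator in sync when a new adapter type is added.
    if adapter in LORAISH_ADAPTERS and (fuse_qkv or fuse_mlp):
        raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")
```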
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- examples/llama-2/qalora.yml (1 hunks)
- src/axolotl/core/builders/causal.py (1 hunks)
- src/axolotl/loaders/adapter.py (2 hunks)
- src/axolotl/loaders/model.py (5 hunks)
- src/axolotl/utils/schemas/config.py (1 hunks)
- src/axolotl/utils/schemas/peft.py (3 hunks)
- src/axolotl/utils/schemas/validation.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/axolotl/core/builders/causal.py
- src/axolotl/loaders/adapter.py
- src/axolotl/loaders/model.py
- src/axolotl/utils/schemas/peft.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/axolotl/utils/schemas/validation.py (1)
src/axolotl/cli/main.py (1)
merge_lora(281-292)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
🔇 Additional comments (2)
examples/llama-2/qalora.yml (1)
5-7: Correct low-bit settings for QALoRA. QALoRA requires load_in_4bit: true. This looks correct, thanks.
src/axolotl/utils/schemas/validation.py (1)
1204-1207: ReLoRA adapter set updated to include QALoRA — LGTM. The adapter guard for ReLoRA now includes "qalora"; message updated accordingly. Looks good.
Could you share where this is happening in peft?
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/axolotl/utils/schemas/peft.py (2)
7-8: Good: type-safe adapter import. Importing AdapterEnum enables validation at the type level and addresses earlier feedback about constraining adapter values.
43-48: Good: adapter typed as AdapterEnum | None. This removes the risk of invalid strings and simplifies downstream checks. Description text is fine.
🧹 Nitpick comments (3)
src/axolotl/utils/schemas/peft.py (3)
133-136: Validate qalora_group_size (positive integer; only meaningful with adapter=qalora). Add basic validation to prevent misconfiguration and clarify the scope of this field.
Apply this (outside the shown hunk) to enforce positivity:
@@ class LoraConfig(BaseModel):
     qalora_group_size: int | None = Field(
         default=None,
         json_schema_extra={"description": "Group size for QALoRA quantization pooling"},
     )
+    @field_validator("qalora_group_size")
+    @classmethod
+    def _validate_qalora_group_size(cls, v):
+        if v is None:
+            return v
+        if isinstance(v, bool) or not isinstance(v, int) or v <= 0:
+            raise ValueError("qalora_group_size must be a positive integer")
+        return v

And (see my later comment on lines 178-199) also gate this field to only be set when adapter == qalora.
140-152: Fix user-facing error message spacing and grammar in validator. The concatenated strings render as "training.If…". Also "full finetune" reads awkwardly.
Apply:
-        raise ValueError(
-            "load_in_8bit and load_in_4bit are not supported without setting an adapter for training."
-            "If you want to full finetune, please turn off load_in_8bit and load_in_4bit."
-        )
+        raise ValueError(
+            "load_in_8bit and load_in_4bit are not supported without setting an adapter for training. "
+            "If you want to fully fine-tune, please turn off load_in_8bit and load_in_4bit."
+        )
156-177: Prefer Enum member comparison over string literals. Now that adapter is AdapterEnum, compare against AdapterEnum.qlora for clarity and static tooling.
Apply:
-        if self.adapter == "qlora":
+        if self.adapter == AdapterEnum.qlora:
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- src/axolotl/loaders/adapter.py (2 hunks)
- src/axolotl/utils/schemas/enums.py (1 hunks)
- src/axolotl/utils/schemas/peft.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/axolotl/loaders/adapter.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/axolotl/utils/schemas/peft.py (1)
src/axolotl/utils/schemas/enums.py (1)
AdapterEnum(22-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: preview
🔇 Additional comments (1)
src/axolotl/utils/schemas/enums.py (1)
22-28: AdapterEnum addition looks good and aligns with downstream usage. Using a str-backed Enum is appropriate here for JSON/schema friendliness.
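For illustration, a tiny sketch of why the str-backed enum keeps existing configs and comparisons working; the class below is a stand-in, not the real AdapterEnum from enums.py:

```python
from enum import Enum


class _Adapter(str, Enum):  # stand-in mirroring AdapterEnum
    qalora = "qalora"


# str-backed members compare equal to their plain-string values and serialize as strings,
# so existing adapter == "qalora" checks and YAML/JSON round-trips keep working.
assert _Adapter.qalora == "qalora"
assert _Adapter("qalora") is _Adapter.qalora
assert _Adapter.qalora.value == "qalora"
```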
        if self.adapter == "qalora":
            if self.merge_lora:
                # can't merge qalora if loaded in 8bit or 4bit
                if self.load_in_8bit:
                    raise ValueError("Can't merge qalora if loaded in 8bit")

                if self.gptq:
                    raise ValueError("Can't merge qalora if gptq")

                if self.load_in_4bit:
                    raise ValueError("Can't merge qalora if loaded in 4bit")
            else:
                if self.load_in_8bit:
                    raise ValueError("Can't load qalora in 8bit")

                if self.gptq:
                    raise ValueError("Can't load qalora if gptq")

                if not self.load_in_4bit:
                    raise ValueError("Require cfg.load_in_4bit to be True for qalora")
        return self
💡 Verification agent
🧩 Analysis chain
Mirror Enum comparison for qalora; add guard for stray qalora_group_size; consider clearer GPTQ limitation text.
- Use AdapterEnum for comparisons.
- Prevent configs that set qalora_group_size while adapter != qalora.
- Optional: Expand GPTQ error text to explicitly state “not supported by this PR/PEFT parity” to match the PR objectives and avoid confusion.
Apply:
- if self.adapter == "qalora":
+ if self.adapter == AdapterEnum.qalora:
if self.merge_lora:
# can't merge qalora if loaded in 8bit or 4bit
if self.load_in_8bit:
raise ValueError("Can't merge qalora if loaded in 8bit")
if self.gptq:
- raise ValueError("Can't merge qalora if gptq")
+ raise ValueError(
+ "Can't merge qalora if gptq (adapter merging into GPTQ is not supported in this PR; PEFT also does not support it)"
+ )
if self.load_in_4bit:
raise ValueError("Can't merge qalora if loaded in 4bit")
else:
if self.load_in_8bit:
raise ValueError("Can't load qalora in 8bit")
if self.gptq:
- raise ValueError("Can't load qalora if gptq")
+ raise ValueError("Can't load qalora if gptq")
if not self.load_in_4bit:
raise ValueError("Require cfg.load_in_4bit to be True for qalora")
+ # Disallow stray qalora_group_size when adapter is not qalora
+ if self.adapter != AdapterEnum.qalora and self.qalora_group_size is not None:
+ raise ValueError("qalora_group_size is only valid when adapter == 'qalora'")To ensure consistency repo-wide (no lingering string literal checks; field is only used with qalora), run:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "String literal adapter comparisons (should prefer AdapterEnum.*):"
rg -nP -C2 '\badapter\s*==\s*"(?:lora|qlora|qalora)"' --type=py || true
echo
echo "Usages of qalora_group_size:"
rg -n --type=py -C2 '\bqalora_group_size\b' || true

Length of output: 7018
Update QALoRA schema validation: use AdapterEnum and guard stray qalora_group_size
The validation logic in src/axolotl/utils/schemas/peft.py should be tightened and made consistent:
- In the validate_qalora (lines 178–199):
  - Replace string-literal checks with the enum comparison:
-       if self.adapter == "qalora":
+       if self.adapter == AdapterEnum.qalora:
  - Add a guard to prevent a non-QALoRA adapter from carrying a qalora_group_size:
        # Disallow stray qalora_group_size when adapter is not qalora
        if self.adapter != AdapterEnum.qalora and self.qalora_group_size is not None:
            raise ValueError(
                "qalora_group_size is only valid when adapter == AdapterEnum.qalora"
            )
- Optional: Clarify the GPTQ limitation message in the merge branch to reflect “not supported by this PR/PEFT parity.”
Note that string-literal comparisons for adapters are still widespread across tests and other modules (e.g. tests/test_loaders.py, src/axolotl/loaders/adapter.py, etc.). We recommend migrating all of these to AdapterEnum in a follow-up refactor for consistency.
🤖 Prompt for AI Agents
In src/axolotl/utils/schemas/peft.py around lines 178–199, update
validate_qalora to use AdapterEnum instead of string literals when checking for
qalora, and add a guard that raises a ValueError if self.adapter !=
AdapterEnum.qalora and self.qalora_group_size is not None (message:
"qalora_group_size is only valid when adapter == AdapterEnum.qalora"). Also
change the existing qalora-related checks to compare against AdapterEnum.qalora,
and adjust the GPTQ/merge error message in the merge branch to indicate GPTQ is
not supported by this PR/PEFT parity.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/axolotl/loaders/model.py (1)
556-591: Fix QLoRA/QALoRA branch ordering to preserve stable BnB defaults. Currently in src/axolotl/loaders/model.py (≈ lines 556–591), the generic 4-bit branch for both QLoRA and QALoRA fires before the QLoRA-specific defaults, so:

- QLoRA paths never reach their well-tuned bnb_config (they're shadowed by the generic branch).
- QALoRA paths use whatever is in model_config.quantization_config, which may include unsupported or unexpected keys.

To restore stable defaults and share them with QALoRA, remove the generic branch and merge QALoRA into the QLoRA branch:

- File: src/axolotl/loaders/model.py
  - Remove the first elif self.cfg.adapter in ["qlora", "qalora"] and load_in_4bit block (lines ~560–565).
  - Change the QLoRA-only condition to include QALoRA.

Suggested minimal diff:

--- a/src/axolotl/loaders/model.py
+++ b/src/axolotl/loaders/model.py
@@ -560,16 +560,11 @@ def _set_quantization_config(self):
-        elif (
-            self.cfg.adapter in ["qlora", "qalora"]
-            and self.model_kwargs["load_in_4bit"]
-        ):
-            quantization_config = getattr(self.model_config, "quantization_config", {})
-            self.model_kwargs["quantization_config"] = BitsAndBytesConfig(
-                **quantization_config
-            )
-        elif self.cfg.adapter == "qlora" and self.model_kwargs.get(
+        elif self.cfg.adapter in ["qlora", "qalora"] and self.model_kwargs.get(
             "load_in_4bit", False
         ):
             bnb_config = {
                 "load_in_4bit": True,
                 "llm_int8_threshold": 6.0,
                 "llm_int8_has_fp16_weight": False,

Benefits:

- QLoRA uses its tested NF4 + double-quant defaults.
- QALoRA inherits those same stable settings.
- Eliminates the risk of passing arbitrary keys into BitsAndBytesConfig.

Please apply this change before merging.
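For reference, a sketch of the kind of NF4 + double-quant defaults referred to above, using transformers' BitsAndBytesConfig; the authoritative values are the ones in src/axolotl/loaders/model.py, and the compute dtype here is only an example:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit defaults in the spirit of the QLoRA branch, which QALoRA would inherit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # example compute dtype
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)
```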
🧹 Nitpick comments (1)
src/axolotl/utils/schemas/config.py (1)
1078-1125: Auto-enable LoRA kernels for QALoRA: logic looks right; update comment and validate behavior. The inclusion of "qalora" in the adapter check is correct and consistent with the rest of the PR. Two small asks:
- Update the comment “Only proceed if using LoRA or QLoRA adapter” to include QALoRA.
- Please sanity-check that the LoRA Triton kernels work with PEFT’s QALoRA path (grad shapes, fused ops), especially when lora_dropout != 0 and on multi-GPU + FSDP2 where auto-enabling is permitted.
Suggested tiny comment tweak:
-    # Only proceed if using LoRA or QLoRA adapter
+    # Only proceed if using LoRA / QLoRA / QALoRA adapter
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- src/axolotl/loaders/model.py (5 hunks)
- src/axolotl/utils/schemas/config.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/axolotl/loaders/model.py (1)
src/axolotl/integrations/base.py (2)
cfg (339-340), cfg (343-344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: preview
🔇 Additional comments (4)
src/axolotl/loaders/model.py (4)
158-160: Include QALoRA in FSDP gating: LGTM. Updating the FSDP+PEFT property to consider QALoRA is consistent with the rest of the integration.
321-327: Enable grad checkpointing + FA2 dtype path for QALoRA: LGTM. Treating QALoRA like LoRA/QLoRA here makes sense. This will keep FA2 compatibility after k-bit prep. No issues spotted.
536-555: Quantization-config passthrough now includes QALoRA: LGTM with a sanity note. Including QALoRA alongside LoRA/QLoRA for GPTQ/AWQ/BnB model_config.quantization_config passthrough is correct. Note: ensure model_config.quantization_config keys match the target config class (e.g., BitsAndBytesConfig) for this branch—your current quant_method gating prevents most mismatches.
869-876: prepare_model_for_kbit_training is required for QA-LoRA with 4/8-bit models
The PEFT and QLoRA examples/docs explicitly show that you must run prepare_model_for_kbit_training (or perform the equivalent fp32 casting, layernorm adjustments, gradient-checkpointing setup, etc.) before wrapping the model with QA-LoRA. Since PEFT/QA-LoRA does not prepare the base model internally, the existing call here is both necessary and correct; no further changes are needed.
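A minimal sketch of that call order with PEFT, assuming a 4-bit base model; the model id and LoRA hyperparameters are placeholders, and the QA-LoRA-specific kwargs added by this PR are omitted:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
# Casts norms/embeddings to fp32 and wires gradient checkpointing before adapters are attached.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
peft_config = LoraConfig(r=32, lora_alpha=16, target_modules="all-linear", task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)
```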
    flash_attn_fuse_qkv: bool | None = Field(
        default=None,
        json_schema_extra={
            "description": "Whether to fuse QKV projection into a single operation"
        },
    )
💡 Verification agent
🧩 Analysis chain
flash_attn_fuse_qkv is defined but appears unused; add wiring + guards
Good addition, but I don’t see this flag being consumed anywhere (e.g., in PatchManager or _set_attention_config). Also consider guarding for:
- flash-attn presence and minimum version that supports QKV fusion,
- compatible GPU arch (Ampere+), and
- interaction with attn_implementation (flash_attention_2).
Follow-ups:
- Wire this flag where the FA2 kernels are configured.
- Add a validation/model_validator to warn/disable when unsupported.
Run to confirm usage across the repo:
🏁 Script executed:
#!/bin/bash
rg -n --hidden -g '!**/dist/**' -C2 '\bflash_attn_fuse_qkv\b'

Length of output: 855
Wire flash_attn_fuse_qkv into the attention setup and tighten guards
Right now this flag is only declared in config.py (line 544) and checked in LoRA validation (in validation.py around line 578), but never actually passed into the code paths that select and configure FlashAttention kernels. To complete this feature, please:
• In your attention‐kernel configuration (e.g. in PatchManager or the _set_attention_config routine where attn_implementation is chosen), consume flash_attn_fuse_qkv. When it’s true, enable QKV fusion in FlashAttention-2.
• Add runtime guards immediately around that wiring to verify:
– The FlashAttention library is installed and at or above the minimum version that supports QKV fusion.
– The GPU’s compute capability is Ampere (8.0+) or newer.
– If attn_implementation is set to something else, ensure it’s overridden or disabled when flash_attn_fuse_qkv=True.
• Extend your model‐validation logic (in validation.py or a new model_validator) to error‐out or emit a warning when flash_attn_fuse_qkv=True but the environment doesn’t meet the above requirements.
By wiring the flag where the FA2 kernels are configured and codifying these guards, users will have both compile-time schema safety and correct runtime behavior.
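A sketch of such runtime guards, assuming they would run where the attention kernels are configured; the helper name, log messages, and the decision to warn-and-disable (rather than raise) are illustrative:

```python
import importlib.util
import logging

import torch

LOG = logging.getLogger(__name__)


def can_fuse_qkv() -> bool:
    """Best-effort environment check before honoring flash_attn_fuse_qkv."""
    if importlib.util.find_spec("flash_attn") is None:
        LOG.warning("flash_attn_fuse_qkv requested but flash-attn is not installed; disabling.")
        return False
    # A flash-attn minimum-version check would also belong here (exact floor not asserted).
    if not torch.cuda.is_available() or torch.cuda.get_device_capability() < (8, 0):
        LOG.warning("flash_attn_fuse_qkv requires an Ampere (SM 8.0+) or newer GPU; disabling.")
        return False
    return True
```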
        if self.adapter in ["lora", "qlora", "qalora"] and (
            self.flash_attn_fuse_qkv or self.flash_attn_fuse_mlp
        ):
            raise ValueError("Fused modules are not supported with LoRA/QLoRA/QALoRA")
        return self
I think this is a merge conflict as we removed flash_attn_fuse_qkv recently.
Description
qa-lora integration
Motivation and Context
#2878
How has this been tested?
Run the qalora.yml example file provided in this PR.
Types of changes
Added 2 qalora-specific parameters, qalora: bool and qalora_group_size: Optional[int]. Also added qalora support in adapter loading and the related code paths.
Inspiration: PEFT qalora.
Summary by CodeRabbit
New Features
Validation
Tests