P18: QLoRA quantization pipeline (Issue #333)#522

Open
NSR9 wants to merge 20 commits into staging from p18/feat/qlora-quantization-approach-333

Conversation


@NSR9 NSR9 commented Feb 21, 2026

Summary

Context

Recovered branch rebased onto staging. PR #521 merges IDFT smoke testing into this branch first, then this PR merges everything into staging.

Test plan

🤖 Generated with Claude Code

jha-vikas and others added 13 commits February 21, 2026 01:12
Add comprehensive QLoRA quantization support for SFT and RL alignment training.

This commit adds:
- QLoRA quantization approach document covering formats, layer strategies, and hardware considerations
- Type-safe configuration system with YAML and CLI support
- Training scripts supporting SFT, GRPO, and DPO methods
- End-to-end validation script to verify quantization support
- Default configuration with hardware auto-detection
- Complete requirements and documentation

Addresses Issue #333: ensures the supported quantization formats work end-to-end.

Files:
- quantization_support/QLORA_QUANTIZATION_APPROACH.md
- quantization_support/qlora_config.py
- quantization_support/train_qlora.py
- quantization_support/validate_quantization.py
- quantization_support/default_config.yaml
- quantization_support/README.md
- quantization_support/requirements.txt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NSR9 and others added 2 commits February 21, 2026 01:30
@jha-vikas jha-vikas requested a review from a team as a code owner February 23, 2026 14:16
…oRA model-agnostic

Deep technical review of the quantization_support codebase identified
15 issues (1 critical, 3 high, 6 medium, 5 low). All have been fixed.
A TECHNICAL_ANALYSIS.md document has been added with full architecture
diagrams, per-component breakdowns, and the issues catalog.

CRITICAL:
- idft_loss.py: Add .detach() to gamma so gradients do not flow through
  the IDFT reweighting factor. Without this, the optimization objective
  differs from arXiv:2602.12222. SFT/GRPO/DPO paths are unaffected.
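A minimal sketch of the gradient-stopping pattern behind this fix, assuming PyTorch; the function and variable names here are illustrative, not the repository's actual `idft_loss.py`:

```python
import torch

def idft_weighted_loss(per_token_loss: torch.Tensor) -> torch.Tensor:
    # gamma is derived from the loss itself; .detach() stops gradients
    # from flowing through the reweighting factor, so the optimizer
    # treats gamma as a constant per-token weight.
    gamma = torch.softmax(-per_token_loss, dim=-1).detach()
    return (gamma * per_token_loss).sum()

loss_in = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = idft_weighted_loss(loss_in)
out.backward()
# With gamma detached, d(out)/d(loss_i) is exactly gamma_i.
assert torch.allclose(loss_in.grad, torch.softmax(-loss_in.detach(), dim=-1))
```

Without the `.detach()`, the backward pass would also differentiate through the softmax, silently changing the optimization objective.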

HIGH:
- qlora_config.py: Wire exclude_modules into BitsAndBytesConfig via
  llm_int8_skip_modules for both 4-bit and 8-bit modes. Replace regex
  patterns with plain module names covering Llama, Mistral, Qwen, phi,
  GPT-2, Falcon, and BLOOM architectures.
- qlora_config.py + default_config.yaml + idft_smoke_config.yaml:
  Change LoRA target_modules default from a hardcoded Llama-style list
  to "all-linear", which uses PEFT's auto-detection of all nn.Linear
  layers. This makes LoRA fully model-agnostic out of the box.
- run_idft_smoke_test.py: Add GPU memory cleanup (del trainer,
  gc.collect, torch.cuda.empty_cache) in finally blocks between
  sequential Phase 2 training runs to prevent OOM.
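The cleanup pattern for sequential training runs can be sketched as follows; `DummyTrainer` is a stand-in for the real trainer object, which holds GPU tensors:

```python
import gc

class DummyTrainer:
    """Stand-in for a Trainer that would hold GPU memory."""
    def __init__(self, name: str):
        self.name = name
    def train(self) -> str:
        return f"{self.name}: ok"

def run_sequential_phases(configs, use_cuda: bool = False):
    # Release each trainer before the next phase starts, even if
    # train() raises, so back-to-back runs do not accumulate and OOM.
    results = []
    for cfg in configs:
        trainer = DummyTrainer(cfg)
        try:
            results.append(trainer.train())
        finally:
            del trainer
            gc.collect()
            if use_cuda:
                import torch
                torch.cuda.empty_cache()
    return results

print(run_sequential_phases(["sft", "grpo"]))
```

Putting the cleanup in `finally` is the key point: a failed Phase 2 run still frees its memory before the next one begins.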

MEDIUM:
- qlora_config.py from_dict(): Copy nested dicts before .pop() to
  avoid mutating the caller's dictionary.
- qlora_config.py from_args(): Change truthiness checks to
  `is not None` so falsy CLI values like --lora_r 0 pass through.
- evaluate_smoke_test.py: Add _extract_score_from_raw() helper with
  METRIC_KEYS covering acc, acc_norm, exact_match, pass@1, em, score.
  Fix print_results_table() to actually display benchmark scores.
- idft_trainer.py: Replace internal _idft_step_count with
  self.state.global_step so diagnostic logging frequency matches real
  training steps regardless of gradient accumulation.
- validate_quantization.py: check_model_loading() now inspects
  embedding/norm/lm_head parameters for quant_state to verify they
  were not accidentally quantized.
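The two config-parsing fixes above follow standard Python patterns; a small sketch with illustrative names (not the actual `qlora_config.py` signatures):

```python
import copy

def config_from_dict(raw: dict) -> dict:
    # Deep-copy before .pop() so the caller's nested dicts survive
    # intact (mirrors the from_dict() fix).
    data = copy.deepcopy(raw)
    lora = data.pop("lora", {})
    return {"lora_r": lora.get("r", 16), **data}

def apply_cli_override(default, cli_value):
    # `is not None` lets falsy-but-valid CLI values such as --lora_r 0
    # override the default; a truthiness check would silently drop them.
    return cli_value if cli_value is not None else default

raw = {"lora": {"r": 8}, "seed": 0}
cfg = config_from_dict(raw)
assert raw == {"lora": {"r": 8}, "seed": 0}  # caller's dict unchanged
assert apply_cli_override(16, 0) == 0
```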

LOW:
- phi_diagnostic.py + run_idft_smoke_test.py: Add padding_side="left"
  to tokenizer for correct causal LM batching.
- validate_quantization.py: Replace fragile two-parser config loading
  with direct QLoRAConfig.from_yaml().
- train_qlora.py: Emit one-time warning when text column is empty in
  format_sft_dataset().
- requirements.txt: Bump transformers>=4.38.0, peft>=0.8.0, trl>=0.8.0
  to match actual API usage.

New file:
- TECHNICAL_ANALYSIS.md: Comprehensive technical document covering
  system architecture (with mermaid diagrams), per-component breakdown,
  IDFT loss mathematical deep-dive, and the full issues catalog with
  resolution status.

Breaking changes:
- IDFT training results are not reproducible against prior runs (the
  old gradient behavior was incorrect per the paper).
- LoRA now targets all linear layers by default instead of 7 hardcoded
  module names; this increases trainable params but works with any model.
  Users can override via YAML or --lora_target_modules CLI flag.
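As a hypothetical illustration of the YAML override (the exact key names are assumptions, not confirmed against the repo's config schema):

```yaml
# Restore a fixed module list instead of the new "all-linear" default.
lora:
  target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"]
```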
