P18: QLoRA quantization pipeline (Issue #333)#522

Open
NSR9 wants to merge 20 commits into staging from p18/feat/qlora-quantization-approach-333

Conversation


@NSR9 NSR9 commented Feb 21, 2026

Summary

Context

Recovered branch rebased onto staging. PR #521 merges IDFT smoke testing into this branch first, then this PR merges everything into staging.

Test plan

🤖 Generated with Claude Code

jha-vikas and others added 13 commits February 21, 2026 01:12
Add comprehensive QLoRA quantization support for SFT and RL alignment training.

This commit adds:
- QLoRA quantization approach document covering formats, layer strategies, and hardware considerations
- Type-safe configuration system with YAML and CLI support
- Training scripts supporting SFT, GRPO, and DPO methods
- End-to-end validation script to verify quantization support
- Default configuration with hardware auto-detection
- Complete requirements and documentation

Addresses Issue #333: ensures the supported quantization formats work end-to-end.

Files:
- quantization_support/QLORA_QUANTIZATION_APPROACH.md
- quantization_support/qlora_config.py
- quantization_support/train_qlora.py
- quantization_support/validate_quantization.py
- quantization_support/default_config.yaml
- quantization_support/README.md
- quantization_support/requirements.txt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NSR9 and others added 2 commits February 21, 2026 01:30
@jha-vikas jha-vikas requested a review from a team as a code owner February 23, 2026 14:16
…oRA model-agnostic

Deep technical review of the quantization_support codebase identified
15 issues (1 critical, 3 high, 6 medium, 5 low). All have been fixed.
A TECHNICAL_ANALYSIS.md document has been added with full architecture
diagrams, per-component breakdowns, and the issues catalog.

CRITICAL:
- idft_loss.py: Add .detach() to gamma so gradients do not flow through
  the IDFT reweighting factor. Without this, the optimization objective
  differs from arXiv:2602.12222. SFT/GRPO/DPO paths are unaffected.
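A minimal sketch of the gradient-stopping pattern behind this fix, assuming PyTorch; the function and variable names here are illustrative, not the repository's actual `idft_loss.py`:

```python
import torch

def idft_weighted_loss(per_token_loss: torch.Tensor) -> torch.Tensor:
    # gamma is derived from the loss itself; .detach() stops gradients
    # from flowing through the reweighting factor, so the optimizer
    # treats gamma as a constant per-token weight.
    gamma = torch.softmax(-per_token_loss, dim=-1).detach()
    return (gamma * per_token_loss).sum()

loss_in = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = idft_weighted_loss(loss_in)
out.backward()
# With gamma detached, d(out)/d(loss_i) is exactly gamma_i.
assert torch.allclose(loss_in.grad, torch.softmax(-loss_in.detach(), dim=-1))
```

Without the `.detach()`, the backward pass would also differentiate through the softmax, silently changing the optimization objective.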

HIGH:
- qlora_config.py: Wire exclude_modules into BitsAndBytesConfig via
  llm_int8_skip_modules for both 4-bit and 8-bit modes. Replace regex
  patterns with plain module names covering Llama, Mistral, Qwen, phi,
  GPT-2, Falcon, and BLOOM architectures.
- qlora_config.py + default_config.yaml + idft_smoke_config.yaml:
  Change LoRA target_modules default from a hardcoded Llama-style list
  to "all-linear", which uses PEFT's auto-detection of all nn.Linear
  layers. This makes LoRA fully model-agnostic out of the box.
- run_idft_smoke_test.py: Add GPU memory cleanup (del trainer,
  gc.collect, torch.cuda.empty_cache) in finally blocks between
  sequential Phase 2 training runs to prevent OOM.
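The cleanup pattern for sequential training runs can be sketched as follows; `DummyTrainer` is a stand-in for the real trainer object, which holds GPU tensors:

```python
import gc

class DummyTrainer:
    """Stand-in for a Trainer that would hold GPU memory."""
    def __init__(self, name: str):
        self.name = name
    def train(self) -> str:
        return f"{self.name}: ok"

def run_sequential_phases(configs, use_cuda: bool = False):
    # Release each trainer before the next phase starts, even if
    # train() raises, so back-to-back runs do not accumulate and OOM.
    results = []
    for cfg in configs:
        trainer = DummyTrainer(cfg)
        try:
            results.append(trainer.train())
        finally:
            del trainer
            gc.collect()
            if use_cuda:
                import torch
                torch.cuda.empty_cache()
    return results

print(run_sequential_phases(["sft", "grpo"]))
```

Putting the cleanup in `finally` is the key point: a failed Phase 2 run still frees its memory before the next one begins.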

MEDIUM:
- qlora_config.py from_dict(): Copy nested dicts before .pop() to
  avoid mutating the caller's dictionary.
- qlora_config.py from_args(): Change truthiness checks to
  `is not None` so falsy CLI values like --lora_r 0 pass through.
- evaluate_smoke_test.py: Add _extract_score_from_raw() helper with
  METRIC_KEYS covering acc, acc_norm, exact_match, pass@1, em, score.
  Fix print_results_table() to actually display benchmark scores.
- idft_trainer.py: Replace internal _idft_step_count with
  self.state.global_step so diagnostic logging frequency matches real
  training steps regardless of gradient accumulation.
- validate_quantization.py: check_model_loading() now inspects
  embedding/norm/lm_head parameters for quant_state to verify they
  were not accidentally quantized.
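The two config-parsing fixes above follow standard Python patterns; a small sketch with illustrative names (not the actual `qlora_config.py` signatures):

```python
import copy

def config_from_dict(raw: dict) -> dict:
    # Deep-copy before .pop() so the caller's nested dicts survive
    # intact (mirrors the from_dict() fix).
    data = copy.deepcopy(raw)
    lora = data.pop("lora", {})
    return {"lora_r": lora.get("r", 16), **data}

def apply_cli_override(default, cli_value):
    # `is not None` lets falsy-but-valid CLI values such as --lora_r 0
    # override the default; a truthiness check would silently drop them.
    return cli_value if cli_value is not None else default

raw = {"lora": {"r": 8}, "seed": 0}
cfg = config_from_dict(raw)
assert raw == {"lora": {"r": 8}, "seed": 0}  # caller's dict unchanged
assert apply_cli_override(16, 0) == 0
```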

LOW:
- phi_diagnostic.py + run_idft_smoke_test.py: Add padding_side="left"
  to tokenizer for correct causal LM batching.
- validate_quantization.py: Replace fragile two-parser config loading
  with direct QLoRAConfig.from_yaml().
- train_qlora.py: Emit one-time warning when text column is empty in
  format_sft_dataset().
- requirements.txt: Bump transformers>=4.38.0, peft>=0.8.0, trl>=0.8.0
  to match actual API usage.

New file:
- TECHNICAL_ANALYSIS.md: Comprehensive technical document covering
  system architecture (with mermaid diagrams), per-component breakdown,
  IDFT loss mathematical deep-dive, and the full issues catalog with
  resolution status.

Breaking changes:
- IDFT training results are not reproducible against prior runs (the
  old gradient behavior was incorrect per the paper).
- LoRA now targets all linear layers by default instead of 7 hardcoded
  module names; this increases trainable params but works with any model.
  Users can override via YAML or --lora_target_modules CLI flag.
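As a hypothetical illustration of the YAML override (the exact key names are assumptions, not confirmed against the repo's config schema):

```yaml
# Restore a fixed module list instead of the new "all-linear" default.
lora:
  target_modules: ["q_proj", "k_proj", "v_proj", "o_proj"]
```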
