
@sambhavnoobcoder
Contributor

WoRA (Weighted-Direction Low-Rank Adaptation) Implementation for PEFT

Summary

This pull request adds support for WoRA (Weighted-Direction Low-Rank Adaptation), a novel extension of DoRA that introduces learnable scalar parameters (alpha and beta) to create a weighted combination of the base weights and LoRA adapters. WoRA provides more fine-grained control over the adaptation process compared to standard LoRA and DoRA.

Fixes #2861 ([FEAT] Integrate WoRA (Filtering-WoRA) into PEFT)

Analysis and Understanding

WoRA Formula

WoRA extends DoRA by introducing two learnable scalar parameters:

WoRA_output = m * (β * W₀ + α * scaling * BA) / ||β * W₀ + α * scaling * BA||

Where:

  • m is the learned magnitude vector (from DoRA)
  • W₀ is the base weight matrix
  • BA is the LoRA decomposition (B × A)
  • α (alpha) controls the LoRA contribution
  • β (beta) controls the base weight contribution
  • scaling is the LoRA scaling factor
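
In weight space this can be sketched as follows (a minimal PyTorch sketch for a Linear layer, using DoRA's per-output-row norm; not the exact PEFT implementation):

import torch

def wora_weight(W0, A, B, magnitude, alpha, beta, scaling):
    V = beta * W0 + alpha * scaling * (B @ A)      # the weighted direction
    norm = V.norm(p=2, dim=1, keepdim=True)        # one L2 norm per output row
    return magnitude.view(-1, 1) * V / norm        # m * V / ||V||

W0 = torch.randn(8, 16)                            # (out_features, in_features)
A, B = torch.randn(4, 16), torch.randn(8, 4)       # rank r = 4
print(wora_weight(W0, A, B, torch.ones(8), torch.tensor(1.0), torch.tensor(1.0), 0.5).shape)
# torch.Size([8, 16])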

Key Insights

  1. LoraVariant Pattern: The existing DoRA implementation uses a clean separation between:

    • Layer classes (in dora.py) that handle forward computation
    • Variant classes (in variants.py) that handle initialization and variant-specific logic
    • This pattern was extended for WoRA
  2. Parameter Naming Convention: PEFT automatically marks parameters as trainable if their names contain "lora_". This is why the new scalars are named lora_wora_alpha and lora_wora_beta.

  3. ParameterDict Storage: Using nn.ParameterDict ensures parameters are:

    • Automatically registered with the module
    • Properly handled during state dict operations
    • Accessible through the standard PyTorch parameter iteration
  4. Layer-Specific Challenges:

    • Linear layers: Straightforward implementation following DoRA pattern
    • Embedding layers: Require matrix transposition and special handling for embed_scale
    • Conv layers: Need proper reshaping and dimension handling for weight norms
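
A minimal sketch of the registration pattern behind points 2 and 3 (the module below is illustrative, not an actual PEFT class):

import torch
import torch.nn as nn

class WoraScalars(nn.Module):
    def __init__(self):
        super().__init__()
        # The "lora_" substring in the attribute names is what PEFT's
        # trainable-parameter marking matches on; dict keys are adapter names
        self.lora_wora_alpha = nn.ParameterDict()
        self.lora_wora_beta = nn.ParameterDict()

    def add_adapter(self, adapter_name):
        self.lora_wora_alpha[adapter_name] = nn.Parameter(torch.tensor(1.0))
        self.lora_wora_beta[adapter_name] = nn.Parameter(torch.tensor(1.0))

scalars = WoraScalars()
scalars.add_adapter("default")
print([name for name, _ in scalars.named_parameters()])
# ['lora_wora_alpha.default', 'lora_wora_beta.default']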

Implementation Approach

1. Core Architecture (wora.py)

Created four main layer classes:

  • WoraLinearLayer: Base implementation for linear transformations
  • WoraEmbeddingLayer: Handles token embeddings with proper matrix transposition
  • _WoraConvNdLayer: Base class for convolutional layers
  • WoraConv1dLayer, WoraConv2dLayer, WoraConv3dLayer: Specialized conv layers

Key Design Decisions:

  • Alpha and beta parameters are stored separately from the layer classes (in the main LoRA layer)
  • Layer classes receive alpha/beta as parameters to maintain gradient flow
  • Weight norm calculation uses scalar values (via .item()), which is safe because the weight norm is detached from the autograd graph
  • The actual forward computation uses the Parameter tensors directly so that gradients flow back to alpha and beta (see the sketch below)
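
A rough sketch of what this looks like for a Linear layer (names and shapes are illustrative, not the exact wora.py code):

import torch
import torch.nn.functional as F

def wora_linear_forward(x, W0, A, B, magnitude, alpha, beta, scaling):
    # The weight norm is detached from the graph, so plain Python floats are fine here
    with torch.no_grad():
        V = beta.item() * W0 + alpha.item() * scaling * (B @ A)
        weight_norm = V.norm(p=2, dim=1)                   # one norm per output row
    mag_norm_scale = (magnitude / weight_norm).view(1, -1)
    # The output path keeps alpha/beta as Parameter tensors so they receive gradients
    base_out = F.linear(x, W0)
    lora_out = F.linear(F.linear(x, A), B)
    return mag_norm_scale * (beta * base_out + alpha * scaling * lora_out)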

2. Variant Classes (variants.py)

Implemented five variant classes following PEFT's LoraVariant pattern:

  • WoraLinearVariant
  • WoraEmbeddingVariant
  • WoraConv1dVariant, WoraConv2dVariant, WoraConv3dVariant

Each variant handles:

  • init(): Creating and initializing WoRA-specific parameters
  • forward(): Calling the appropriate layer forward method
  • merge_safe/merge_unsafe(): Merging adapters with base weights
  • unmerge(): Restoring original weights
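
A skeleton of that interface, following the method names above (the signatures shown here are assumptions for illustration; the real classes extend PEFT's LoraVariant base class):

class WoraLinearVariant:
    @staticmethod
    def init(module, adapter_name, **kwargs):
        ...  # create lora_wora_alpha / lora_wora_beta and the DoRA-style magnitude vector

    @staticmethod
    def forward(module, active_adapter, x, result):
        ...  # delegate to WoraLinearLayer, passing the alpha/beta tensors for gradient flow

    @staticmethod
    def merge_safe(module, active_adapter, orig_weights):
        ...  # merge into a copy of the base weights and return it

    @staticmethod
    def merge_unsafe(module, active_adapter, orig_weights):
        ...  # merge into the base weights in place

    @staticmethod
    def unmerge(module, active_adapter, orig_weights):
        ...  # reconstruct and restore the original base weights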

3. Parameter Initialization (layer.py)

Modified three key methods to initialize WoRA parameters:

  • LoraLayer.update_layer(): Base implementation for Linear layers
  • Embedding.update_layer(): Special handling for embedding layers
  • _ConvNd.update_layer(): Handling for convolutional layers

Initialization Pattern:

if use_wora:
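    # Learnable scalars for the weighted combination; at 1.0 they reduce WoRA to DoRA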
    self.lora_wora_alpha[adapter_name] = nn.Parameter(torch.tensor(1.0), requires_grad=True)
    self.lora_wora_beta[adapter_name] = nn.Parameter(torch.tensor(1.0), requires_grad=True)

4. Configuration (config.py)

Added use_wora boolean flag to LoraConfig with proper validation:

  • Mutually exclusive with other LoRA variants (dora, rs_lora, pissa, etc.)
  • Defaults to False for backward compatibility
  • Triggers WoRA initialization when set to True
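
With the flag in place, enabling WoRA should look like any other LoraConfig option (sketch; the tiny model is only for illustration, and use_wora is the flag added in this PR):

import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(16, 16)
        self.v_proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.v_proj(self.q_proj(x))

config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], use_wora=True)
model = get_peft_model(TinyModel(), config)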

5. Testing (test_lora_variants.py)

Added comprehensive tests:

  • test_variant_is_applied_to_layers: Verifies WoRA variants are correctly applied to all layer types
  • test_wora_params_have_gradients: Ensures alpha and beta parameters receive gradients during backpropagation
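
In spirit, the gradient test boils down to something like this (simplified; the real test is parametrized over layer types):

import torch

def assert_wora_scalars_get_grads(peft_model, example_input):
    peft_model(example_input).sum().backward()
    for name, param in peft_model.named_parameters():
        if "lora_wora_alpha" in name or "lora_wora_beta" in name:
            assert param.grad is not None, f"{name} received no gradient"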

Key Technical Challenges and Solutions

Challenge 1: Gradient Flow for Alpha and Beta

Problem: Initial implementation used .item() to convert Parameters to scalars throughout the computation, breaking gradient flow.

Solution:

  • Extract scalar values ONLY for weight norm calculation (which is detached anyway)
  • Use the Parameter tensors directly in the forward computation
  • This ensures gradients flow back to alpha and beta
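
The underlying PyTorch behavior can be seen in isolation:

import torch

alpha = torch.nn.Parameter(torch.tensor(1.0))
w = torch.randn(4, 4)

out_broken = alpha.item() * w   # .item() yields a plain float, so the graph to alpha is cut
out_ok = alpha * w              # keeping the Parameter tensor preserves the graph
out_ok.sum().backward()
assert alpha.grad is not None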

Challenge 2: Embedding Layer Matrix Dimensions

Problem: Embedding layers store lora_embedding_A and lora_embedding_B with shapes that need transposition before use.

Solution:

  • Transpose matrices in the variant's forward method: lora_embedding_A.T and lora_embedding_B.T
  • Pass the base_result to the layer for proper weighted combination
  • Handle special embed_scale for certain embedding types (e.g., Gemma3)
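
A simplified view of that combination (magnitude scaling omitted; shapes assumed to follow PEFT's convention of lora_embedding_A being (r, num_embeddings) and lora_embedding_B being (embedding_dim, r)):

import torch
import torch.nn.functional as F

def wora_embedding_sketch(tokens, W0, lora_embedding_A, lora_embedding_B, alpha, beta, scaling):
    base_result = F.embedding(tokens, W0)                       # (..., embedding_dim)
    after_A = F.embedding(tokens, lora_embedding_A.T)           # (..., r)
    lora_result = after_A @ lora_embedding_B.T                  # (..., embedding_dim)
    return beta * base_result + alpha * scaling * lora_result   # weighted combination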

Challenge 3: Parameter Initialization in Override Methods

Problem: Embedding and _ConvNd classes override update_layer() without calling super(), so they missed WoRA parameter initialization.

Solution:

  • Duplicated WoRA parameter initialization in both override methods
  • Ensured identical initialization logic across all layer types
  • Added explicit requires_grad_(True) to ensure trainability

Challenge 4: Conv Layer Forward Pass

Problem: Convolutional layers have more complex forward logic with bias handling and reshaping requirements.

Solution:

  • Adapted the weight norm calculation for higher-dimensional tensors
  • Properly handled base_result computation with stride, padding, dilation
  • Ensured mag_norm_scale broadcasting works correctly with conv outputs
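
For the weight norm, the key adaptation is reducing over every dimension except the output channels (sketch):

import torch

def conv_weight_norm(weight):
    # weight: (out_channels, in_channels, *kernel_dims) -> one norm per output channel
    return weight.norm(p=2, dim=tuple(range(1, weight.dim())), keepdim=True)

print(conv_weight_norm(torch.randn(8, 3, 3, 3)).shape)  # torch.Size([8, 1, 1, 1])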

Verification and Testing

Test Coverage

The implementation includes two parametrized tests that cover:

  1. Variant Application Test: Verifies that:

    • WoRA variants are correctly instantiated for all layer types
    • The variant types match the expected classes (WoraLinearVariant, WoraEmbeddingVariant, etc.)
    • The variant system properly handles unsupported layer types
  2. Gradient Flow Test: Verifies that:

    • All WoRA parameters (alpha, beta, magnitude vector) receive gradients during backpropagation
    • Gradients are non-None for all layer types (Linear, Embedding, Conv1d, Conv2d)
    • The parameters actively participate in the forward computation

Test Results

All tests pass successfully:
[Screenshot: local test run showing all WoRA tests passing]

Files Modified

  • src/peft/tuners/lora/config.py: Added use_wora configuration parameter
  • src/peft/tuners/lora/layer.py: Added WoRA parameter initialization in update_layer methods
  • src/peft/tuners/lora/wora.py: Implemented WoRA layer classes
  • src/peft/tuners/lora/variants.py: Implemented WoRA variant classes
  • tests/test_lora_variants.py: Added comprehensive WoRA tests

Backward Compatibility

This implementation maintains full backward compatibility:

  • Default behavior unchanged (use_wora defaults to False)
  • No modifications to existing LoRA/DoRA code paths
  • All existing tests continue to pass
  • New parameters only created when use_wora=True

cc: @BenjaminBossan


- Add WoRA to test variant map in test_lora_variants.py
- Add test case for WoRA variant application to all layer types
- Add test for WoRA alpha/beta parameter gradients
- Fix WoRA parameter initialization in Embedding.update_layer
- Fix WoRA parameter initialization in _ConvNd.update_layer
- Fix WoraEmbeddingLayer to include alpha in computation
- Fix WoraConvNdLayer gradient flow for alpha/beta parameters
- Transpose embedding matrices in WoraEmbeddingVariant.forward
- Add embed_scale support in WoraEmbeddingVariant

All WoRA tests now pass successfully.