
@sambhavnoobcoder
Contributor

WoRA (Weighted-Direction Low-Rank Adaptation) Implementation for PEFT

Summary

This pull request adds support for WoRA (Weighted-Direction Low-Rank Adaptation), a novel extension of DoRA that introduces learnable scalar parameters (alpha and beta) to create a weighted combination of the base weights and LoRA adapters. WoRA provides more fine-grained control over the adaptation process compared to standard LoRA and DoRA.

Fixes #2861 ([FEAT] Integrate WoRA (Filtering-WoRA) into PEFT)

Analysis and Understanding

WoRA Formula

WoRA extends DoRA by introducing two learnable scalar parameters:

WoRA_output = m * (β * W₀ + α * scaling * BA) / ||β * W₀ + α * scaling * BA||

Where:

  • m is the learned magnitude vector (from DoRA)
  • W₀ is the base weight matrix
  • BA is the LoRA decomposition (B × A)
  • α (alpha) controls the LoRA contribution
  • β (beta) controls the base weight contribution
  • scaling is the LoRA scaling factor
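
In weight space this can be sketched as follows (a minimal PyTorch sketch for a Linear layer, using DoRA's per-output-row norm; not the exact PEFT implementation):

import torch

def wora_weight(W0, A, B, magnitude, alpha, beta, scaling):
    V = beta * W0 + alpha * scaling * (B @ A)      # the weighted direction
    norm = V.norm(p=2, dim=1, keepdim=True)        # one L2 norm per output row
    return magnitude.view(-1, 1) * V / norm        # m * V / ||V||

W0 = torch.randn(8, 16)                            # (out_features, in_features)
A, B = torch.randn(4, 16), torch.randn(8, 4)       # rank r = 4
print(wora_weight(W0, A, B, torch.ones(8), torch.tensor(1.0), torch.tensor(1.0), 0.5).shape)
# torch.Size([8, 16])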

Key Insights

  1. LoraVariant Pattern: The existing DoRA implementation uses a clean separation between:

    • Layer classes (in dora.py) that handle forward computation
    • Variant classes (in variants.py) that handle initialization and variant-specific logic
    • This pattern was extended for WoRA
  2. Parameter Naming Convention: PEFT automatically marks parameters as trainable if their names contain "lora_". This is why the new scalars are named lora_wora_alpha and lora_wora_beta.

  3. ParameterDict Storage: Using nn.ParameterDict ensures parameters are:

    • Automatically registered with the module
    • Properly handled during state dict operations
    • Accessible through the standard PyTorch parameter iteration
  4. Layer-Specific Challenges:

    • Linear layers: Straightforward implementation following DoRA pattern
    • Embedding layers: Require matrix transposition and special handling for embed_scale
    • Conv layers: Need proper reshaping and dimension handling for weight norms
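
A minimal sketch of the registration pattern behind points 2 and 3 (the module below is illustrative, not an actual PEFT class):

import torch
import torch.nn as nn

class WoraScalars(nn.Module):
    def __init__(self):
        super().__init__()
        # The "lora_" substring in the attribute names is what PEFT's
        # trainable-parameter marking matches on; dict keys are adapter names
        self.lora_wora_alpha = nn.ParameterDict()
        self.lora_wora_beta = nn.ParameterDict()

    def add_adapter(self, adapter_name):
        self.lora_wora_alpha[adapter_name] = nn.Parameter(torch.tensor(1.0))
        self.lora_wora_beta[adapter_name] = nn.Parameter(torch.tensor(1.0))

scalars = WoraScalars()
scalars.add_adapter("default")
print([name for name, _ in scalars.named_parameters()])
# ['lora_wora_alpha.default', 'lora_wora_beta.default']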

Implementation Approach

1. Core Architecture (wora.py)

Created four main layer classes:

  • WoraLinearLayer: Base implementation for linear transformations
  • WoraEmbeddingLayer: Handles token embeddings with proper matrix transposition
  • _WoraConvNdLayer: Base class for convolutional layers
  • WoraConv1dLayer, WoraConv2dLayer, WoraConv3dLayer: Specialized conv layers

Key Design Decisions:

  • Alpha and beta parameters are stored separately from the layer classes (in the main LoRA layer)
  • Layer classes receive alpha/beta as parameters to maintain gradient flow
  • Weight norm calculation uses scalar values (via .item()), which is safe because the weight norm is detached from the autograd graph
  • The actual forward computation uses the Parameter tensors directly so that gradients flow back to alpha and beta (see the sketch below)
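
A rough sketch of what this looks like for a Linear layer (names and shapes are illustrative, not the exact wora.py code):

import torch
import torch.nn.functional as F

def wora_linear_forward(x, W0, A, B, magnitude, alpha, beta, scaling):
    # The weight norm is detached from the graph, so plain Python floats are fine here
    with torch.no_grad():
        V = beta.item() * W0 + alpha.item() * scaling * (B @ A)
        weight_norm = V.norm(p=2, dim=1)                   # one norm per output row
    mag_norm_scale = (magnitude / weight_norm).view(1, -1)
    # The output path keeps alpha/beta as Parameter tensors so they receive gradients
    base_out = F.linear(x, W0)
    lora_out = F.linear(F.linear(x, A), B)
    return mag_norm_scale * (beta * base_out + alpha * scaling * lora_out)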

2. Variant Classes (variants.py)

Implemented five variant classes following PEFT's LoraVariant pattern:

  • WoraLinearVariant
  • WoraEmbeddingVariant
  • WoraConv1dVariant, WoraConv2dVariant, WoraConv3dVariant

Each variant handles:

  • init(): Creating and initializing WoRA-specific parameters
  • forward(): Calling the appropriate layer forward method
  • merge_safe/merge_unsafe(): Merging adapters with base weights
  • unmerge(): Restoring original weights
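
A skeleton of that interface, following the method names above (the signatures shown here are assumptions for illustration; the real classes extend PEFT's LoraVariant base class):

class WoraLinearVariant:
    @staticmethod
    def init(module, adapter_name, **kwargs):
        ...  # create lora_wora_alpha / lora_wora_beta and the DoRA-style magnitude vector

    @staticmethod
    def forward(module, active_adapter, x, result):
        ...  # delegate to WoraLinearLayer, passing the alpha/beta tensors for gradient flow

    @staticmethod
    def merge_safe(module, active_adapter, orig_weights):
        ...  # merge into a copy of the base weights and return it

    @staticmethod
    def merge_unsafe(module, active_adapter, orig_weights):
        ...  # merge into the base weights in place

    @staticmethod
    def unmerge(module, active_adapter, orig_weights):
        ...  # reconstruct and restore the original base weights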

3. Parameter Initialization (layer.py)

Modified three key methods to initialize WoRA parameters:

  • LoraLayer.update_layer(): Base implementation for Linear layers
  • Embedding.update_layer(): Special handling for embedding layers
  • _ConvNd.update_layer(): Handling for convolutional layers

Initialization Pattern:

if use_wora:
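    # Learnable scalars for the weighted combination; at 1.0 they reduce WoRA to DoRA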
    self.lora_wora_alpha[adapter_name] = nn.Parameter(torch.tensor(1.0), requires_grad=True)
    self.lora_wora_beta[adapter_name] = nn.Parameter(torch.tensor(1.0), requires_grad=True)

4. Configuration (config.py)

Added use_wora boolean flag to LoraConfig with proper validation:

  • Mutually exclusive with other LoRA variants (dora, rs_lora, pissa, etc.)
  • Defaults to False for backward compatibility
  • Triggers WoRA initialization when set to True
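
With the flag in place, enabling WoRA should look like any other LoraConfig option (sketch; the tiny model is only for illustration, and use_wora is the flag added in this PR):

import torch.nn as nn
from peft import LoraConfig, get_peft_model

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(16, 16)
        self.v_proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.v_proj(self.q_proj(x))

config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], use_wora=True)
model = get_peft_model(TinyModel(), config)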

5. Testing (test_lora_variants.py)

Added comprehensive tests:

  • test_variant_is_applied_to_layers: Verifies WoRA variants are correctly applied to all layer types
  • test_wora_params_have_gradients: Ensures alpha and beta parameters receive gradients during backpropagation
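
In spirit, the gradient test boils down to something like this (simplified; the real test is parametrized over layer types):

import torch

def assert_wora_scalars_get_grads(peft_model, example_input):
    peft_model(example_input).sum().backward()
    for name, param in peft_model.named_parameters():
        if "lora_wora_alpha" in name or "lora_wora_beta" in name:
            assert param.grad is not None, f"{name} received no gradient"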

Key Technical Challenges and Solutions

Challenge 1: Gradient Flow for Alpha and Beta

Problem: Initial implementation used .item() to convert Parameters to scalars throughout the computation, breaking gradient flow.

Solution:

  • Extract scalar values ONLY for weight norm calculation (which is detached anyway)
  • Use the Parameter tensors directly in the forward computation
  • This ensures gradients flow back to alpha and beta
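
The underlying PyTorch behavior can be seen in isolation:

import torch

alpha = torch.nn.Parameter(torch.tensor(1.0))
w = torch.randn(4, 4)

out_broken = alpha.item() * w   # .item() yields a plain float, so the graph to alpha is cut
out_ok = alpha * w              # keeping the Parameter tensor preserves the graph
out_ok.sum().backward()
assert alpha.grad is not None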

Challenge 2: Embedding Layer Matrix Dimensions

Problem: Embedding layers store lora_embedding_A and lora_embedding_B with shapes that need transposition before use.

Solution:

  • Transpose matrices in the variant's forward method: lora_embedding_A.T and lora_embedding_B.T
  • Pass the base_result to the layer for proper weighted combination
  • Handle special embed_scale for certain embedding types (e.g., Gemma3)
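
A simplified view of that combination (magnitude scaling omitted; shapes assumed to follow PEFT's convention of lora_embedding_A being (r, num_embeddings) and lora_embedding_B being (embedding_dim, r)):

import torch
import torch.nn.functional as F

def wora_embedding_sketch(tokens, W0, lora_embedding_A, lora_embedding_B, alpha, beta, scaling):
    base_result = F.embedding(tokens, W0)                       # (..., embedding_dim)
    after_A = F.embedding(tokens, lora_embedding_A.T)           # (..., r)
    lora_result = after_A @ lora_embedding_B.T                  # (..., embedding_dim)
    return beta * base_result + alpha * scaling * lora_result   # weighted combination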

Challenge 3: Parameter Initialization in Override Methods

Problem: Embedding and _ConvNd classes override update_layer() without calling super(), so they missed WoRA parameter initialization.

Solution:

  • Duplicated WoRA parameter initialization in both override methods
  • Ensured identical initialization logic across all layer types
  • Added explicit requires_grad_(True) to ensure trainability

Challenge 4: Conv Layer Forward Pass

Problem: Convolutional layers have more complex forward logic with bias handling and reshaping requirements.

Solution:

  • Adapted the weight norm calculation for higher-dimensional tensors
  • Properly handled base_result computation with stride, padding, dilation
  • Ensured mag_norm_scale broadcasting works correctly with conv outputs
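
For the weight norm, the key adaptation is reducing over every dimension except the output channels (sketch):

import torch

def conv_weight_norm(weight):
    # weight: (out_channels, in_channels, *kernel_dims) -> one norm per output channel
    return weight.norm(p=2, dim=tuple(range(1, weight.dim())), keepdim=True)

print(conv_weight_norm(torch.randn(8, 3, 3, 3)).shape)  # torch.Size([8, 1, 1, 1])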

Verification and Testing

Test Coverage

The implementation includes two parametrized tests that cover:

  1. Variant Application Test: Verifies that:

    • WoRA variants are correctly instantiated for all layer types
    • The variant types match the expected classes (WoraLinearVariant, WoraEmbeddingVariant, etc.)
    • The variant system properly handles unsupported layer types
  2. Gradient Flow Test: Verifies that:

    • All WoRA parameters (alpha, beta, magnitude vector) receive gradients during backpropagation
    • Gradients are non-None for all layer types (Linear, Embedding, Conv1d, Conv2d)
    • The parameters actively participate in the forward computation

Test Results

All tests pass successfully:
[Screenshot: local test run showing all WoRA tests passing]

Files Modified

  • src/peft/tuners/lora/config.py: Added use_wora configuration parameter
  • src/peft/tuners/lora/layer.py: Added WoRA parameter initialization in update_layer methods
  • src/peft/tuners/lora/wora.py: Implemented WoRA layer classes
  • src/peft/tuners/lora/variants.py: Implemented WoRA variant classes
  • tests/test_lora_variants.py: Added comprehensive WoRA tests

Backward Compatibility

This implementation maintains full backward compatibility:

  • Default behavior unchanged (use_wora defaults to False)
  • No modifications to existing LoRA/DoRA code paths
  • All existing tests continue to pass
  • New parameters only created when use_wora=True

cc: @BenjaminBossan


- Add WoRA to test variant map in test_lora_variants.py
- Add test case for WoRA variant application to all layer types
- Add test for WoRA alpha/beta parameter gradients
- Fix WoRA parameter initialization in Embedding.update_layer
- Fix WoRA parameter initialization in _ConvNd.update_layer
- Fix WoraEmbeddingLayer to include alpha in computation
- Fix WoraConvNdLayer gradient flow for alpha/beta parameters
- Transpose embedding matrices in WoraEmbeddingVariant.forward
- Add embed_scale support in WoraEmbeddingVariant

All WoRA tests now pass successfully.