
@williambarberjr williambarberjr commented Jan 30, 2026

What does this PR do?

This PR adds support for voyageai/voyage-4-nano, a Qwen3-based embedding model that uses bidirectional attention and a projection layer.

Changes

1. Bidirectional Attention Support

  • Added use_bidirectional_attention config field (default: false)
  • When true, disables causal masking in the attention mechanism
  • voyage-4-nano and similar embedding models use bidirectional attention to see the full context
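The effect of the flag can be sketched as a toggle on the attention bias. This is an illustrative stdlib-only sketch (the real implementation builds masks with candle tensors); the function name `attention_bias` is made up for this example, only the `use_bidirectional_attention` field name comes from the PR:

```rust
// Sketch: how a causal vs. bidirectional attention bias differs.
// When `use_bidirectional_attention` is true, no position is masked out.
fn attention_bias(seq_len: usize, use_bidirectional_attention: bool) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| {
                    if use_bidirectional_attention || j <= i {
                        0.0 // position j is visible to position i
                    } else {
                        f32::NEG_INFINITY // causal: mask future positions
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let causal = attention_bias(3, false);
    let bidi = attention_bias(3, true);
    // Causal: token 0 cannot attend to the future token 2.
    assert_eq!(causal[0][2], f32::NEG_INFINITY);
    // Bidirectional: every position sees the full context.
    assert!(bidi.iter().flatten().all(|&b| b == 0.0));
    println!("mask sketch ok");
}
```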

2. Projection Layer Support

  • Added num_labels config field for output projection dimension
  • When set, loads linear.weight from safetensors root level and applies projection after final normalization
  • voyage-4-nano projects from hidden_size=1024 to output_dim=2048
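The projection step itself is just a linear map applied after the final norm. A minimal sketch, with tiny dimensions standing in for the real 2048x1024 `linear.weight` matrix (the `project` helper is hypothetical, not the TEI API):

```rust
// Sketch: applying an output projection after the final normalization,
// as voyage-4-nano does (hidden_size=1024 -> output_dim=2048).
// `weight` is [output_dim][hidden_size]; plain matrix-vector product.
fn project(hidden: &[f32], weight: &[Vec<f32>]) -> Vec<f32> {
    weight
        .iter()
        .map(|row| row.iter().zip(hidden).map(|(w, h)| w * h).sum())
        .collect()
}

fn main() {
    let hidden = vec![1.0, 2.0]; // stands in for the 1024-d hidden state
    let weight = vec![           // stands in for the 2048x1024 projection
        vec![1.0, 0.0],
        vec![0.0, 1.0],
        vec![1.0, 1.0],
        vec![1.0, -1.0],
    ];
    let out = project(&hidden, &weight);
    assert_eq!(out, vec![1.0, 2.0, 3.0, -1.0]);
    println!("projected {} -> {} dims", hidden.len(), out.len());
}
```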

Model Configuration

Models using these features should have in their config.json:

```json
{
  "use_bidirectional_attention": true,
  "num_labels": 2048
}
```
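The backwards-compatibility claim boils down to both fields having "off" defaults. An illustrative struct (not the actual TEI config type, which deserializes these fields from config.json; only the two field names come from the PR):

```rust
// Sketch: the two new fields with their backwards-compatible defaults.
#[derive(Default, Debug, PartialEq)]
struct Qwen3ConfigExt {
    // false -> causal masking, the old Qwen3 behavior
    use_bidirectional_attention: bool,
    // None -> no projection layer, the old Qwen3 behavior
    num_labels: Option<usize>,
}

fn main() {
    // A config.json that omits both keys keeps the old behavior.
    let old = Qwen3ConfigExt::default();
    assert!(!old.use_bidirectional_attention);
    assert!(old.num_labels.is_none());

    // voyage-4-nano sets both fields.
    let voyage = Qwen3ConfigExt {
        use_bidirectional_attention: true,
        num_labels: Some(2048),
    };
    assert_eq!(voyage.num_labels, Some(2048));
    println!("{:?}", voyage);
}
```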

Testing

Tested with voyageai/voyage-4-nano:

  • ✅ Output dimension: 2048 (correct)
  • ✅ Cosine similarity vs HuggingFace transformers: 0.999965
  • ✅ Inference time: ~9ms on L4 GPU (vs 35ms with transformers)
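For reference, the parity number above is a cosine similarity between the two implementations' embeddings. A minimal sketch of that check (the PR's actual tests use insta snapshots; this helper is illustrative):

```rust
// Sketch: cosine similarity between two embedding vectors, the metric
// used to compare against the HuggingFace transformers reference.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Nearly parallel vectors score close to 1.0, like the 0.999965
    // reported for voyage-4-nano vs. transformers.
    let sim = cosine_similarity(&[1.0, 0.0, 1.0], &[1.0, 0.1, 1.0]);
    assert!(sim > 0.99 && sim <= 1.0);
    println!("cosine similarity: {sim}");
}
```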

Files Changed

  • backends/candle/src/models/flash_qwen3.rs - CUDA/flash attention implementation
  • backends/candle/src/models/qwen3.rs - CPU/Metal implementation + config struct
  • backends/candle/Cargo.toml - Added cudarc dev-dependency for CUDA tests
  • backends/candle/tests/test_voyage_nano.rs - CPU test with snapshots
  • backends/candle/tests/test_flash_voyage_nano.rs - CUDA test with snapshots
  • README.md - Added voyage-4-nano to supported models table
  • docs/source/en/supported_models.md - Added voyage-4-nano to docs

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

@Narsil @alvarobartt - This adds two new config fields to support the voyage-4-nano embedding model. The changes are backwards compatible (both fields default to the existing behavior).

@williambarberjr force-pushed the voyage-4-nano-support branch 6 times, most recently from bd2bc16 to 539f322 on January 30, 2026 at 22:59. Commit message:
Add two new config fields to Qwen3 to support voyage-4-nano and similar models:

- `use_bidirectional_attention`: When true, disables causal masking
  for embedding models that use full bidirectional attention
- `num_labels`: When set, loads projection layer from linear.weight
  at safetensors root level (e.g., 1024 -> 2048 for voyage-4-nano)

Both fields are backwards compatible, defaulting to disabled behavior.

Changes:
- backends/candle/src/models/qwen3.rs: Add config fields and CPU impl
- backends/candle/src/models/flash_qwen3.rs: Add CUDA/flash-attn impl
- backends/candle/tests/test_voyage_nano.rs: CPU tests with snapshots
- backends/candle/tests/test_flash_voyage_nano.rs: CUDA tests
- README.md, docs/source/en/supported_models.md: Add voyage-4-nano

Tested with voyageai/voyage-4-nano:
- Output dimension: 2048 (correct)
- Cosine similarity vs transformers: 0.999965
- Inference time: ~9ms on L4 GPU (vs 35ms with transformers)
