Add `Dense` layer for `2_Dense/` modules #660

alvarobartt · 2025-06-26T16:26:36Z

What does this PR do?

This PR adds support for 2_Dense/ modules, since some models, e.g. https://huggingface.co/sentence-transformers/LaBSE, require an extra Dense module (an additional Linear layer applied on top of the pooled embeddings) when generating embeddings.

To that end, this PR introduces the DenseLayer trait, its implementation for Dense, and DenseConfig. A Dense module is essentially a single Linear layer; its configuration is read from 2_Dense/config.json and its weights from 2_Dense/model.safetensors.
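As an illustration of what such a module computes, here is a minimal NumPy sketch of a single Linear layer applied to pooled embeddings, followed by L2 normalization. It is not the Rust code from this PR: the function names `dense_forward` and `l2_normalize` are hypothetical, the weight layout `(out_features, in_features)` is assumed to follow the PyTorch `Linear` convention, and the activation function is left out for brevity.

```python
from typing import Optional

import numpy as np


def dense_forward(
    pooled: np.ndarray,
    weight: np.ndarray,
    bias: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Apply one Linear layer to pooled embeddings (hypothetical helper).

    pooled: (batch, in_features)
    weight: (out_features, in_features), assumed PyTorch Linear layout
    bias:   (out_features,) or None
    """
    out = pooled @ weight.T
    if bias is not None:
        out = out + bias
    return out


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm, guarding against division by zero."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


# Toy data standing in for pooled embeddings and Dense weights.
rng = np.random.default_rng(0)
pooled = rng.standard_normal((2, 4)).astype(np.float32)
weight = rng.standard_normal((3, 4)).astype(np.float32)
bias = np.zeros(3, dtype=np.float32)

projected = dense_forward(pooled, weight, bias)
embeddings = l2_normalize(projected)
print(embeddings.shape)  # (2, 3)
```

In a real model the weights would come from 2_Dense/model.safetensors and the shapes from 2_Dense/config.json rather than from random data.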

Note

The 2_Dense/ module is only required when generating embeddings, so it applies only to the Embedding model type. The Reranker and Classifier model types are not affected by this addition, and neither are the rank and predict methods of the given backend.

This PR solves the issue recently reported at https://discuss.huggingface.co/t/inference-result-not-aligned-with-local-version-of-same-model-and-revision/160514.

Additionally, this PR fixes a shape mismatch error raised when performing matrix multiplication of 2D tensors on Metal devices, caused by the candle Metal kernels expecting the tensors to be contiguous. The error seems to arise only on Metal and only for 2D tensors, whereas e.g. 3D tensors seem to work fine without calling .contiguous() (which is expensive, as it needs to clone the tensor).
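For readers unfamiliar with contiguity, the following NumPy snippet is an analogy only (it does not use candle or Metal). It shows that transposing a 2D array yields a non-contiguous view over the same memory, and that making it contiguous requires a copy, which is why a `.contiguous()`-style call has a cost.

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

# Transposing returns a view with swapped strides, not a new buffer.
a_t = a.T
print(a_t.flags["C_CONTIGUOUS"])  # False
print(np.shares_memory(a, a_t))  # True

# Forcing a contiguous layout allocates and fills a new buffer.
a_t_contig = np.ascontiguousarray(a_t)
print(a_t_contig.flags["C_CONTIGUOUS"])  # True
print(np.shares_memory(a, a_t_contig))  # False
```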

Reproduce

To check that the implementation works and produces matching results, i.e. that allclose-like checks pass and the cosine similarity is 1.0 (or as close as possible), the following test was run:

Deploy Text Embeddings Inference (TEI), e.g.:

cargo run --release --features candle,http --no-default-features -- --model-id sentence-transformers/LaBSE --dtype float16

Then, once it's running, run the following Python script (requires torch, transformers, sentence-transformers, accelerate and numpy):

import numpy as np
import requests
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "sentence-transformers/LaBSE",
    model_kwargs={
        "torch_dtype": "float16",
        "device_map": "mps",
    },
)

out_py = model.encode(
    "What is Deep Learning?",
    normalize_embeddings=True,
    convert_to_numpy=True,
)

response = requests.post(
    "http://localhost:3000/embed",
    json={
        "inputs": "What is Deep Learning?",
        "normalize": True,
    },
)

response.raise_for_status()
out = response.json()[0]
out_http = np.array(out, dtype=np.float16)

print(f"Embeddings are close: {np.allclose(out_py, out_http, atol=1e-3, rtol=1e-4)=}")


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    dot_product = np.dot(x, y)
    norm_x = np.linalg.norm(x)
    norm_y = np.linalg.norm(y)
    return dot_product / (norm_x * norm_y)


print(f"The similarity score is: {cosine_similarity(out_py, out_http)=}")

It should produce the following on any combination of device (CPU, MPS, CUDA) and dtype (float32, float16):

Embeddings are close: np.allclose(out_py, out_http, atol=1e-3, rtol=1e-4)=True
The similarity score is: cosine_similarity(out_py, out_http)=np.float16(1.0)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@Narsil


kozistr · 2025-06-27T02:37:21Z

@alvarobartt Hi! Just for your reference (you may already be aware): the Stella v5 model uses an Identity layer as the activation function for its 2_Dense module! See 2_Dense/config.json.
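A rough Python sketch of how an activation name from 2_Dense/config.json could be mapped to a function, covering only tanh and identity. This is not the `DenseActivation` code from the PR: `resolve_activation` is a hypothetical helper, and the dotted strings used below (e.g. `torch.nn.modules.activation.Tanh`) are assumed to match the form sentence-transformers writes to the config.

```python
from typing import Callable

import numpy as np


def resolve_activation(name: str) -> Callable[[np.ndarray], np.ndarray]:
    """Map a dotted activation class path to a NumPy function (hypothetical)."""
    # Keep only the final component, e.g. "Tanh" or "Identity".
    short = name.rsplit(".", 1)[-1].lower()
    if short == "tanh":
        return np.tanh
    if short == "identity":
        return lambda x: x
    raise ValueError(f"Unsupported activation function: {name!r}")


x = np.array([-1.0, 0.0, 1.0], dtype=np.float32)

# Assumed config values; check the model's own 2_Dense/config.json.
tanh = resolve_activation("torch.nn.modules.activation.Tanh")
identity = resolve_activation("torch.nn.modules.linear.Identity")

print(tanh(x))
print(identity(x))
```

Raising on unknown names, rather than silently falling back to identity, makes a missing activation visible instead of producing quietly different embeddings.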

Narsil

Looks good. I think we can simplify the parsing part a bit.

backends/candle/src/models/dense.rs

If `--dense-path` were not allowed, users could not select other `Dense` layers when available. For example, https://huggingface.co/NovaSearch/stella_en_400M_v5 contains separate directories, named `2_Dense_<dims>/`, for `Dense` layers with different output vector dimensionality.
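To make the point concrete, here is a small Python sketch of how a dense path could select which files to load. It is an illustration under assumptions, not the router's actual logic: `dense_artifacts` is a hypothetical helper, the default of `2_Dense` is assumed from the PR description, and `2_Dense_1024` is just an example of the `2_Dense_<dims>/` pattern.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Assumed default, based on the 2_Dense/ directory named in the PR description.
DEFAULT_DENSE_PATH = "2_Dense"


def dense_artifacts(dense_path: Optional[str] = None) -> Tuple[str, str]:
    """Return the (config, weights) paths inside the chosen Dense directory."""
    base = PurePosixPath(dense_path or DEFAULT_DENSE_PATH)
    return str(base / "config.json"), str(base / "model.safetensors")


print(dense_artifacts())
# ('2_Dense/config.json', '2_Dense/model.safetensors')

# Example directory name following the 2_Dense_<dims>/ pattern.
print(dense_artifacts("2_Dense_1024"))
# ('2_Dense_1024/config.json', '2_Dense_1024/model.safetensors')
```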

HuggingFaceDocBuilderDev · 2025-07-23T08:39:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

alvarobartt added 2 commits June 26, 2025 18:09

Fix forward pass on Linear for Metal devices

73935bc

Apparently, `candle` expects the tensors to be contiguous on Metal when performing 2D matrix multiplication

Add Dense, DenseLayer and DenseConfig to handle 2_Dense/

1a59eaf

Required for some models as e.g. https://huggingface.co/sentence-transformers/LaBSE

alvarobartt requested a review from Narsil June 26, 2025 16:26

alvarobartt added 2 commits June 26, 2025 18:38

Fix linting and update code-comment

b666d0e

Run pre-commit run --all-files

27adbb6

alvarobartt added 3 commits July 2, 2025 12:01

Merge branch 'main' into add-dense

578ea3a

Add DenseActivation and handle tanh and identity

1f38d86

Run pre-commit run --all-files

dcb3ee8

Narsil reviewed Jul 2, 2025


backends/candle/src/models/dense.rs (outdated, resolved)

backends/candle/src/models/dense.rs (outdated, resolved)

alvarobartt added 2 commits July 2, 2025 13:34

Fix warn/error messages on 2_Dense downloads

070ef02

alvarobartt changed the title ~~Add Dense, DenseLayer and DenseConfig to handle 2_Dense/~~ Add Dense layer in 2_Dense/ modules Jul 2, 2025

alvarobartt added 5 commits July 2, 2025 15:06

Update download_artifacts in candle/tests to include dense_path

5ff72a6

Add backends/candle/tests/test_dense.rs

1618559

Rename with serde in DenseActivation

144aec1

Add comments in DenseActivation

0d1b698

Add missing None in run for dense_path

a368460

alvarobartt marked this pull request as ready for review July 3, 2025 08:57

alvarobartt added 2 commits July 3, 2025 10:58

Merge branch 'main' into add-dense

4c855bc

Update text-embeddings-router --help output

427ff22

alvarobartt requested a review from Narsil July 23, 2025 10:13

Narsil approved these changes Jul 23, 2025


alvarobartt changed the title ~~Add Dense layer in 2_Dense/ modules~~ Add Dense layer for 2_Dense/ modules Jul 25, 2025

alvarobartt merged commit 519ecac into main Jul 25, 2025
15 checks passed

alvarobartt deleted the add-dense branch July 25, 2025 06:44

BrewTestBot mentioned this pull request Aug 5, 2025

text-embeddings-inference 1.8.0 Homebrew/homebrew-core#232408

Closed

alvarobartt mentioned this pull request Aug 18, 2025

Parse modules.json to identify default Dense modules #701

Merged

5 tasks
