Feat: Add neuron backend to TEI #742

Open

JingyaHuang wants to merge 26 commits into huggingface:main from JingyaHuang:add-neuron-backend

Conversation

Collaborator

@JingyaHuang commented Oct 22, 2025

What does this PR do?

This PR adds Neuron as a device option for the TEI Python backend.

  • Both pre-compiled checkpoints and on-the-fly compilation are supported

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@JingyaHuang marked this pull request as ready for review February 4, 2026 15:24
@alvarobartt added this to the v1.10.0 milestone Feb 20, 2026
Member

@alvarobartt left a comment

Thanks a lot @JingyaHuang, great work! 🤗

Comment on lines +71 to +77
fn is_neuron() -> bool {
    match Command::new("neuron-ls").output() {
        Ok(output) => output.status.success(),
        Err(_) => false,
    }
}
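For reviewers without Neuron hardware, the probe pattern above can be exercised in isolation; `tool_available` and the placeholder command below are illustrative stand-ins, not code from the PR (the snippet assumes `use std::process::Command;` is in scope):

```rust
use std::process::Command;

// Same probe pattern as `is_neuron`: true only when the tool exists on PATH
// and exits successfully; a missing binary maps to `false` instead of an error.
fn tool_available(tool: &str) -> bool {
    match Command::new(tool).output() {
        Ok(output) => output.status.success(),
        Err(_) => false,
    }
}

fn main() {
    // A nonexistent command fails to spawn, so the probe reports false.
    assert!(!tool_available("definitely-not-a-real-command-12345"));
}
```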

Member

The following is subject to leaving the python feature only for python-habana and creating a python-neuron feature instead, so that the python feature can be deprecated in favor of python-habana in upcoming releases, cc @kaixuanliu

I don't dislike this, but I'd prefer to remove it and leave it to the backend to fail: i.e., if a user compiles with --features python-neuron and either the model.neuron file is not there or the device is not Inferentia 2 (or any other AWS Neuron supported device), the initialization of the backend will fail. But I'll leave this snippet here to only perform the download of the files required for the backend, rather than for backend validation. Thoughts? @JingyaHuang
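A hypothetical sketch of what that feature split could look like in the router's Cargo.toml (feature names come from this thread; the alias strategy is my assumption, not a decision from the PR):

```toml
[features]
# Deprecated umbrella feature, temporarily aliased for backward compatibility
python = ["python-habana"]
# Intel Habana (HPU) Python backend
python-habana = []
# AWS Neuron (Inferentia 2) Python backend
python-neuron = []
```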

Collaborator Author

It would be nice to create a neuron-only feature; I will add it.

let mut model_files: Vec<PathBuf> = Vec::new();

tracing::info!("Downloading `model.neuron`");
match api.get("model.neuron").await {
Member

Not sure how common this is, but I'd extend support for sharded AWS Neuron model files.
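As a sketch of what sharded support might look for, assuming shards follow a safetensors-style naming pattern such as `model-00001-of-00002.neuron` (this naming is an assumption on my part, not something the PR defines):

```rust
// Hypothetical filter for sharded Neuron files named `model-XXXXX-of-YYYYY.neuron`.
fn is_neuron_shard(name: &str) -> bool {
    let Some(rest) = name.strip_prefix("model-") else { return false };
    let Some(rest) = rest.strip_suffix(".neuron") else { return false };
    // Expect exactly `<digits>-of-<digits>` between prefix and suffix.
    let mut parts = rest.splitn(2, "-of-");
    match (parts.next(), parts.next()) {
        (Some(a), Some(b)) => {
            !a.is_empty()
                && !b.is_empty()
                && a.chars().all(|c| c.is_ascii_digit())
                && b.chars().all(|c| c.is_ascii_digit())
        }
        _ => false,
    }
}

fn main() {
    assert!(is_neuron_shard("model-00001-of-00002.neuron"));
    assert!(!is_neuron_shard("model.neuron"));
    assert!(!is_neuron_shard("model-of.neuron"));
}
```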

@@ -418,16 +425,48 @@ async fn init_backend(
if let Some(api_repo) = api_repo.as_ref() {
if cfg!(feature = "python") || cfg!(feature = "candle") {
Member

TL;DR: add an if-statement either above or below this one to capture cfg!(feature = "python-neuron"), and inside it attempt to download the model.neuron file first; if it is not there, warn about runtime compilation and download model.safetensors instead.
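The suggested fallback could be sketched like this, with the Hub lookup replaced by a plain slice of available file names so the selection logic is testable in isolation (`pick_neuron_model_file` is an illustrative name, not from the PR):

```rust
// Prefer the pre-compiled `model.neuron`; otherwise warn and fall back to
// `model.safetensors`, which implies compiling the model at runtime.
fn pick_neuron_model_file<'a>(available: &[&'a str]) -> Option<&'a str> {
    if available.contains(&"model.neuron") {
        Some("model.neuron")
    } else if available.contains(&"model.safetensors") {
        eprintln!(
            "`model.neuron` not found; downloading `model.safetensors` and \
             compiling at runtime"
        );
        Some("model.safetensors")
    } else {
        None
    }
}

fn main() {
    assert_eq!(
        pick_neuron_model_file(&["model.neuron", "config.json"]),
        Some("model.neuron")
    );
    assert_eq!(
        pick_neuron_model_file(&["model.safetensors"]),
        Some("model.safetensors")
    );
    assert_eq!(pick_neuron_model_file(&["config.json"]), None);
}
```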

Member

Maybe there's a "better" way of splitting both Intel Habana and AWS Neuron on e.g. different subdirectories under text_embeddings_server?

Collaborator Author

Yeah, I would put the Neuron-related modeling code under a folder named backends/python/server/text_embeddings_server/models/neuron, but for Habana I'm not very confident about which modules are Habana-only.
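The proposed layout would look roughly like this (only the models/neuron path comes from the comment; the rest of the tree is elided):

```
backends/python/server/text_embeddings_server/
└── models/
    └── neuron/    # Neuron-specific modeling code
```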

Comment on lines +163 to +175
RUN pip install --no-cache-dir -U \
networkx==2.8.8 \
transformers[sentencepiece,audio,vision]==${TRANSFORMERS_VERSION} \
diffusers==${DIFFUSERS_VERSION} \
compel \
controlnet-aux \
huggingface_hub==${HUGGINGFACE_HUB_VERSION} \
hf_transfer \
datasets==${DATASETS_VERSION} \
optimum-neuron==${OPTIMUM_NEURON_VERSION} \
sentence_transformers==${SENTENCE_TRANSFORMERS} \
peft==${PEFT_VERSION} \
&& rm -rf ~/.cache/pip/*
Member

Not something to tackle in this PR maybe, but I'd rather rely on a lock file here instead of these pins, so it might be worth considering re-opening #587?

cc @regisss and @kaixuanliu as this was something mentioned in the past, but apparently it was failing on Intel HPUs (?)

Collaborator

I think it should work on HPU; I'm not sure why it failed at that time. So don't hesitate to go that way, and if you have a lock file you would like me to test on HPU, I'm happy to do it :)

Member

Thanks @regisss, I'll restart Nico's PR to add uv support instead, and ping you when done for testing 🤗

JingyaHuang and others added 5 commits February 20, 2026 16:02
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>