Feat: Add neuron backend to TEI #742

Open

JingyaHuang wants to merge 26 commits into huggingface:main from JingyaHuang:add-neuron-backend

Conversation

Collaborator

@JingyaHuang commented Oct 22, 2025

What does this PR do?

This PR adds Neuron as a device option for the TEI Python backend.

  • Both pre-compiled checkpoints and on-the-fly compilation are supported

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@JingyaHuang marked this pull request as ready for review February 4, 2026 15:24
@alvarobartt added this to the v1.10.0 milestone Feb 20, 2026
Member

@alvarobartt left a comment

Thanks a lot @JingyaHuang, great work! 🤗

Comment on lines +71 to +77
fn is_neuron() -> bool {
    match Command::new("neuron-ls").output() {
        Ok(output) => output.status.success(),
        Err(_) => false,
    }
}
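For reviewers without Neuron hardware, the probe pattern above can be exercised in isolation; `tool_available` and the placeholder command below are illustrative stand-ins, not code from the PR (the snippet assumes `use std::process::Command;` is in scope):

```rust
use std::process::Command;

// Same probe pattern as `is_neuron`: true only when the tool exists on PATH
// and exits successfully; a missing binary maps to `false` instead of an error.
fn tool_available(tool: &str) -> bool {
    match Command::new(tool).output() {
        Ok(output) => output.status.success(),
        Err(_) => false,
    }
}

fn main() {
    // A nonexistent command fails to spawn, so the probe reports false.
    assert!(!tool_available("definitely-not-a-real-command-12345"));
}
```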

Member

The following is subject to leaving the python feature only for python-habana and creating a python-neuron feature instead, so that the python feature can be deprecated in favor of python-habana in upcoming releases, cc @kaixuanliu

I don't dislike this, but I'd prefer to remove it and leave it to the backend to fail: i.e., if a user compiles with --features python-neuron and either the model.neuron file is not there or the device is not Inferentia 2 (or any other AWS Neuron supported device), the initialization of the backend will fail. But I'll leave this snippet here to only perform the download of the files required for the backend, rather than for backend validation. Thoughts? @JingyaHuang
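A hypothetical sketch of what that feature split could look like in the router's Cargo.toml (feature names come from this thread; the alias strategy is my assumption, not a decision from the PR):

```toml
[features]
# Deprecated umbrella feature, temporarily aliased for backward compatibility
python = ["python-habana"]
# Intel Habana (HPU) Python backend
python-habana = []
# AWS Neuron (Inferentia 2) Python backend
python-neuron = []
```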

Collaborator Author

It would be nice to create a neuron-only feature; I will add it.

let mut model_files: Vec<PathBuf> = Vec::new();

tracing::info!("Downloading `model.neuron`");
match api.get("model.neuron").await {
Member

Not sure how common this is, but I'd extend support for sharded AWS Neuron model files.
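As a sketch of what sharded support might look for, assuming shards follow a safetensors-style naming pattern such as `model-00001-of-00002.neuron` (this naming is an assumption on my part, not something the PR defines):

```rust
// Hypothetical filter for sharded Neuron files named `model-XXXXX-of-YYYYY.neuron`.
fn is_neuron_shard(name: &str) -> bool {
    let Some(rest) = name.strip_prefix("model-") else { return false };
    let Some(rest) = rest.strip_suffix(".neuron") else { return false };
    // Expect exactly `<digits>-of-<digits>` between prefix and suffix.
    let mut parts = rest.splitn(2, "-of-");
    match (parts.next(), parts.next()) {
        (Some(a), Some(b)) => {
            !a.is_empty()
                && !b.is_empty()
                && a.chars().all(|c| c.is_ascii_digit())
                && b.chars().all(|c| c.is_ascii_digit())
        }
        _ => false,
    }
}

fn main() {
    assert!(is_neuron_shard("model-00001-of-00002.neuron"));
    assert!(!is_neuron_shard("model.neuron"));
    assert!(!is_neuron_shard("model-of.neuron"));
}
```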

@@ -418,16 +425,48 @@ async fn init_backend(
if let Some(api_repo) = api_repo.as_ref() {
if cfg!(feature = "python") || cfg!(feature = "candle") {
Member

TL;DR: add an if-statement either above or below this one to capture cfg!(feature = "python-neuron"), and inside it attempt to download the model.neuron file first; if it is not there, warn about runtime compilation and download model.safetensors instead.
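The suggested fallback could be sketched like this, with the Hub lookup replaced by a plain slice of available file names so the selection logic is testable in isolation (`pick_neuron_model_file` is an illustrative name, not from the PR):

```rust
// Prefer the pre-compiled `model.neuron`; otherwise warn and fall back to
// `model.safetensors`, which implies compiling the model at runtime.
fn pick_neuron_model_file<'a>(available: &[&'a str]) -> Option<&'a str> {
    if available.contains(&"model.neuron") {
        Some("model.neuron")
    } else if available.contains(&"model.safetensors") {
        eprintln!(
            "`model.neuron` not found; downloading `model.safetensors` and \
             compiling at runtime"
        );
        Some("model.safetensors")
    } else {
        None
    }
}

fn main() {
    assert_eq!(
        pick_neuron_model_file(&["model.neuron", "config.json"]),
        Some("model.neuron")
    );
    assert_eq!(
        pick_neuron_model_file(&["model.safetensors"]),
        Some("model.safetensors")
    );
    assert_eq!(pick_neuron_model_file(&["config.json"]), None);
}
```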

Member

Maybe there's a "better" way of splitting both Intel Habana and AWS Neuron on e.g. different subdirectories under text_embeddings_server?

Collaborator Author

Yeah, I would put the Neuron-related modeling code under a folder named backends/python/server/text_embeddings_server/models/neuron, but for Habana I'm not very confident about which modules are Habana-only.
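The proposed layout would look roughly like this (only the models/neuron path comes from the comment; the rest of the tree is elided):

```
backends/python/server/text_embeddings_server/
└── models/
    └── neuron/    # Neuron-specific modeling code
```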

Comment on lines +163 to +175
RUN pip install --no-cache-dir -U \
networkx==2.8.8 \
transformers[sentencepiece,audio,vision]==${TRANSFORMERS_VERSION} \
diffusers==${DIFFUSERS_VERSION} \
compel \
controlnet-aux \
huggingface_hub==${HUGGINGFACE_HUB_VERSION} \
hf_transfer \
datasets==${DATASETS_VERSION} \
optimum-neuron==${OPTIMUM_NEURON_VERSION} \
sentence_transformers==${SENTENCE_TRANSFORMERS} \
peft==${PEFT_VERSION} \
&& rm -rf ~/.cache/pip/*
Member

Not something to tackle in this PR maybe, but I'd rather rely on a lock file here instead of these pins, so it might be worth considering re-opening #587?

cc @regisss and @kaixuanliu as this was something mentioned in the past, but apparently it was failing on Intel HPUs (?)

Collaborator

I think it should work on HPU; I'm not sure why it failed at that time. So don't hesitate to go that way, and if you have a lock file you would like me to test on HPU, I'm happy to do it :)

Member

Thanks @regisss, I'll restart Nico's PR to add uv support instead, and ping you when done for testing 🤗

JingyaHuang and others added 5 commits February 20, 2026 16:02
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>