Feat: Add neuron backend to TEI #742
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
alvarobartt left a comment
Thanks a lot @JingyaHuang, great work! 🤗
backends/src/lib.rs
Outdated
```rust
fn is_neuron() -> bool {
    match Command::new("neuron-ls").output() {
        Ok(output) => output.status.success(),
        Err(_) => false,
    }
}
```
The following is subject to leaving the `python` feature only for `python-habana` and creating a `python-neuron` feature instead, so that the `python` feature is deprecated in favor of `python-habana` in the next releases, cc @kaixuanliu
I don't dislike this, but I'd prefer to remove it and leave it to the backend to fail, i.e., if the user compiles with `--features python-neuron` and either the `model.neuron` file is not there or the device is not Inferentia 2 (or any other AWS Neuron supported device), the initialization of the backend will fail. But I'd leave this snippet here only to perform the download of the files required for the backend, rather than for backend validation. Thoughts? @JingyaHuang
It would be nice if we can create a Neuron-only feature; will add it.
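For reference, such a split might look like this in `Cargo.toml` — a hypothetical sketch using the feature names from the discussion above, not the crate's actual feature definitions or dependency lists:

```toml
# Hypothetical feature layout, not the real backends/Cargo.toml
[features]
python-habana = []
python-neuron = []
# `python` kept temporarily as an alias while it is being deprecated
python = ["python-habana"]
```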
```rust
let mut model_files: Vec<PathBuf> = Vec::new();

tracing::info!("Downloading `model.neuron`");
match api.get("model.neuron").await {
```
Not sure how common this is, but I'd extend support for sharded AWS Neuron model files.
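Selecting sharded files could work along these lines — a self-contained sketch that assumes a shard naming convention mirroring safetensors (`model-00001-of-00002.neuron`), which is an assumption for illustration, not a confirmed optimum-neuron convention:

```rust
/// Pick the AWS Neuron files to download from a repo's sibling file list.
/// Prefers a single `model.neuron`; otherwise collects shards following a
/// hypothetical `model-XXXXX-of-XXXXX.neuron` naming scheme.
fn neuron_model_files(siblings: &[&str]) -> Vec<String> {
    if siblings.contains(&"model.neuron") {
        return vec!["model.neuron".to_string()];
    }
    let mut shards: Vec<String> = siblings
        .iter()
        .filter(|f| f.starts_with("model-") && f.ends_with(".neuron"))
        .map(|f| f.to_string())
        .collect();
    shards.sort(); // fetch shards in order
    shards
}

fn main() {
    assert_eq!(
        neuron_model_files(&["model.neuron", "config.json"]),
        ["model.neuron"]
    );
    assert_eq!(
        neuron_model_files(&["model-00002-of-00002.neuron", "model-00001-of-00002.neuron"]),
        ["model-00001-of-00002.neuron", "model-00002-of-00002.neuron"]
    );
}
```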
backends/src/lib.rs
Outdated
```diff
@@ -418,16 +425,48 @@ async fn init_backend(
     if let Some(api_repo) = api_repo.as_ref() {
         if cfg!(feature = "python") || cfg!(feature = "candle") {
```
TL;DR: add an if-statement either above or below this one to capture `cfg!(feature = "python-neuron")`, and inside it attempt to download the `model.neuron` file first; if it is not there, warn about runtime compilation and download `model.safetensors` instead.
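The suggested flow can be sketched as a small decision function — a simplification where the booleans stand in for the compile-time `cfg!(feature = "python-neuron")` check and for the outcome of the `api.get("model.neuron").await` call; the function name is hypothetical, not TEI's API:

```rust
/// Decide which file the python-neuron path should download.
/// Returns None when the feature is off (that case stays with the
/// existing python/candle branch).
fn neuron_download_target(
    feature_enabled: bool,
    neuron_file_exists: bool,
) -> Option<&'static str> {
    if !feature_enabled {
        return None;
    }
    if neuron_file_exists {
        Some("model.neuron")
    } else {
        // In TEI this is where the runtime-compilation warning would be logged.
        Some("model.safetensors")
    }
}

fn main() {
    assert_eq!(neuron_download_target(true, true), Some("model.neuron"));
    assert_eq!(neuron_download_target(true, false), Some("model.safetensors"));
    assert_eq!(neuron_download_target(false, true), None);
}
```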
Maybe there's a "better" way of splitting both Intel Habana and AWS Neuron on e.g. different subdirectories under text_embeddings_server?
Yeah, I would put Neuron-related modeling code under a folder named `backends/python/server/text_embeddings_server/models/neuron`, but for Habana I'm not very confident which ones are Habana-only.
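A sketch of the layout being suggested (the contents under `neuron/` are hypothetical placeholders, not files from this PR):

```
backends/python/server/text_embeddings_server/
└── models/
    ├── neuron/     # Neuron-only modeling code
    └── ...         # shared / other backend modeling code
```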
```dockerfile
RUN pip install --no-cache-dir -U \
    networkx==2.8.8 \
    transformers[sentencepiece,audio,vision]==${TRANSFORMERS_VERSION} \
    diffusers==${DIFFUSERS_VERSION} \
    compel \
    controlnet-aux \
    huggingface_hub==${HUGGINGFACE_HUB_VERSION} \
    hf_transfer \
    datasets==${DATASETS_VERSION} \
    optimum-neuron==${OPTIMUM_NEURON_VERSION} \
    sentence_transformers==${SENTENCE_TRANSFORMERS} \
    peft==${PEFT_VERSION} \
    && rm -rf ~/.cache/pip/*
```
Not something to tackle in this PR maybe, but I'd rather rely on a lock file here instead of these pinned versions, so it might be worth considering re-opening #587?
cc @regisss and @kaixuanliu as this was something mentioned in the past, but apparently it was failing on Intel HPUs (?)
I think it should work on HPU; not sure why it failed at that time. So don't hesitate to go that way, and if you have a lock file you would like me to test on HPU, I'm happy to do it :)
Thanks @regisss, I'll restart Nico's PR to add uv support instead, and ping you when done for testing 🤗
Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
…ddings-inference into add-neuron-backend
What does this PR do?
This PR adds Neuron as a device option for the TEI Python backend.