Feat: Add neuron backend to TEI #742
Open
JingyaHuang wants to merge 26 commits into huggingface:main from JingyaHuang:add-neuron-backend
26 commits
710b8c1 1st draft (JingyaHuang)
cc84f29 Merge branch 'main' into add-neuron-backend (JingyaHuang)
139b179 feat: sentence transformer for neuron (JingyaHuang)
dd0c08d fix: neuron dockerfile (JingyaHuang)
1e4f3c9 remove useless (JingyaHuang)
adfa2e9 Merge branch 'main' into add-neuron-backend (JingyaHuang)
a25cf98 fix dockerfile (JingyaHuang)
56c15d8 neuron path (JingyaHuang)
142520a fix container env + Neuron related changes (JingyaHuang)
7ada877 fix for neuron backend + tests (JingyaHuang)
976b71c add to CI & add pre-compiled test (JingyaHuang)
dc3edc2 fix tests (JingyaHuang)
3676b94 Merge branch 'main' into add-neuron-backend (JingyaHuang)
b803566 snol fix (JingyaHuang)
81c57d3 fix doc index (JingyaHuang)
7f517b9 fix style (JingyaHuang)
9752998 build and push neuron docker images in CI (JingyaHuang)
c517aa2 smol changes (JingyaHuang)
d1708a3 Merge branch 'huggingface:main' into add-neuron-backend (JingyaHuang)
08301f0 Merge branch 'main' into add-neuron-backend
37519d9 Merge branch 'main' into add-neuron-backend (JingyaHuang)
533d853 Update Dockerfile-neuron (JingyaHuang)
0829b6f Apply suggestions from code review (JingyaHuang)
aa47549 Merge branch 'main' into add-neuron-backend (JingyaHuang)
1464cc3 review:suggestions (JingyaHuang)
9961846 Merge branch 'add-neuron-backend' of github.com:JingyaHuang/text-embe… (JingyaHuang)
    @@ -18,6 +18,7 @@ on:
          - "Cargo.lock"
          - "rust-toolchain.toml"
          - "Dockerfile"
          - "Dockerfile-neuron"
        branches:
          - "main"
    @@ -0,0 +1,33 @@
    name: Run Neuron integration tests

    on:
      workflow_dispatch:
      schedule:
        - cron: '0 0 * * *'  # Run the workflow nightly to check Neuron integration is working

    jobs:
      tests:
        concurrency:
          group: ${{ github.workflow }}-${{ github.job }}-${{ github.head_ref || github.run_id }}
          cancel-in-progress: true
        runs-on:
          group: aws-inf2-8xlarge
        steps:
          - name: Checkout repository
            uses: actions/checkout@v4

          - name: Install uv
            uses: astral-sh/setup-uv@v5

          - name: Build Docker image for Neuron
            run: |
              docker build . -f Dockerfile-neuron -t tei-neuron

          - name: Run integration tests
            working-directory: integration_tests
            env:
              HF_TOKEN: ${{ secrets.HF_TOKEN }}
              DOCKER_IMAGE: tei-neuron
            run: |
              uv sync --locked --all-extras --dev
              uv run pytest --durations=0 -sv neuron/
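
The `concurrency.group` expression in the workflow above relies on the `||` fallback from `github.head_ref` to `github.run_id`. As a rough illustration only (GitHub evaluates the real expression itself; the function name `concurrency_group` and the sample run IDs are hypothetical), the fallback behaves like this Python sketch:

```python
def concurrency_group(workflow: str, job: str, head_ref: str, run_id: int) -> str:
    """Mimic `${{ github.workflow }}-${{ github.job }}-${{ github.head_ref || github.run_id }}`."""
    # `head_ref or run_id` mirrors the `||` fallback: an empty head_ref yields run_id.
    return f"{workflow}-{job}-{head_ref or run_id}"


# Scheduled or manually dispatched runs have no head_ref, so each run gets its own group.
nightly = concurrency_group("Run Neuron integration tests", "tests", "", 1001)
# A pull-request run would share one group per source branch (illustrative values).
pr_run = concurrency_group("Run Neuron integration tests", "tests", "add-neuron-backend", 1002)
print(nightly)
print(pr_run)
```

Since this workflow only triggers on `workflow_dispatch` and `schedule`, the `run_id` branch is the one that applies, so `cancel-in-progress` would not cancel a different nightly run.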
    @@ -0,0 +1,189 @@
    FROM lukemathwalker/cargo-chef:latest-rust-1.92-bookworm AS chef
    WORKDIR /usr/src

    ENV SCCACHE=0.10.0
    ENV RUSTC_WRAPPER=/usr/local/bin/sccache

    # Download, configure sccache
    RUN curl -fsSL https://github.com/mozilla/sccache/releases/download/v$SCCACHE/sccache-v$SCCACHE-x86_64-unknown-linux-musl.tar.gz | tar -xzv --strip-components=1 -C /usr/local/bin sccache-v$SCCACHE-x86_64-unknown-linux-musl/sccache && \
        chmod +x /usr/local/bin/sccache

    FROM chef AS planner

    COPY backends backends
    COPY core core
    COPY router router
    COPY Cargo.toml ./
    COPY Cargo.lock ./

    RUN cargo chef prepare --recipe-path recipe.json

    FROM chef AS builder

    ARG GIT_SHA
    ARG DOCKER_LABEL

    # sccache specific variables
    ARG SCCACHE_GHA_ENABLED

    COPY --from=planner /usr/src/recipe.json recipe.json

    RUN --mount=type=secret,id=actions_results_url,env=ACTIONS_RESULTS_URL \
        --mount=type=secret,id=actions_runtime_token,env=ACTIONS_RUNTIME_TOKEN \
        cargo chef cook --release --features python-neuron --no-default-features --recipe-path recipe.json && sccache -s

    COPY backends backends
    COPY core core
    COPY router router
    COPY Cargo.toml ./
    COPY Cargo.lock ./

    RUN PROTOC_ZIP=protoc-21.12-linux-x86_64.zip && \
        curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP && \
        unzip -o $PROTOC_ZIP -d /usr/local bin/protoc && \
        unzip -o $PROTOC_ZIP -d /usr/local 'include/*' && \
        rm -f $PROTOC_ZIP

    FROM builder AS http-builder

    RUN --mount=type=secret,id=actions_results_url,env=ACTIONS_RESULTS_URL \
        --mount=type=secret,id=actions_runtime_token,env=ACTIONS_RUNTIME_TOKEN \
        cargo build --release --bin text-embeddings-router -F python-neuron -F http --no-default-features && sccache -s

    FROM builder AS grpc-builder

    COPY proto proto

    RUN --mount=type=secret,id=actions_results_url,env=ACTIONS_RESULTS_URL \
        --mount=type=secret,id=actions_runtime_token,env=ACTIONS_RUNTIME_TOKEN \
        cargo build --release --bin text-embeddings-router -F grpc -F python-neuron --no-default-features && sccache -s

    FROM public.ecr.aws/docker/library/ubuntu:22.04 AS neuron

    ENV HUGGINGFACE_HUB_CACHE=/data \
        PORT=80

    ENV PATH="/usr/local/bin:/root/.local/bin:${PATH}"

    RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3 \
        python3-pip \
        python3-dev \
        build-essential \
        git \
        curl \
        cmake \
        pkg-config \
        protobuf-compiler \
        ninja-build \
        && rm -rf /var/lib/apt/lists/*

    RUN ln -s /usr/bin/python3 /usr/local/bin/python || true
    RUN ln -s /usr/bin/pip3 /usr/local/bin/pip || true

    WORKDIR /usr/src
    COPY backends backends
    COPY backends/python/server/text_embeddings_server/models/__init__.py backends/python/server/text_embeddings_server/models/__init__.py
    COPY backends/python/server/pyproject.toml backends/python/server/pyproject.toml
    RUN cd backends/python/server && \
        make install

    ARG NEURONX_COLLECTIVES_LIB_VERSION=2.28.27.0-bc30ece58
    ARG NEURONX_RUNTIME_LIB_VERSION=2.28.23.0-dd5879008
    ARG NEURONX_TOOLS_VERSION=2.26.14.0

    ARG NEURONX_CC_VERSION=2.21.33363.0+82129205
    ARG NEURONX_FRAMEWORK_VERSION=2.8.0.2.10.16998+e9bf8a50
    ARG NEURONX_DISTRIBUTED_VERSION=0.15.22404+1f27bddf

    RUN apt-get update \
        && apt-get upgrade -y \
        && apt-get install -y --no-install-recommends \
        apt-transport-https \
        build-essential \
        ca-certificates \
        cmake \
        curl \
        emacs \
        git \
        gnupg2 \
        gpg-agent \
        jq \
        libgl1-mesa-glx \
        libglib2.0-0 \
        libsm6 \
        libxext6 \
        libxrender-dev \
        libcap-dev \
        libhwloc-dev \
        openjdk-11-jdk \
        unzip \
        vim \
        wget \
        zlib1g-dev \
        && rm -rf /var/lib/apt/lists/* \
        && rm -rf /tmp/tmp* \
        && apt-get clean

    # Ubuntu 22.04 = jammy; use signed-by (apt-key is deprecated)
    RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | gpg --dearmor -o /usr/share/keyrings/neuron-archive-keyring.gpg && \
        echo "deb [signed-by=/usr/share/keyrings/neuron-archive-keyring.gpg] https://apt.repos.neuron.amazonaws.com jammy main" > /etc/apt/sources.list.d/neuron.list

    RUN apt-get update \
        && apt-get install -y \
        aws-neuronx-tools=$NEURONX_TOOLS_VERSION \
        aws-neuronx-collectives=$NEURONX_COLLECTIVES_LIB_VERSION \
        aws-neuronx-runtime-lib=$NEURONX_RUNTIME_LIB_VERSION \
        && rm -rf /var/lib/apt/lists/* \
        && rm -rf /tmp/tmp* \
        && apt-get clean

    ENV PATH="/opt/aws/neuron/bin:${PATH}"

    RUN pip install --index-url https://pip.repos.neuron.amazonaws.com \
        --extra-index-url https://pypi.org/simple \
        --trusted-host pip.repos.neuron.amazonaws.com \
        neuronx-cc==$NEURONX_CC_VERSION \
        torch-neuronx==$NEURONX_FRAMEWORK_VERSION \
        torchvision \
        neuronx_distributed==$NEURONX_DISTRIBUTED_VERSION \
        && rm -rf ~/.cache/pip/*

    # HF ARGS
    # Note: optimum-neuron 0.4.4 requires transformers~=4.57.1
    ARG TRANSFORMERS_VERSION=4.57.1
    ARG DIFFUSERS_VERSION=0.35.2
    ARG HUGGINGFACE_HUB_VERSION=0.36.0
    ARG OPTIMUM_NEURON_VERSION=0.4.4
    ARG SENTENCE_TRANSFORMERS=5.1.2
    ARG PEFT_VERSION=0.17.0
    ARG DATASETS_VERSION=4.1.1

    # Install Hugging Face libraries and dependencies for TEI on Neuron
    RUN pip install --no-cache-dir -U \
        networkx==2.8.8 \
        transformers[sentencepiece,audio,vision]==${TRANSFORMERS_VERSION} \
        diffusers==${DIFFUSERS_VERSION} \
        compel \
        controlnet-aux \
        huggingface_hub==${HUGGINGFACE_HUB_VERSION} \
        hf_transfer \
        datasets==${DATASETS_VERSION} \
        optimum-neuron==${OPTIMUM_NEURON_VERSION} \
        sentence_transformers==${SENTENCE_TRANSFORMERS} \
        peft==${PEFT_VERSION} \
        && rm -rf ~/.cache/pip/*

    FROM neuron AS grpc

    COPY --from=grpc-builder /usr/src/target/release/text-embeddings-router /usr/local/bin/text-embeddings-router

    ENTRYPOINT ["text-embeddings-router"]
    CMD ["--json-output"]

    FROM neuron AS http

    COPY --from=http-builder /usr/src/target/release/text-embeddings-router /usr/local/bin/text-embeddings-router

    ENTRYPOINT ["text-embeddings-router"]
    CMD ["--json-output"]
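
The Dockerfile above pins most of its dependencies through `ARG NAME=version` defaults. One way to audit those pins is to extract them with a short script. This is only a sketch: the helper name `pinned_args` and the sample snippet are illustrative and not part of the PR.

```python
import re

# A few ARG lines copied from the Dockerfile above, used as sample input.
DOCKERFILE_SNIPPET = """\
ARG GIT_SHA
ARG NEURONX_TOOLS_VERSION=2.26.14.0
ARG NEURONX_CC_VERSION=2.21.33363.0+82129205
ARG TRANSFORMERS_VERSION=4.57.1
ARG OPTIMUM_NEURON_VERSION=0.4.4
"""

# Matches `ARG NAME=default`; ARGs without a default (e.g. GIT_SHA) are skipped.
ARG_WITH_DEFAULT = re.compile(r"^ARG\s+(\w+)=(\S+)\s*$", re.MULTILINE)


def pinned_args(dockerfile_text: str) -> dict[str, str]:
    """Return a mapping of ARG name to its default value."""
    return dict(ARG_WITH_DEFAULT.findall(dockerfile_text))


pins = pinned_args(DOCKERFILE_SNIPPET)
for name, version in pins.items():
    print(f"{name}={version}")
```

Packages installed without a pin (for example `torchvision`, `compel`, `controlnet-aux`, `hf_transfer`) do not appear in such a listing, which is the kind of gap a lockfile would close.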
14 changes: 14 additions & 0 deletions
backends/python/server/text_embeddings_server/models/habana/__init__.py
    @@ -0,0 +1,14 @@
    import os

    DISABLE_TENSOR_CACHE = os.getenv("DISABLE_TENSOR_CACHE", "false").lower() in ["true", "1"]


    def wrap_model_if_hpu(model_handle, device):
        """Wrap the model in HPU graph if the device is HPU."""
        if device.type == "hpu":
            from habana_frameworks.torch.hpu import wrap_in_hpu_graph

            model_handle.model = wrap_in_hpu_graph(
                model_handle.model, disable_tensor_cache=DISABLE_TENSOR_CACHE
            )
        return model_handle
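
To show how the helper above behaves without Habana hardware, here is a small stand-in sketch. It assumes nothing about `habana_frameworks`: the real wrapper is replaced by an injected `wrap_fn`, and `parse_flag`, `wrap_model_if_hpu_sketch`, the fake handle, and the fake devices are all hypothetical names introduced for illustration.

```python
from types import SimpleNamespace


def parse_flag(raw: str) -> bool:
    """Same truthiness rule as DISABLE_TENSOR_CACHE: only "true" or "1", case-insensitive."""
    return raw.lower() in ["true", "1"]


def wrap_model_if_hpu_sketch(model_handle, device, wrap_fn):
    """Stand-in for wrap_model_if_hpu with the Habana wrapper injected as `wrap_fn`."""
    if device.type == "hpu":
        model_handle.model = wrap_fn(model_handle.model)
    return model_handle


# Hypothetical stand-ins: a fake handle, fake devices, and a wrapper that tags the model.
handle = SimpleNamespace(model="raw-model")
cpu = SimpleNamespace(type="cpu")
hpu = SimpleNamespace(type="hpu")
tag = lambda m: f"wrapped({m})"

on_cpu = wrap_model_if_hpu_sketch(handle, cpu, tag).model  # left untouched
on_hpu = wrap_model_if_hpu_sketch(handle, hpu, tag).model  # wrapped in place
print(parse_flag("TRUE"), parse_flag("1"), parse_flag("yes"))
print(on_cpu, on_hpu)
```

Two details carry over to the real helper: values such as "yes" or "on" do not enable the flag, and the handle is mutated in place and returned, so callers on non-HPU devices get the same object back unchanged.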
Maybe not something to tackle in this PR, but I'd rather rely on a lockfile here instead of those, so it might be worth considering re-opening #587? cc @regisss and @kaixuanliu, as this was mentioned in the past, but apparently it was failing on Intel HPUs (?)
I think it should work on HPU, not sure why it failed at that time. So don't hesitate to go that way, and if you have a lockfile you would like me to test on HPU, happy to do it :)
Thanks @regisss, I'll restart Nico's PR to add uv support instead, and ping you when done for testing 🤗