Commit 67ac777

update
1 parent 28d0018 commit 67ac777

4 files changed: 28 additions & 38 deletions

README.md

Lines changed: 22 additions & 31 deletions
@@ -1,21 +1,18 @@
-
 <div style="display:flex; text-align:center; justify-content:center;">
 <img src="https://huggingface.co/front/assets/huggingface_logo.svg" width="100"/>
 <h1 style="margin-top:auto;"> Hugging Face Inference Toolkit </h1>
 </div>
 
-Hugging Face Inference Toolkit is for serving 🤗 Transformers models in containers. This library provides default pre-processing, predict, and post-processing steps for Transformers and Sentence Transformers models. It is also possible to define a custom `handler.py` for customization. The Toolkit is built to work with the [Hugging Face Hub](https://huggingface.co/models).
-
----
+Hugging Face Inference Toolkit is for serving 🤗 Transformers models in containers. This library provides default pre-processing, predict, and post-processing steps for Transformers and Sentence Transformers models. It is also possible to define a custom `handler.py` for customization. The Toolkit is built to work with the [Hugging Face Hub](https://huggingface.co/models) and is used as the "default" option in [Inference Endpoints](https://ui.endpoints.huggingface.co/).
 
-## 💻 Getting Started with Hugging Face Inference Toolkit
+## 💻 Getting Started with Hugging Face Inference Toolkit
 
-* Clone the repository `git clone <https://github.com/huggingface/huggingface-inference-toolkit``>
-* Install the dependencies in dev mode `pip install -e ".[torch,st,diffusers,test,quality]"`
-* If you develop on AWS Inferentia2, install with `pip install -e ".[test,quality]" optimum-neuron[neuronx] --upgrade`
-* If you develop on Google Cloud, install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`
-* Unit Testing: `make unit-test`
-* Integration testing: `make integ-test`
+- Clone the repository `git clone https://github.com/huggingface/huggingface-inference-toolkit`
+- Install the dependencies in dev mode `pip install -e ".[torch,st,diffusers,test,quality]"`
+- If you develop on AWS Inferentia2, install with `pip install -e ".[test,quality]" optimum-neuron[neuronx] --upgrade`
+- If you develop on Google Cloud, install with `pip install -e ".[torch,st,diffusers,google,test,quality]"`
+- Unit Testing: `make unit-test`
+- Integration testing: `make integ-test`
 
 ### Local run
 
@@ -68,18 +65,18 @@ curl --request POST \
 
 The Hugging Face Inference Toolkit allows users to provide custom inference logic through a `handler.py` file located in the model repository.
 
-For an example check [philschmid/custom-pipeline-text-classification](https://huggingface.co/philschmid/custom-pipeline-text-classification):
+For an example check [philschmid/custom-pipeline-text-classification](https://huggingface.co/philschmid/custom-pipeline-text-classification):
 
 ```bash
 model.tar.gz/
 |- pytorch_model.bin
 |- ....
 |- handler.py
-|- requirements.txt
+|- requirements.txt
 ```
 
 In this example, `pytorch_model.bin` is the model file saved from training, `handler.py` is the custom inference handler, and `requirements.txt` is a requirements file to add additional dependencies.
-The custom module can override the following methods:
+The custom module can override the following methods:
 
 ### Vertex AI Support
 
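For illustration, a minimal `handler.py` could look like the sketch below, assuming the `EndpointHandler` convention used by Inference Endpoints custom handlers; the `text-classification` pipeline and the payload shape are assumptions for this example:

```python
# handler.py -- a minimal custom handler sketch, assuming the EndpointHandler
# convention used by Inference Endpoints custom handlers.
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the directory containing the model weights, i.e. the
        # unpacked model.tar.gz shown above.
        self.pipeline = pipeline("text-classification", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # `data` is the deserialized request body, e.g. {"inputs": "I like you."}.
        inputs = data.pop("inputs", data)
        return self.pipeline(inputs)
```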
@@ -136,9 +133,9 @@ curl --request POST \
 
 The Hugging Face Inference Toolkit supports deploying Hugging Face models on AWS Inferentia2. To deploy a model on Inferentia2 you have three options:
 
-* Provide `HF_MODEL_ID`, the model repo id on huggingface.co which contains the compiled model in `.neuron` format, e.g. `optimum/bge-base-en-v1.5-neuronx`
-* Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
-* Include a `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `neuron: {"static_batch_size": 1, "static_sequence_length": 128}`
+- Provide `HF_MODEL_ID`, the model repo id on huggingface.co which contains the compiled model in `.neuron` format, e.g. `optimum/bge-base-en-v1.5-neuronx`
+- Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
+- Include a `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `neuron: {"static_batch_size": 1, "static_sequence_length": 128}`
 
 The currently supported tasks can be found [here](https://huggingface.co/docs/optimum-neuron/en/package_reference/supported_models). If you plan to deploy an LLM, we recommend taking a look at [Neuronx TGI](https://huggingface.co/blog/text-generation-inference-on-inferentia2), which is purpose-built for LLMs.
 
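To make the third option concrete, here is a small sketch that writes the `neuron` entry into a model's `config.json`; the file path and shape values are illustrative, while the keys come from the example above:

```python
# Sketch: add the static-shape `neuron` entry to a model's config.json
# (option three above). The path and shape values are illustrative.
import json

config_path = "model/config.json"  # hypothetical local model archive

with open(config_path) as f:
    config = json.load(f)

config["neuron"] = {"static_batch_size": 1, "static_sequence_length": 128}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```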
@@ -148,14 +145,14 @@ Start Hugging Face Inference Toolkit with the following environment variables.
 
 _Note: You need to run this on an Inferentia2 instance._
 
-* transformers `text-classification` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
+- transformers `text-classification` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
 
 ```bash
 mkdir tmp2/
 HF_MODEL_ID="distilbert/distilbert-base-uncased-finetuned-sst-2-english" HF_TASK="text-classification" HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128 HF_MODEL_DIR=tmp2 uvicorn src.huggingface_inference_toolkit.webservice_starlette:app --port 5000
 ```
 
-* sentence transformers `feature-extraction` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
+- sentence transformers `feature-extraction` with `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH`
 
 ```bash
 HF_MODEL_ID="sentence-transformers/all-MiniLM-L6-v2" HF_TASK="feature-extraction" HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128 HF_MODEL_DIR=tmp2 uvicorn src.huggingface_inference_toolkit.webservice_starlette:app --port 5000
 ```
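Either server can then be queried from a client. A quick sketch, assuming the root route on port 5000 accepts the `{"inputs": ...}` payload used by the curl examples earlier in the README:

```python
# Sketch: query the locally started toolkit server. Assumes port 5000 as in
# the commands above and the {"inputs": ...} payload format used by the
# curl examples earlier in the README.
import requests

response = requests.post(
    "http://localhost:5000",
    json={"inputs": "This toolkit is easy to use."},
)
# e.g. an embedding (nested list of floats) for feature-extraction
print(response.json())
```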
@@ -284,19 +281,13 @@ HF_OPTIMUM_SEQUENCE_LENGTH="128"
 
 ## ⚙ Supported Front-Ends
 
-* [x] Starlette (HF Endpoints)
-* [x] Starlette (Vertex AI)
-* [ ] Starlette (Azure ML)
-* [ ] Starlette (SageMaker)
-
----
-
-## 🤝 Contributing
-
----
+- [x] Starlette (HF Endpoints)
+- [x] Starlette (Vertex AI)
+- [ ] Starlette (Azure ML)
+- [ ] Starlette (SageMaker)
 
-## 📜 License
+## 📜 License
 
-TBD.
+This project is licensed under the Apache-2.0 License.
 
 ---

pyproject.toml

Lines changed: 5 additions & 5 deletions
@@ -4,15 +4,15 @@ no_implicit_optional = true
 scripts_are_modules = true
 
 [tool.ruff]
-lint.select = [
+select = [
     "E", # pycodestyle errors
    "W", # pycodestyle warnings
    "F", # pyflakes
    "I", # isort
    "C", # flake8-comprehensions
    "B", # flake8-bugbear
 ]
-lint.ignore = [
+ignore = [
    "E501", # Line length (handled by ruff-format)
    "B008", # do not perform function calls in argument defaults
    "C901", # too complex

@@ -21,13 +21,13 @@ lint.ignore = [
 line-length = 119
 
 # Allow unused variables when underscore-prefixed.
-lint.dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
+dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
 
 # Assume Python 3.11.
 target-version = "py311"
 
-lint.per-file-ignores = {"__init__.py" = ["F401"]}
+per-file-ignores = { "__init__.py" = ["F401"] }
 
 [tool.isort]
 profile = "black"
-known_third_party = ["transformers", "starlette", "huggingface_hub"]
+known_third_party = ["transformers", "starlette", "huggingface_hub"]

setup.cfg

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@ known_third_party =
     datasets
     tensorflow
     torch
-    robyn
 
 line_length = 119
 lines_after_imports = 2

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 # We don't declare our dependency on transformers here because we build with
 # different packages for different variants
 
-VERSION = "0.4.3"
+VERSION = "0.5.0"
 
 # Ubuntu packages
 # libsndfile1-dev: torchaudio requires the development version of the libsndfile package which can be installed via a system package manager. On Ubuntu it can be installed as follows: apt install libsndfile1-dev
