Commit 2d072a6

add refactor of example model input
1 parent 4eee97d commit 2d072a6

39 files changed, +206 -87 lines changed

11-embeddings-reranker-classification-tensorrt/BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8/README.md

Lines changed: 6 additions & 3 deletions
@@ -102,8 +102,10 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
   example_model_input:
-    input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
-      reach /sync/predict'
+    inputs: Baseten is a fast inference provider
+    raw_scores: true
+    truncate: true
+    truncation_direction: Right
   model_name: BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8-truss-example
 python_version: py39
 requirements: []
@@ -122,9 +124,10 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 131072
-    max_seq_len: 1000001
     num_builder_gpus: 1
     quantization_type: fp8
+  runtime:
+    webserver_default_route: /predict
 
 ```
 

11-embeddings-reranker-classification-tensorrt/BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8/config.yaml

Lines changed: 6 additions & 3 deletions
@@ -3,8 +3,10 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
   example_model_input:
-    input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
-      reach /sync/predict'
+    inputs: Baseten is a fast inference provider
+    raw_scores: true
+    truncate: true
+    truncation_direction: Right
   model_name: BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8-truss-example
 python_version: py39
 requirements: []
@@ -23,6 +25,7 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 131072
-    max_seq_len: 1000001
     num_builder_gpus: 1
     quantization_type: fp8
+  runtime:
+    webserver_default_route: /predict
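
For reference, the new `example_model_input` fields in this diff would map onto a JSON request body roughly as sketched below. This is a minimal illustration only: the helper name `build_predict_payload` is hypothetical, and the idea that the body is sent as JSON to the `/predict` route is an assumption inferred from `webserver_default_route`, not something the commit states.

```python
import json


def build_predict_payload(text: str) -> dict:
    """Mirror the example_model_input keys added in this commit (hypothetical helper)."""
    return {
        "inputs": text,
        "raw_scores": True,
        "truncate": True,
        "truncation_direction": "Right",
    }


# Assumed usage: serialize the payload before sending it to the model endpoint.
payload = build_predict_payload("Baseten is a fast inference provider")
body = json.dumps(payload)
print(body)
```

The sketch deliberately stops at serialization; the endpoint URL and authentication are deployment-specific and are not shown in the diff.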

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-en-icl-embedding-fp8/README.md

Lines changed: 2 additions & 1 deletion
@@ -149,9 +149,10 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 32768
-    max_seq_len: 1000001
     num_builder_gpus: 2
     quantization_type: fp8
+  runtime:
+    webserver_default_route: /v1/embeddings
 
 ```
 

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-en-icl-embedding-fp8/config.yaml

Lines changed: 2 additions & 1 deletion
@@ -24,6 +24,7 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 32768
-    max_seq_len: 1000001
     num_builder_gpus: 2
     quantization_type: fp8
+  runtime:
+    webserver_default_route: /v1/embeddings

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-large-en-v1.5-embedding/README.md

Lines changed: 2 additions & 1 deletion
@@ -148,7 +148,8 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 16384
-    max_seq_len: 1000001
+  runtime:
+    webserver_default_route: /v1/embeddings
 
 ```
 

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-large-en-v1.5-embedding/config.yaml

Lines changed: 2 additions & 1 deletion
@@ -24,4 +24,5 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 16384
-    max_seq_len: 1000001
+  runtime:
+    webserver_default_route: /v1/embeddings

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-m3-embedding-dense/README.md

Lines changed: 2 additions & 1 deletion
@@ -148,7 +148,8 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 16384
-    max_seq_len: 1000001
+  runtime:
+    webserver_default_route: /v1/embeddings
 
 ```
 

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-m3-embedding-dense/config.yaml

Lines changed: 2 additions & 1 deletion
@@ -24,4 +24,5 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 16384
-    max_seq_len: 1000001
+  runtime:
+    webserver_default_route: /v1/embeddings

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-multilingual-gemma2-multilingual-embedding/README.md

Lines changed: 2 additions & 1 deletion
@@ -148,7 +148,8 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 16384
-    max_seq_len: 1000001
+  runtime:
+    webserver_default_route: /v1/embeddings
 
 ```
 

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-multilingual-gemma2-multilingual-embedding/config.yaml

Lines changed: 2 additions & 1 deletion
@@ -24,4 +24,5 @@ trt_llm:
       revision: main
       source: HF
     max_num_tokens: 16384
-    max_seq_len: 1000001
+  runtime:
+    webserver_default_route: /v1/embeddings
