Commit fc07ffb

add qwen-qwq correction + add mixedbread.ai classification model
1 parent e1745a5 commit fc07ffb

52 files changed: +694 -120 lines changed

11-embeddings-reranker-classification-tensorrt/BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8/README.md

Lines changed: 25 additions & 6 deletions
@@ -46,12 +46,31 @@ POST-Route: `https://model-xxxxxx.api.baseten.co/environments/production/sync/pr
 ```json
 {
 "inputs": "Baseten is a fast inference provider",
-"raw_scores": false,
-"truncate": false,
-"truncation_direction": "right"
+"raw_scores": true,
+"truncate": true,
+"truncation_direction": "Right"
 }
 ```
 
+```python
+import requests
+import os
+
+headers = {
+f"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"
+}
+
+requests.post(
+headers=headers,
+url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
+json={
+"inputs": "Baseten is a fast inference provider",
+"raw_scores": True,
+"truncate": True,
+"truncation_direction": "Right"
+}
+)
+```
 Returns:
 ```json
 [
@@ -61,7 +80,7 @@ Returns:
 }
 ]
 ```
-Important, this is different from the `predict` route: https://model-xxxxxx.api.baseten.co/environments/production/predict
+Important, this is different from the `predict` route that you usually call. (https://model-xxxxxx.api.baseten.co/environments/production/predict), it contains an additional `sync` before that.
 The OpenAPI.json is available under https://model-xxxxxx.api.baseten.co/environments/production/sync/openapi.json for more details.
 
 #### Advanced:
@@ -83,8 +102,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /sync/predict
-endpoint.
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/predict'
 model_name: BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8-truss-example
 python_version: py39
 requirements: []
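
Review note: the README change above stresses that the classification route carries an extra `sync` path segment compared with the usual `predict` route. A minimal sketch of that distinction follows, assuming the same placeholder host (`model-xxxxxx`) and payload fields shown in the diff. The helper names `sync_url` and `build_predict_request` are hypothetical, and the standard library's `urllib.request` is swapped in for `requests` so the request can be built and inspected without sending it.

```python
import json
import urllib.request

# Placeholder base URL copied from the README; replace model-xxxxxx with a real model id.
BASE = "https://model-xxxxxx.api.baseten.co/environments/production"


def sync_url(route: str, base: str = BASE) -> str:
    """Return the /sync/<route> URL (hypothetical helper, not part of any SDK)."""
    return f"{base}/sync/{route.lstrip('/')}"


def build_predict_request(text: str, api_key: str) -> urllib.request.Request:
    """Build, without sending, a POST request that mirrors the README payload."""
    payload = {
        "inputs": text,
        "raw_scores": True,
        "truncate": True,
        "truncation_direction": "Right",
    }
    return urllib.request.Request(
        sync_url("predict"),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response shape is only partly visible in this diff, so parsing it is left out here.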

11-embeddings-reranker-classification-tensorrt/BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8/config.yaml

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /sync/predict
-endpoint.
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/predict'
 model_name: BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8-truss-example
 python_version: py39
 requirements: []

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-reranker-large/README.md

Lines changed: 32 additions & 9 deletions
@@ -44,17 +44,39 @@ truss push --publish
 POST-Route: `https://model-xxxxxx.api.baseten.co/environments/production/sync/rerank`:
 ```json
 {
-"query": "What is Baseten?",
-"raw_scores": false,
-"return_text": false,
-"texts": [
-"Deep Learning is ...", "Baseten is a fast inference provider"
-],
-"truncate": false,
-"truncation_direction": "right"
+"query": "What is Baseten?",
+"raw_scores": true,
+"return_text": false,
+"texts": [
+"Deep Learning is ...", "Baseten is a fast inference provider"
+],
+"truncate": true,
+"truncation_direction": "Right"
 }
 ```
 
+```python
+import requests
+import os
+
+headers = {
+f"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"
+}
+
+requests.post(
+headers=headers,
+url="https://model-xxxxxx.api.baseten.co/environments/production/sync/rerank",
+json={
+"query": "What is Baseten?",
+"raw_scores": True,
+"return_text": False,
+"texts": [
+"Deep Learning is ...", "Baseten is a fast inference provider"
+],
+"truncate": True,
+"truncation_direction": "Right"
+}
+```
 Returns:
 ```json
 [
@@ -86,7 +108,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /rerank
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/rerank'
 model_name: BEI-baai-bge-reranker-large-truss-example
 python_version: py39
 requirements: []

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-reranker-large/config.yaml

Lines changed: 2 additions & 1 deletion
@@ -3,7 +3,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /rerank
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/rerank'
 model_name: BEI-baai-bge-reranker-large-truss-example
 python_version: py39
 requirements: []

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-reranker-v2-m3-multilingual/README.md

Lines changed: 32 additions & 9 deletions
@@ -44,17 +44,39 @@ truss push --publish
 POST-Route: `https://model-xxxxxx.api.baseten.co/environments/production/sync/rerank`:
 ```json
 {
-"query": "What is Baseten?",
-"raw_scores": false,
-"return_text": false,
-"texts": [
-"Deep Learning is ...", "Baseten is a fast inference provider"
-],
-"truncate": false,
-"truncation_direction": "right"
+"query": "What is Baseten?",
+"raw_scores": true,
+"return_text": false,
+"texts": [
+"Deep Learning is ...", "Baseten is a fast inference provider"
+],
+"truncate": true,
+"truncation_direction": "Right"
 }
 ```
 
+```python
+import requests
+import os
+
+headers = {
+f"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"
+}
+
+requests.post(
+headers=headers,
+url="https://model-xxxxxx.api.baseten.co/environments/production/sync/rerank",
+json={
+"query": "What is Baseten?",
+"raw_scores": True,
+"return_text": False,
+"texts": [
+"Deep Learning is ...", "Baseten is a fast inference provider"
+],
+"truncate": True,
+"truncation_direction": "Right"
+}
+```
 Returns:
 ```json
 [
@@ -86,7 +108,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /rerank
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/rerank'
 model_name: BEI-baai-bge-reranker-v2-m3-multilingual-truss-example
 python_version: py39
 requirements: []
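
Review note: in the reranker READMEs above, the added Python snippet opens `requests.post(` but the hunk shows no closing parenthesis before the code fence ends, so as displayed it would not parse. A complete variant might look like the sketch below. It assumes the same placeholder URL and payload fields as the diff; `build_rerank_request` is a hypothetical helper name, and `urllib.request` from the standard library stands in for `requests` so the request can be constructed and checked without a network call.

```python
import json
import urllib.request

# Placeholder URL copied from the README; replace model-xxxxxx with a real model id.
RERANK_URL = "https://model-xxxxxx.api.baseten.co/environments/production/sync/rerank"


def build_rerank_request(query: str, texts: list, api_key: str) -> urllib.request.Request:
    """Build, without sending, a POST request mirroring the README rerank payload."""
    payload = {
        "query": query,
        "raw_scores": True,
        "return_text": False,
        "texts": texts,
        "truncate": True,
        "truncation_direction": "Right",
    }
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In the `requests`-based README snippet, the equivalent fix would presumably be a single `)` after the closing brace of the `json=` argument.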

11-embeddings-reranker-classification-tensorrt/BEI-baai-bge-reranker-v2-m3-multilingual/config.yaml

Lines changed: 2 additions & 1 deletion
@@ -3,7 +3,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /rerank
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/rerank'
 model_name: BEI-baai-bge-reranker-v2-m3-multilingual-truss-example
 python_version: py39
 requirements: []

11-embeddings-reranker-classification-tensorrt/BEI-baseten-example-meta-llama-3-70b-instructforsequenceclassification-fp8/README.md

Lines changed: 25 additions & 6 deletions
@@ -46,12 +46,31 @@ POST-Route: `https://model-xxxxxx.api.baseten.co/environments/production/sync/pr
 ```json
 {
 "inputs": "Baseten is a fast inference provider",
-"raw_scores": false,
-"truncate": false,
-"truncation_direction": "right"
+"raw_scores": true,
+"truncate": true,
+"truncation_direction": "Right"
 }
 ```
 
+```python
+import requests
+import os
+
+headers = {
+f"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"
+}
+
+requests.post(
+headers=headers,
+url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
+json={
+"inputs": "Baseten is a fast inference provider",
+"raw_scores": True,
+"truncate": True,
+"truncation_direction": "Right"
+}
+)
+```
 Returns:
 ```json
 [
@@ -61,7 +80,7 @@ Returns:
 }
 ]
 ```
-Important, this is different from the `predict` route: https://model-xxxxxx.api.baseten.co/environments/production/predict
+Important, this is different from the `predict` route that you usually call. (https://model-xxxxxx.api.baseten.co/environments/production/predict), it contains an additional `sync` before that.
 The OpenAPI.json is available under https://model-xxxxxx.api.baseten.co/environments/production/sync/openapi.json for more details.
 
 #### Advanced:
@@ -83,8 +102,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /sync/predict
-endpoint.
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/predict'
 model_name: BEI-baseten-example-meta-llama-3-70b-instructforsequenceclassification-fp8-truss-example
 python_version: py39
 requirements: []

11-embeddings-reranker-classification-tensorrt/BEI-baseten-example-meta-llama-3-70b-instructforsequenceclassification-fp8/config.yaml

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@ environment_variables: {}
 external_package_dirs: []
 model_metadata:
 example_model_input:
-input: This redirects to the embedding endpoint. Use the /sync API to reach /sync/predict
-endpoint.
+input: 'ERROR: This redirects to the embedding endpoint. Use the /sync API to
+reach /sync/predict'
 model_name: BEI-baseten-example-meta-llama-3-70b-instructforsequenceclassification-fp8-truss-example
 python_version: py39
 requirements: []
