
Commit 396085f: add max context length
Parent: f302ecb

File tree

7 files changed (+91, -35 lines)
  • 11-embeddings-reranker-classification-tensorrt
    • BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8
    • BEI-baseten-example-meta-llama-3-70b-instructforsequenceclassification-fp8
    • BEI-mixedbread-ai-mxbai-rerank-base-v2-reranker-fp8
    • BEI-mixedbread-ai-mxbai-rerank-large-v2-reranker-fp8
    • BEI-papluca-xlm-roberta-base-language-detection-classification
    • BEI-samlowe-roberta-base-go_emotions-classification
    • BEI-skywork-skywork-reward-llama-3.1-8b-v0.2-reward-model-fp8


11-embeddings-reranker-classification-tensorrt/BEI-allenai-llama-3.1-tulu-3-8b-reward-model-fp8/README.md

Lines changed: 13 additions & 5 deletions
@@ -64,7 +64,7 @@ requests.post(
     headers=headers,
     url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
     json={
-        "inputs": "Baseten is a fast inference provider",
+        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
         "raw_scores": True,
         "truncate": True,
         "truncation_direction": "Right"
@@ -74,10 +74,18 @@ requests.post(
 Returns:
 ```json
 [
-  {
-    "label": "excitement",
-    "score": 0.99
-  }
+  [
+    {
+      "label": "excitement",
+      "score": 0.99
+    }
+  ],
+  [
+    {
+      "label": "excitement",
+      "score": 0.01
+    }
+  ]
 ]
 ```
 Important: this is different from the `predict` route that you usually call (https://model-xxxxxx.api.baseten.co/environments/production/predict); it contains an additional `sync` segment before `predict`.
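
For reference, a minimal end-to-end sketch of the batched call introduced above. The model URL is the same placeholder used in the README; the API-key handling (a `BASETEN_API_KEY` environment variable sent as an `Api-Key` authorization header) is an assumption for illustration, not part of this commit.

```python
# Hedged sketch of the batched classification request shown in the diff above.
# Assumptions (not in the commit): the API key comes from the BASETEN_API_KEY
# environment variable and is sent as an "Api-Key" Authorization header.
import os

import requests

api_key = os.environ["BASETEN_API_KEY"]
url = "https://model-xxxxxx.api.baseten.co/environments/production/sync/predict"

resp = requests.post(
    url,
    headers={"Authorization": f"Api-Key {api_key}"},
    json={
        # one inner list per text; each entry is classified independently
        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
        "raw_scores": True,
        "truncate": True,
        "truncation_direction": "Right",
    },
)
resp.raise_for_status()
print(resp.json())  # one list of {"label": ..., "score": ...} objects per input
```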

11-embeddings-reranker-classification-tensorrt/BEI-baseten-example-meta-llama-3-70b-instructforsequenceclassification-fp8/README.md

Lines changed: 13 additions & 5 deletions
@@ -64,7 +64,7 @@ requests.post(
     headers=headers,
     url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
     json={
-        "inputs": "Baseten is a fast inference provider",
+        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
         "raw_scores": True,
         "truncate": True,
         "truncation_direction": "Right"
@@ -74,10 +74,18 @@ requests.post(
 Returns:
 ```json
 [
-  {
-    "label": "excitement",
-    "score": 0.99
-  }
+  [
+    {
+      "label": "excitement",
+      "score": 0.99
+    }
+  ],
+  [
+    {
+      "label": "excitement",
+      "score": 0.01
+    }
+  ]
 ]
 ```
 Important: this is different from the `predict` route that you usually call (https://model-xxxxxx.api.baseten.co/environments/production/predict); it contains an additional `sync` segment before `predict`.

11-embeddings-reranker-classification-tensorrt/BEI-mixedbread-ai-mxbai-rerank-base-v2-reranker-fp8/README.md

Lines changed: 13 additions & 5 deletions
@@ -64,7 +64,7 @@ requests.post(
     headers=headers,
     url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
     json={
-        "inputs": "Baseten is a fast inference provider",
+        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
        "raw_scores": True,
         "truncate": True,
         "truncation_direction": "Right"
@@ -74,10 +74,18 @@ requests.post(
 Returns:
 ```json
 [
-  {
-    "label": "excitement",
-    "score": 0.99
-  }
+  [
+    {
+      "label": "excitement",
+      "score": 0.99
+    }
+  ],
+  [
+    {
+      "label": "excitement",
+      "score": 0.01
+    }
+  ]
 ]
 ```
 Important: this is different from the `predict` route that you usually call (https://model-xxxxxx.api.baseten.co/environments/production/predict); it contains an additional `sync` segment before `predict`.

11-embeddings-reranker-classification-tensorrt/BEI-mixedbread-ai-mxbai-rerank-large-v2-reranker-fp8/README.md

Lines changed: 13 additions & 5 deletions
@@ -64,7 +64,7 @@ requests.post(
     headers=headers,
     url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
     json={
-        "inputs": "Baseten is a fast inference provider",
+        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
         "raw_scores": True,
         "truncate": True,
         "truncation_direction": "Right"
@@ -74,10 +74,18 @@ requests.post(
 Returns:
 ```json
 [
-  {
-    "label": "excitement",
-    "score": 0.99
-  }
+  [
+    {
+      "label": "excitement",
+      "score": 0.99
+    }
+  ],
+  [
+    {
+      "label": "excitement",
+      "score": 0.01
+    }
+  ]
 ]
 ```
 Important: this is different from the `predict` route that you usually call (https://model-xxxxxx.api.baseten.co/environments/production/predict); it contains an additional `sync` segment before `predict`.

11-embeddings-reranker-classification-tensorrt/BEI-papluca-xlm-roberta-base-language-detection-classification/README.md

Lines changed: 13 additions & 5 deletions
@@ -63,7 +63,7 @@ requests.post(
     headers=headers,
     url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
     json={
-        "inputs": "Baseten is a fast inference provider",
+        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
         "raw_scores": True,
         "truncate": True,
         "truncation_direction": "Right"
@@ -73,10 +73,18 @@ requests.post(
 Returns:
 ```json
 [
-  {
-    "label": "excitement",
-    "score": 0.99
-  }
+  [
+    {
+      "label": "excitement",
+      "score": 0.99
+    }
+  ],
+  [
+    {
+      "label": "excitement",
+      "score": 0.01
+    }
+  ]
 ]
 ```
 Important: this is different from the `predict` route that you usually call (https://model-xxxxxx.api.baseten.co/environments/production/predict); it contains an additional `sync` segment before `predict`.

11-embeddings-reranker-classification-tensorrt/BEI-samlowe-roberta-base-go_emotions-classification/README.md

Lines changed: 13 additions & 5 deletions
@@ -63,7 +63,7 @@ requests.post(
     headers=headers,
     url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
     json={
-        "inputs": "Baseten is a fast inference provider",
+        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
         "raw_scores": True,
         "truncate": True,
         "truncation_direction": "Right"
@@ -73,10 +73,18 @@ requests.post(
 Returns:
 ```json
 [
-  {
-    "label": "excitement",
-    "score": 0.99
-  }
+  [
+    {
+      "label": "excitement",
+      "score": 0.99
+    }
+  ],
+  [
+    {
+      "label": "excitement",
+      "score": 0.01
+    }
+  ]
 ]
 ```
 Important: this is different from the `predict` route that you usually call (https://model-xxxxxx.api.baseten.co/environments/production/predict); it contains an additional `sync` segment before `predict`.

11-embeddings-reranker-classification-tensorrt/BEI-skywork-skywork-reward-llama-3.1-8b-v0.2-reward-model-fp8/README.md

Lines changed: 13 additions & 5 deletions
@@ -64,7 +64,7 @@ requests.post(
     headers=headers,
     url="https://model-xxxxxx.api.baseten.co/environments/production/sync/predict",
     json={
-        "inputs": "Baseten is a fast inference provider",
+        "inputs": [["Baseten is a fast inference provider"], ["classify this separately."]],
         "raw_scores": True,
         "truncate": True,
         "truncation_direction": "Right"
@@ -74,10 +74,18 @@ requests.post(
 Returns:
 ```json
 [
-  {
-    "label": "excitement",
-    "score": 0.99
-  }
+  [
+    {
+      "label": "excitement",
+      "score": 0.99
+    }
+  ],
+  [
+    {
+      "label": "excitement",
+      "score": 0.01
+    }
+  ]
 ]
 ```
 Important: this is different from the `predict` route that you usually call (https://model-xxxxxx.api.baseten.co/environments/production/predict); it contains an additional `sync` segment before `predict`.
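
Since the batched request now returns one list of `label`/`score` objects per input (as in the updated `Returns` examples above), here is a small hedged sketch of selecting the top label per input. The `results` literal simply mirrors the example output in the READMEs; it is not produced by this commit.

```python
# Hedged sketch: pick the highest-scoring label for each input in the batched
# response format shown in the updated READMEs. `results` mirrors the example output.
results = [
    [{"label": "excitement", "score": 0.99}],
    [{"label": "excitement", "score": 0.01}],
]

for i, per_input in enumerate(results):
    best = max(per_input, key=lambda item: item["score"])
    print(f"input {i}: {best['label']} ({best['score']:.2f})")
```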
