Commit d4041c5

Use shared thread pool for tokenization (#396)
Move tokenization onto a shared thread pool so that CPU-intensive tokenization does not run on the async event loop. Determine the thread pool size based on the number of CPU cores and shard processes. Also validate stop sequence lengths by number of bytes rather than number of tokens (the latter doesn't make sense since we don't do token-based matching), and add a couple of integration tests.
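As a rough illustration of the approach described in the commit message, the following std-only Rust sketch sizes a fixed pool from core and shard counts and hands work to it through a channel, so the caller is not blocked while the work runs. The sizing rule in `pool_size`, the `SharedPool` type, and `detected_cores` are hypothetical names and choices made for this example; the commit itself does not state its formula or which pool implementation it uses.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Assumed sizing rule (not taken from the commit): reserve one core per
/// shard process and always keep at least one worker.
pub fn pool_size(cpu_cores: usize, shard_processes: usize) -> usize {
    cpu_cores.saturating_sub(shard_processes).max(1)
}

/// Number of cores reported by the standard library, falling back to 1.
pub fn detected_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Minimal fixed-size pool shared by all callers (illustrative only).
pub struct SharedPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl SharedPool {
    pub fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size.max(1))
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held while waiting for a job and released
                    // before the job runs.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: pool is shutting down
                    }
                })
            })
            .collect();
        SharedPool { sender: Some(sender), workers }
    }

    /// Queue `work` on a pool thread; the result arrives on the returned receiver.
    pub fn submit<T, F>(&self, work: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(work());
        });
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(job)
            .expect("workers are alive");
        rx
    }
}

impl Drop for SharedPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // closing the channel stops the workers
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

A caller might build the pool once with `SharedPool::new(pool_size(detected_cores(), shards))` and submit each tokenization request to it. In an async server the result would typically be awaited through the runtime's own blocking-task or channel facilities rather than a blocking `recv()`.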
1 parent 7c4745e commit d4041c5

12 files changed: +351 -432 lines changed

Cargo.lock

Lines changed: 37 additions & 127 deletions
Some generated files are not rendered by default.

integration_tests/test_cases_bloom560m.yaml

Lines changed: 1 addition & 1 deletion
@@ -1353,7 +1353,7 @@
 - {"text": "A very long story:\n"}
 error:
 code: INVALID_ARGUMENT
-message: can specify at most 6 non-empty stop sequences, each not more than 40 tokens
+message: can specify at most 6 non-empty stop sequences, each not more than 240 UTF8 bytes
 
 # Error case 2
 - name: Input length + token min too long
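The changed error message corresponds to a byte-based check along the following lines. This is a minimal sketch, not the project's validation code: the function name `validate_stop_sequences` and the two constants are hypothetical, and only the limits (6 sequences, 240 bytes) and the message text are taken from the test case above. It assumes that an empty stop sequence is rejected, as the words "non-empty" in the message suggest.

```rust
// Limits as stated in the expected error message; the names are illustrative.
const MAX_STOP_SEQUENCES: usize = 6;
const MAX_STOP_SEQUENCE_BYTES: usize = 240;

/// Accept at most 6 non-empty stop sequences of at most 240 UTF-8 bytes each.
pub fn validate_stop_sequences(stop_sequences: &[String]) -> Result<(), String> {
    let valid = stop_sequences.len() <= MAX_STOP_SEQUENCES
        && stop_sequences
            .iter()
            // `len()` on a Rust string is its UTF-8 length in bytes, not in
            // characters or tokens, so no tokenizer is needed for this check.
            .all(|s| !s.is_empty() && s.len() <= MAX_STOP_SEQUENCE_BYTES);
    if valid {
        Ok(())
    } else {
        Err(format!(
            "can specify at most {} non-empty stop sequences, each not more than {} UTF8 bytes",
            MAX_STOP_SEQUENCES, MAX_STOP_SEQUENCE_BYTES
        ))
    }
}
```

Because the limit counts bytes, a sequence of 120 two-byte characters such as `é` is exactly at the limit, while 121 of them is rejected.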

integration_tests/test_cases_mt0small.yaml

Lines changed: 42 additions & 3 deletions
@@ -9,6 +9,47 @@
 response: {}
 
 
+# Tokenize count only
+- name: Tokenize count only
+request_type: tokenize
+request:
+return_tokens: false
+requests:
+- {"text": "The very long story is written by a very long story"}
+response:
+responses:
+- tokenCount: 16
+
+
+# Tokenize with tokens
+- name: Tokenize with tokens
+request_type: tokenize
+request:
+return_tokens: true
+requests:
+- {"text": "The very long story is written by a very long story"}
+response:
+responses:
+- tokenCount: 16
+tokens:
+- "\u2581The"
+- "\u2581"
+- very
+- "\u2581long"
+- "\u2581story"
+- "\u2581is"
+- "\u2581"
+- written
+- "\u2581by"
+- "\u2581"
+- a
+- "\u2581"
+- very
+- "\u2581long"
+- "\u2581story"
+- </s>
+
+
 # Basic Greedy (implicit)
 - name: Basic Greedy, max new tokens (implicit)
 request:
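To make the expected output of the "Tokenize with tokens" case easier to read, the sketch below checks two things about it: the `tokenCount` of 16 equals the number of listed pieces (15 text pieces plus the `</s>` end-of-sequence marker), and joining the pieces recovers the input text once the `\u2581` word-boundary marker is turned back into a space. The `detokenize` helper and the `EXPECTED_TOKENS` constant are hypothetical and written only for this illustration; they are not part of the test harness.

```rust
/// The token list expected by the "Tokenize with tokens" test case.
pub const EXPECTED_TOKENS: [&str; 16] = [
    "\u{2581}The", "\u{2581}", "very", "\u{2581}long", "\u{2581}story",
    "\u{2581}is", "\u{2581}", "written", "\u{2581}by", "\u{2581}", "a",
    "\u{2581}", "very", "\u{2581}long", "\u{2581}story", "</s>",
];

/// Join the pieces back into plain text: drop the end-of-sequence marker,
/// concatenate, and map the U+2581 word-boundary marker to a space.
pub fn detokenize(pieces: &[&str]) -> String {
    let joined: String = pieces
        .iter()
        .filter(|p| **p != "</s>")
        .copied()
        .collect();
    joined.replace('\u{2581}', " ").trim_start().to_string()
}
```

Calling `detokenize(&EXPECTED_TOKENS)` yields the request text, which is a quick way to sanity-check a hand-written token list in a test case.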
@@ -55,8 +96,6 @@
 text: Wonderful day.
 
 
-
-
 # Prompt prefix - encoder only
 - name: Greedy with tuned prompt prefix for encoder only
 request:
@@ -1182,7 +1221,7 @@
 - {"text": "A very long story:\n"}
 error:
 code: INVALID_ARGUMENT
-message: can specify at most 6 non-empty stop sequences, each not more than 40 tokens
+message: can specify at most 6 non-empty stop sequences, each not more than 240 UTF8 bytes
 
 
 # Test input tokens boundary
