@@ -7,44 +7,16 @@ Feature: Results
     And a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
     And a model file test-model-00001-of-00003.gguf
     And 128 as batch size
-    And 256 KV cache size
+    And 1024 KV cache size
     And 128 max tokens to predict
+    And continuous batching
 
-  Scenario Outline: Multi users completion
+  Scenario Outline: consistent results with same seed
     Given <n_slots> slots
-    And continuous batching
     Then the server is starting
     Then the server is healthy
 
-    Given 42 as seed
-      And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-      And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-      And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-      And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-      And a prompt:
-      """
-      Write a very long story about AI.
-      """
+    Given 4 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 42
 
     Given concurrent completion requests
     Then the server is busy
@@ -55,3 +27,55 @@ Feature: Results
       | n_slots |
       | 1       |
       | 2       |
+
+  Scenario Outline: different results with different seed
+    Given <n_slots> slots
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 42
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 43
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 44
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 45
+
+    Given concurrent completion requests
+    Then the server is busy
+    Then the server is idle
+    And all slots are idle
+    Then all predictions are different
+    Examples:
+      | n_slots |
+      | 1       |
+      | 2       |
+
+  Scenario Outline: consistent results with same seed and varying batch size
+    Given 4 slots
+    And <temp> temperature
+    # And 0 as draft
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Given <n_parallel> prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Then all predictions are equal
+    Examples:
+      | n_parallel | temp |
+      | 1          | 0.0  |
+      | 2          | 0.0  |
+      | 4          | 0.0  |
+      | 1          | 1.0  |
+      # FIXME: These tests fail on master. The problem seems to be the unified KV cache.
+      # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
+      # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
+      # | 2          | 1.0  |
+      # | 4          | 1.0  |