@@ -13,7 +13,7 @@ Feature: Results
1313
1414 Scenario Outline : consistent results with same seed
1515 Given <n_slots> slots
16- And 0 .0 temperature
16+ And 1 .0 temperature
1717 Then the server is starting
1818 Then the server is healthy
1919
@@ -27,7 +27,8 @@ Feature: Results
2727 Examples :
2828 | n_slots |
2929 | 1 |
30- | 2 |
30+ # FIXME: unified KV cache nondeterminism
31+ # | 2 |
3132
3233 Scenario Outline : different results with different seed
3334 Given <n_slots> slots
@@ -73,14 +74,13 @@ Feature: Results
7374 Examples :
7475 | n_parallel | temp |
7576 | 1 | 0 .0 |
76- | 2 | 0 .0 |
77- | 4 | 0 .0 |
7877 | 1 | 1 .0 |
79- # FIXME: These tests fail on master.
80- # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
78+ # FIXME: unified KV cache nondeterminism
8179 # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
8280 # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
8381 # and https://github.com/ggerganov/llama.cpp/pull/7347 .
82+ # | 2 | 0.0 |
83+ # | 4 | 0.0 |
8484 # | 2 | 1.0 |
8585 # | 4 | 1.0 |
8686
@@ -108,12 +108,11 @@ Feature: Results
108108 Examples :
109109 | n_slots | n_kv | n_predict | n_parallel |
110110 | 4 | 1024 | 1 | 1 |
111- | 4 | 1024 | 1 | 4 |
112- # FIXME: These tests fail on master.
113- # Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
111+ # FIXME: unified KV cache nondeterminism
114112 # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
115113 # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
116114 # and https://github.com/ggerganov/llama.cpp/pull/7347 .
115+ # | 4 | 1024 | 1 | 4 |
117116 # | 4 | 1024 | 100 | 1 |
118117 # This test still fails even the above patches; the first token probabilities are already different.
119118 # | 4 | 1024 | 100 | 4 |
0 commit comments