You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* server: tests: add models endpoint scenario
* server: /v1/models add some metadata
* server: tests: add debug field in context before scenario
* server: tests: download model from HF, add batch size
* server: tests: add passkey test
* server: tests: add group attention params
* server: do not truncate prompt tokens if self-extend through group attention is enabled
* server: logs: do not truncate log values
* server: tests - passkey - first good working value of nga
* server: tests: fix server timeout
* server: tests: fix passkey, add doc, fix regex content matching, fix timeout
* server: tests: fix regex content matching
* server: tests: schedule slow tests on master
* server: metrics: fix when no prompt processed
* server: tests: self-extend add llama-2-7B and Mixtral-8x7B-v0.1
* server: tests: increase timeout for completion
* server: tests: keep only the PHI-2 test
* server: tests: passkey add a negative test
Python based server tests scenario using [BDD](https://en.wikipedia.org/wiki/Behavior-driven_development) and [behave](https://behave.readthedocs.io/en/latest/):
*[parallel.feature](./features/parallel.feature) Scenario involving multi slots and concurrent requests
8
+
*[security.feature](./features/security.feature) Security, CORS and API Key
9
+
*[server.feature](./features/server.feature) Server base scenario: completion, embedding, tokenization, etc...
8
10
9
11
Tests target GitHub workflows job runners with 4 vCPU.
10
12
11
-
Requests are using [aiohttp](https://docs.aiohttp.org/en/stable/client_reference.html), [asyncio](https://docs.python.org/fr/3/library/asyncio.html) based http client.
13
+
Requests are
14
+
using [aiohttp](https://docs.aiohttp.org/en/stable/client_reference.html), [asyncio](https://docs.python.org/fr/3/library/asyncio.html)
15
+
based http client.
12
16
13
-
Note: If the host architecture inference speed is faster than GitHub runners one, parallel scenario may randomly fail. To mitigate it, you can increase values in `n_predict`, `kv_size`.
17
+
Note: If the host architecture inference speed is faster than GitHub runners one, parallel scenario may randomly fail.
18
+
To mitigate it, you can increase values in `n_predict`, `kv_size`.
0 commit comments