server : replace behave with pytest #10416
Merged

Changes from 9 commits (17 commits total):
- 3acaf58 server : replace behave with pytest (ngxson)
- 49cdfd3 fix test on windows (ngxson)
- d7de413 misc (ngxson)
- 3249aab add more tests (ngxson)
- f09a9b6 more tests (ngxson)
- e34c9d7 styling (ngxson)
- eb02373 log less, fix embd test (ngxson)
- 472e128 added all sequential tests (ngxson)
- 6af3f95 fix coding style (ngxson)
- 1c2f0f7 fix save slot test (ngxson)
- 78e3cb3 add parallel completion test (ngxson)
- c432a82 fix parallel test (ngxson)
- 58cbcd2 remove feature files (ngxson)
- 3a504ae Merge branch 'master' into xsn/server_pytest (ngxson)
- 71fc0f1 update test docs (ngxson)
- 217c9e4 no cache_prompt for some tests (ngxson)
- 52c2625 add test_cache_vs_nocache_prompt (ngxson)
@@ -34,7 +34,7 @@ let
     # server tests
     openai
-    behave
+    pytest
     prometheus-client
   ];
 in
@@ -1 +1,2 @@
 .venv
+tmp
@@ -0,0 +1,15 @@
+import pytest
+from utils import *
+
+
+# ref: https://stackoverflow.com/questions/22627659/run-code-before-and-after-each-test-in-py-test
+@pytest.fixture(autouse=True)
+def stop_server_after_each_test():
+    # do nothing before each test
+    yield
+    # stop all servers after each test
+    instances = set(
+        server_instances
+    )  # copy the set to prevent 'Set changed size during iteration'
+    for server in instances:
+        server.stop()
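The teardown fixture above copies the registry before iterating because stopping a server removes it from the shared set. A minimal self-contained sketch of the same pattern, with a hypothetical `Server` class standing in for the real harness in the tests' utils module, shows why the copy is needed:

```python
server_instances = set()  # shared registry, as in the conftest above


class Server:  # hypothetical stand-in for the real server harness
    def __init__(self):
        server_instances.add(self)

    def stop(self):
        # stopping removes the server from the registry; iterating
        # server_instances directly while calling stop() would raise
        # RuntimeError: Set changed size during iteration
        server_instances.discard(self)


a, b = Server(), Server()
for server in set(server_instances):  # iterate a copy, as the fixture does
    server.stop()
assert not server_instances  # all servers stopped and deregistered
```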
@@ -1,6 +1,8 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
+
+# type: ignore
 
 import asyncio
 import json
 import os
@@ -1,5 +1,5 @@
 aiohttp~=3.9.3
-behave~=1.2.6
+pytest~=8.3.3
 huggingface_hub~=0.23.2
 numpy~=1.26.4
 openai~=1.30.3
@@ -0,0 +1,34 @@
+import pytest
+from utils import *
+
+server = ServerPreset.tinyllama2()
+
+
+@pytest.fixture(scope="module", autouse=True)
+def create_server():
+    global server
+    server = ServerPreset.tinyllama2()
+
+
+def test_server_start_simple():
+    global server
+    server.start()
+    res = server.make_request("GET", "/health")
+    assert res.status_code == 200
+
+
+def test_server_props():
+    global server
+    server.start()
+    res = server.make_request("GET", "/props")
+    assert res.status_code == 200
+    assert res.body["total_slots"] == server.n_slots
+
+
+def test_server_models():
+    global server
+    server.start()
+    res = server.make_request("GET", "/models")
+    assert res.status_code == 200
+    assert len(res.body["data"]) == 1
+    assert res.body["data"][0]["id"] == server.model_alias
@@ -0,0 +1,129 @@
+import pytest
+from openai import OpenAI
+from utils import *
+
+server = ServerPreset.tinyllama2()
+
+
+@pytest.fixture(scope="module", autouse=True)
+def create_server():
+    global server
+    server = ServerPreset.tinyllama2()
+
+
+@pytest.mark.parametrize(
+    "model,system_prompt,user_prompt,max_tokens,re_content,n_prompt,n_predicted,truncated",
+    [
+        ("llama-2", "Book", "What is the best book", 8, "(Suddenly)+", 77, 8, False),
+        ("codellama70b", "You are a coding assistant.", "Write the fibonacci function in c++.", 128, "(Aside|she|felter|alonger)+", 104, 64, False),
+    ]
+)
+def test_chat_completion(model, system_prompt, user_prompt, max_tokens, re_content, n_prompt, n_predicted, truncated):
+    global server
+    server.start()
+    res = server.make_request("POST", "/chat/completions", data={
+        "model": model,
+        "max_tokens": max_tokens,
+        "messages": [
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_prompt},
+        ],
+    })
+    assert res.status_code == 200
+    assert res.body["usage"]["prompt_tokens"] == n_prompt
+    assert res.body["usage"]["completion_tokens"] == n_predicted
+    choice = res.body["choices"][0]
+    assert "assistant" == choice["message"]["role"]
+    assert match_regex(re_content, choice["message"]["content"])
+    if truncated:
+        assert choice["finish_reason"] == "length"
+    else:
+        assert choice["finish_reason"] == "stop"
+
+
+@pytest.mark.parametrize(
+    "model,system_prompt,user_prompt,max_tokens,re_content,n_prompt,n_predicted,truncated",
+    [
+        ("llama-2", "Book", "What is the best book", 8, "(Suddenly)+", 77, 8, False),
+        ("codellama70b", "You are a coding assistant.", "Write the fibonacci function in c++.", 128, "(Aside|she|felter|alonger)+", 104, 64, False),
+    ]
+)
+def test_chat_completion_stream(model, system_prompt, user_prompt, max_tokens, re_content, n_prompt, n_predicted, truncated):
+    global server
+    server.start()
+    res = server.make_stream_request("POST", "/chat/completions", data={
+        "model": model,
+        "max_tokens": max_tokens,
+        "messages": [
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_prompt},
+        ],
+        "stream": True,
+    })
+    content = ""
+    for data in res:
+        choice = data["choices"][0]
+        if choice["finish_reason"] in ["stop", "length"]:
+            assert data["usage"]["prompt_tokens"] == n_prompt
+            assert data["usage"]["completion_tokens"] == n_predicted
+            assert "content" not in choice["delta"]
+            assert match_regex(re_content, content)
+            # FIXME: not sure why this is incorrect in stream mode
+            # if truncated:
+            #     assert choice["finish_reason"] == "length"
+            # else:
+            #     assert choice["finish_reason"] == "stop"
+        else:
+            assert choice["finish_reason"] is None
+            content += choice["delta"]["content"]
+
+
+def test_chat_completion_with_openai_library():
+    global server
+    server.start()
+    client = OpenAI(api_key="dummy", base_url=f"http://{server.server_host}:{server.server_port}")
+    res = client.chat.completions.create(
+        model="gpt-3.5-turbo-instruct",
+        messages=[
+            {"role": "system", "content": "Book"},
+            {"role": "user", "content": "What is the best book"},
+        ],
+        max_tokens=8,
+        seed=42,
+        temperature=0.8,
+    )
+    print(res)
+    assert res.choices[0].finish_reason == "stop"
+    assert res.choices[0].message.content is not None
+    assert match_regex("(Suddenly)+", res.choices[0].message.content)
+
+
+@pytest.mark.parametrize("response_format,n_predicted,re_content", [
+    ({"type": "json_object", "schema": {"const": "42"}}, 6, "\"42\""),
+    ({"type": "json_object", "schema": {"items": [{"type": "integer"}]}}, 10, "[ -3000 ]"),
+    ({"type": "json_object"}, 10, "(\\{|John)+"),
+    ({"type": "sound"}, 0, None),
+    # invalid response format (expected to fail)
+    ({"type": "json_object", "schema": 123}, 0, None),
+    ({"type": "json_object", "schema": {"type": 123}}, 0, None),
+    ({"type": "json_object", "schema": {"type": "hiccup"}}, 0, None),
+])
+def test_completion_with_response_format(response_format: dict, n_predicted: int, re_content: str | None):
+    global server
+    server.start()
+    res = server.make_request("POST", "/chat/completions", data={
+        "max_tokens": n_predicted,
+        "messages": [
+            {"role": "system", "content": "You are a coding assistant."},
+            {"role": "user", "content": "Write an example"},
+        ],
+        "response_format": response_format,
+    })
+    if re_content is not None:
+        assert res.status_code == 200
+        choice = res.body["choices"][0]
+        assert match_regex(re_content, choice["message"]["content"])
+    else:
+        assert res.status_code != 200
+        assert "error" in res.body
@@ -0,0 +1,124 @@
+import pytest
+from openai import OpenAI
+from utils import *
+
+server = ServerPreset.tinyllama2()
+
+
+@pytest.fixture(scope="module", autouse=True)
+def create_server():
+    global server
+    server = ServerPreset.tinyllama2()
+
+
+@pytest.mark.parametrize("prompt,n_predict,re_content,n_prompt,n_predicted,truncated", [
+    ("I believe the meaning of life is", 8, "(going|bed)+", 18, 8, False),
+    ("Write a joke about AI from a very long prompt which will not be truncated", 256, "(princesses|everyone|kids|Anna|forest)+", 46, 64, False),
+])
+def test_completion(prompt: str, n_predict: int, re_content: str, n_prompt: int, n_predicted: int, truncated: bool):
+    global server
+    server.start()
+    res = server.make_request("POST", "/completion", data={
+        "n_predict": n_predict,
+        "prompt": prompt,
+    })
+    assert res.status_code == 200
+    assert res.body["timings"]["prompt_n"] == n_prompt
+    assert res.body["timings"]["predicted_n"] == n_predicted
+    assert res.body["truncated"] == truncated
+    assert match_regex(re_content, res.body["content"])
+
+
+@pytest.mark.parametrize("prompt,n_predict,re_content,n_prompt,n_predicted,truncated", [
+    ("I believe the meaning of life is", 8, "(going|bed)+", 18, 8, False),
+    ("Write a joke about AI from a very long prompt which will not be truncated", 256, "(princesses|everyone|kids|Anna|forest)+", 46, 64, False),
+])
+def test_completion_stream(prompt: str, n_predict: int, re_content: str, n_prompt: int, n_predicted: int, truncated: bool):
+    global server
+    server.start()
+    res = server.make_stream_request("POST", "/completion", data={
+        "n_predict": n_predict,
+        "prompt": prompt,
+        "stream": True,
+    })
+    content = ""
+    for data in res:
+        if data["stop"]:
+            assert data["timings"]["prompt_n"] == n_prompt
+            assert data["timings"]["predicted_n"] == n_predicted
+            assert data["truncated"] == truncated
+            assert match_regex(re_content, content)
+        else:
+            content += data["content"]
+
+
+# FIXME: This test is not working because /completions endpoint is not OAI-compatible
+@pytest.mark.skip(reason="Only /chat/completions is OAI-compatible for now")
+def test_completion_with_openai_library():
+    global server
+    server.start()
+    client = OpenAI(api_key="dummy", base_url=f"http://{server.server_host}:{server.server_port}")
+    res = client.completions.create(
+        model="gpt-3.5-turbo-instruct",
+        prompt="I believe the meaning of life is",
+        max_tokens=8,
+        seed=42,
+        temperature=0.8,
+    )
+    print(res)
+    assert res.choices[0].finish_reason == "length"
+    assert match_regex("(going|bed)+", res.choices[0].text)
+
+
+@pytest.mark.parametrize("n_slots", [1, 2])
+def test_consistent_result_same_seed(n_slots: int):
+    global server
+    server.n_slots = n_slots
+    server.start()
+    last_res = None
+    for _ in range(4):
+        res = server.make_request("POST", "/completion", data={
+            "prompt": "I believe the meaning of life is",
+            "seed": 42,
+            "temperature": 1.0,
+        })
+        if last_res is not None:
+            assert res.body["content"] == last_res.body["content"]
+        last_res = res
+
+
+@pytest.mark.parametrize("n_slots", [1, 2])
+def test_different_result_different_seed(n_slots: int):
+    global server
+    server.n_slots = n_slots
+    server.start()
+    last_res = None
+    for seed in range(4):
+        res = server.make_request("POST", "/completion", data={
+            "prompt": "I believe the meaning of life is",
+            "seed": seed,
+            "temperature": 1.0,
+        })
+        if last_res is not None:
+            assert res.body["content"] != last_res.body["content"]
+        last_res = res
+
+
+@pytest.mark.parametrize("n_batch", [16, 32])
+@pytest.mark.parametrize("temperature", [0.0, 1.0])
+def test_consistent_result_different_batch_size(n_batch: int, temperature: float):
+    global server
+    server.n_batch = n_batch
+    server.start()
+    last_res = None
+    for _ in range(4):
+        res = server.make_request("POST", "/completion", data={
+            "prompt": "I believe the meaning of life is",
+            "seed": 42,
+            "temperature": temperature,
+        })
+        if last_res is not None:
+            assert res.body["content"] == last_res.body["content"]
+        last_res = res
+
+# TODO: add completion with tokens as input, mixed token+string input
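The tests above lean on a `match_regex` helper imported from the tests' utils module, whose definition is not part of this diff. A hedged sketch of what such a helper might look like, assuming a case-insensitive, newline-tolerant search over the model output:

```python
import re


# Hypothetical sketch of match_regex(expected_regex, text); the real helper
# lives in the tests' utils module and may differ in flags or behavior.
def match_regex(expected_regex: str, text: str) -> bool:
    # search (not fullmatch): the pattern only needs to occur somewhere in
    # the generated text, which may span lines and vary in casing
    pattern = re.compile(expected_regex, flags=re.IGNORECASE | re.DOTALL)
    return pattern.search(text) is not None
```

Used as in the tests, e.g. `match_regex("(going|bed)+", res.body["content"])` returns `True` when any alternative occurs in the completion.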