✨ Deepsparse Backend implementation #29
Closed
Changes from 10 commits (24 total)
c3abc8d  WIP
a6d9a05  ✅ Tests are fixed
d116c0c  📌 deepsparse is added to dependencies
c000dbf  ✨ deepsparse backend integration is added
52e1d3b  deepsparse package limitations are applied
7218795  ⚰️ removed `pytest.mark.asyncio()` due to pytest-asyncio module
a5357ca  📝 fixed class example
68381a5  🧵 rollback `pytest.mark.asyncio` fixtures
5acb3a8  ✨ Deepsparse Backend integration first implementation
45e07d0  code quality is provided
1753469  Merge branch 'main' into parfeniukink/features/deepsparse-backend
1f1e038  fit Deepsparse Backend to work with new Backend abstraction
ce1c3ba  🔧 `GUIDELLM__LLM_MODEL` shared across all the backends
8e88bae  Test emulated data source constant -> settings value
75e708b  💄 mdformat is happy
3c03961  Merge branch 'main' into parfeniukink/features/deepsparse-backend
913253f  ✅ Tests are fixed according to a new Backend base implementation
e376ed9  🔨 tox tests include `deepsparse` dependency
3a2c6c1  🏷️ Type annotations are added
74a6dfd  🐛 Assert with config values instead of constants
1a53951  📌 .[deepsparse] dependency is skipped if Python>3.11
39ffcb3  🚚 DeepsparseBackend is moved to a another module
29e38e4  ✅ Deepsparse tests are ignored if Python>=3.12
4b3b4b5  💚 Linters are happy
@@ -1,9 +1,11 @@
from .base import Backend, BackendEngine, GenerativeResponse
from .deepsparse.backend import DeepsparseBackend
from .openai import OpenAIBackend

__all__ = [
    "Backend",
    "BackendEngine",
    "GenerativeResponse",
    "OpenAIBackend",
    "DeepsparseBackend",
]
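The `DeepsparseBackend` exported above plugs into guidellm's backend registry via the `@Backend.register(backend_type="deepsparse")` decorator seen later in the diff. As a rough illustration of how such a decorator-based registry can work, here is a minimal standalone sketch; the `_registry` dict, the `create` factory, and `DummyBackend` are hypothetical stand-ins, not guidellm's actual implementation.

```python
from typing import Dict, Type


class Backend:
    # Hypothetical registry mapping a backend_type string to its class.
    _registry: Dict[str, Type["Backend"]] = {}

    @classmethod
    def register(cls, backend_type: str):
        """Class decorator that records a subclass under `backend_type`."""

        def decorator(subclass: Type["Backend"]) -> Type["Backend"]:
            cls._registry[backend_type] = subclass
            return subclass

        return decorator

    @classmethod
    def create(cls, backend_type: str, **kwargs) -> "Backend":
        """Instantiate a registered backend by its type string."""
        if backend_type not in cls._registry:
            raise ValueError(f"Unknown backend type: {backend_type}")
        return cls._registry[backend_type](**kwargs)


@Backend.register(backend_type="dummy")
class DummyBackend(Backend):
    def __init__(self, model: str = "default"):
        self.model = model
```

With this pattern, registering a new backend is just a matter of importing its module, which is why the `__all__` export list above matters: the import has the side effect of populating the registry.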
@@ -0,0 +1,24 @@
"""
This package encapsulates the "Deepsparse Backend" implementation.
ref: https://github.com/neuralmagic/deepsparse

The `deepsparse` package supports Python 3.6 through 3.11,
while `guidellm` requires Python 3.8 or newer.

The safe range for the Deepsparse Backend implementation
is therefore Python 3.8 through 3.11.
"""

from guidellm.utils import check_python_version, module_is_available

# Ensure that the Python version is in the valid range
check_python_version(min_version="3.8", max_version="3.11")

# Ensure that deepsparse is installed
module_is_available(
    module="deepsparse",
    helper=(
        "`deepsparse` package is not available. "
        "Please try `pip install -e '.[deepsparse]'`"
    ),
)
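The guard helpers imported from `guidellm.utils` fail fast at import time rather than surfacing a confusing `ImportError` deep inside a request. The sketch below shows one plausible way such helpers could be implemented; the bodies and error messages here are assumptions for illustration, and guidellm's real utilities may differ.

```python
import importlib.util
import sys


def check_python_version(min_version: str, max_version: str) -> None:
    """Raise if the running interpreter is outside [min_version, max_version]."""
    current = sys.version_info[:2]
    low = tuple(int(part) for part in min_version.split("."))
    high = tuple(int(part) for part in max_version.split("."))
    if not (low <= current <= high):
        raise RuntimeError(
            f"Python {min_version}..{max_version} required, "
            f"found {current[0]}.{current[1]}"
        )


def module_is_available(module: str, helper: str) -> None:
    """Raise with a helpful install hint if `module` cannot be imported."""
    # find_spec checks importability without actually importing the module.
    if importlib.util.find_spec(module) is None:
        raise RuntimeError(helper)
```

Using `importlib.util.find_spec` keeps the check cheap: the heavyweight `deepsparse` import only happens later, when the backend module itself is loaded.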
@@ -0,0 +1,106 @@
from typing import Any, AsyncGenerator, Dict, List, Optional

from deepsparse import Pipeline, TextGeneration
from loguru import logger

from guidellm.backend import Backend, GenerativeResponse
from guidellm.config import settings
from guidellm.core import TextGenerationRequest


@Backend.register(backend_type="deepsparse")
class DeepsparseBackend(Backend):
    """
    A Deepsparse backend implementation for generative AI results.
    """

    def __init__(self, model: Optional[str] = None, **request_args):
        self._request_args: Dict[str, Any] = request_args
        self.model: str = self._get_model(model)
        self.pipeline: Pipeline = TextGeneration(model=self.model)

    def _get_model(self, model_from_cli: Optional[str] = None) -> str:
        """Resolve the model by the following priority:
        1. from the function argument (comes from the CLI)
        2. from the environment variable
        3. `self.default_model` from `self.available_models`
        """

        if model_from_cli is not None:
            return model_from_cli
        elif settings.deepsparse.model is not None:
            logger.info(
                "Using Deepsparse model from environment variable: "
                f"{settings.deepsparse.model}"
            )
            return settings.deepsparse.model
        else:
            logger.info(f"Using default Deepsparse model: {self.default_model}")
            return self.default_model

    async def make_request(
        self, request: TextGenerationRequest
    ) -> AsyncGenerator[GenerativeResponse, None]:
        """
        Make a request to the Deepsparse Python API client.

        :param request: The result request to submit.
        :type request: TextGenerationRequest
        :return: An async iterator over the generative responses.
        :rtype: AsyncGenerator[GenerativeResponse, None]
        """

        logger.debug(
            f"Making request to Deepsparse backend with prompt: {request.prompt}"
        )

        token_count = 0
        request_args = {
            **self._request_args,
            "streaming": True,
            "max_new_tokens": request.output_token_count,
        }

        if not (output := self.pipeline(prompt=request.prompt, **request_args)):
            yield GenerativeResponse(
                type_="final",
                prompt=request.prompt,
                prompt_token_count=request.prompt_token_count,
                output_token_count=token_count,
            )
            return

        for generation in output.generations:
            if not (token := generation.text):
                yield GenerativeResponse(
                    type_="final",
                    prompt=request.prompt,
                    prompt_token_count=request.prompt_token_count,
                    output_token_count=token_count,
                )
                break
            else:
                token_count += 1
                yield GenerativeResponse(
                    type_="token_iter",
                    add_token=token,
                    prompt=request.prompt,
                    prompt_token_count=request.prompt_token_count,
                    output_token_count=token_count,
                )

    def available_models(self) -> List[str]:
        """
        Get the available models for the backend.

        :return: A list of available models.
        :rtype: List[str]
        """

        # WARNING: The default model from the documentation is defined here
        return ["hf:mgoin/TinyStories-33M-quant-deepsparse"]

    def _token_count(self, text: str) -> int:
        token_count = len(text.split())
        logger.debug(f"Token count for text '{text}': {token_count}")
        return token_count
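The heart of `make_request` is the streaming protocol: one `token_iter` response per generated token, then a `final` response when the pipeline stops producing text. The runnable sketch below mirrors that yielding logic with stub types so it can be exercised without the `deepsparse` package installed; `StubGeneration`, `StubOutput`, and `stream_tokens` are illustrative stand-ins, not guidellm or deepsparse APIs.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Dict, List


@dataclass
class StubGeneration:
    """Stand-in for a deepsparse generation chunk carrying one token of text."""

    text: str


@dataclass
class StubOutput:
    """Stand-in for the pipeline output holding the generation stream."""

    generations: List[StubGeneration]


async def stream_tokens(output: StubOutput) -> AsyncGenerator[Dict, None]:
    """Mirror make_request's loop: token_iter per token, final on empty text."""
    token_count = 0
    for generation in output.generations:
        if not generation.text:
            # An empty chunk signals the end of the stream.
            yield {"type_": "final", "output_token_count": token_count}
            break
        token_count += 1
        yield {
            "type_": "token_iter",
            "add_token": generation.text,
            "output_token_count": token_count,
        }


async def collect() -> List[Dict]:
    output = StubOutput(
        generations=[
            StubGeneration("Hello"),
            StubGeneration(" world"),
            StubGeneration(""),
        ]
    )
    return [response async for response in stream_tokens(output)]
```

Running `asyncio.run(collect())` on this stub produces two `token_iter` responses followed by one `final`, which is the shape downstream guidellm consumers of `GenerativeResponse` expect from the real backend.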