42 changes: 33 additions & 9 deletions README.md
@@ -9,7 +9,7 @@
Scale Efficiently: Evaluate and Optimize Your LLM Deployments for Real-World Inference Needs
</h3>

[![GitHub Release](https://img.shields.io/github/release/neuralmagic/guidellm.svg?label=Version)](https://github.com/neuralmagic/guidellm/releases) [![Documentation](https://img.shields.io/badge/Documentation-8A2BE2?logo=read-the-docs&logoColor=%23ffffff&color=%231BC070)](https://github.com/neuralmagic/guidellm/tree/main/docs) [![License](https://img.shields.io/github/license/neuralmagic/guidellm.svg)](https://github.com/neuralmagic/guidellm/blob/main/LICENSE) [![PyPi Release](https://img.shields.io/pypi/v/guidellm.svg?label=PyPi%20Release)](https://pypi.python.org/pypi/guidellm) [![Pypi Release](https://img.shields.io/pypi/v/guidellm-nightly.svg?label=PyPi%20Nightly)](https://pypi.python.org/pypi/guidellm-nightly) [![Python Versions](https://img.shields.io/pypi/pyversions/guidellm.svg?label=Python)](https://pypi.python.org/pypi/guidellm) [![Nightly Build](https://img.shields.io/github/actions/workflow/status/neuralmagic/guidellm/nightly.yml?branch=main&label=Nightly%20Build)](https://github.com/neuralmagic/guidellm/actions/workflows/nightly.yml)
[![GitHub Release](https://img.shields.io/github/release/neuralmagic/guidellm.svg?label=Version)](https://github.com/neuralmagic/guidellm/releases) [![Documentation](https://img.shields.io/badge/Documentation-8A2BE2?logo=read-the-docs&logoColor=%23ffffff&color=%231BC070)](https://github.com/neuralmagic/guidellm/tree/main/docs) [![License](https://img.shields.io/github/license/neuralmagic/guidellm.svg)](https://github.com/neuralmagic/guidellm/blob/main/LICENSE) [![PyPI Release](https://img.shields.io/pypi/v/guidellm.svg?label=PyPI%20Release)](https://pypi.python.org/pypi/guidellm) [![Pypi Release](https://img.shields.io/pypi/v/guidellm-nightly.svg?label=PyPI%20Nightly)](https://pypi.python.org/pypi/guidellm-nightly) [![Python Versions](https://img.shields.io/pypi/pyversions/guidellm.svg?label=Python)](https://pypi.python.org/pypi/guidellm) [![Nightly Build](https://img.shields.io/github/actions/workflow/status/neuralmagic/guidellm/nightly.yml?branch=main&label=Nightly%20Build)](https://github.com/neuralmagic/guidellm/actions/workflows/nightly.yml)

## Overview

@@ -65,10 +65,12 @@ To run a GuideLLM evaluation, use the `guidellm` command with the appropriate mo
```bash
guidellm \
--target "http://localhost:8000/v1" \
--model "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16"
--model "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16" \
--data-type emulated \
--data "prompt_tokens=512,generated_tokens=128"
```

The above command will begin the evaluation and output progress updates similar to the following: <img src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-benchmark.gif" />
The above command will begin the evaluation and output progress updates similar to the following (if running on a different server, be sure to update the target!): <img src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-benchmark.gif" />

Notes:

@@ -88,17 +90,39 @@ The end of the output will include important performance summary metrics such as

<img alt="Sample GuideLLM benchmark end output" src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-output-end.png" />

### Advanced Settings
### Configurations

GuideLLM provides various options to customize evaluations, including setting the duration of each benchmark run, the number of concurrent requests, and the request rate. For a complete list of options and advanced settings, see the [GuideLLM CLI Documentation](https://github.com/neuralmagic/guidellm/blob/main/docs/guides/cli.md).
GuideLLM provides various CLI and environment options to customize evaluations, including setting the duration of each benchmark run, the number of concurrent requests, and the request rate.

Some common advanced settings include:
Some common configurations for the CLI include the following (a combined example follows this list):

- `--rate-type`: The rate to use for benchmarking. Options include `sweep` (shown above), `synchronous` (one request at a time), `throughput` (all requests at once), `constant` (a constant rate defined by `--rate`), and `poisson` (a poisson distribution rate defined by `--rate`).
- `--data-type`: The data to use for the benchmark. Options include `emulated` (default shown above, emulated to match a given prompt and output length), `transformers` (a transformers dataset), and `file` (a {text, json, jsonl, csv} file with a list of prompts).
- `--rate-type`: The rate to use for benchmarking. Options include `sweep`, `synchronous`, `throughput`, `constant`, and `poisson`.
- `--rate-type sweep`: (default) Sweep runs through the full range of the server's performance, starting with a `synchronous` rate, then `throughput`, and finally 10 `constant` rates spaced between the minimum and maximum request rates found.
- `--rate-type synchronous`: Synchronous runs requests one at a time, waiting for each to complete before sending the next.
- `--rate-type throughput`: Throughput sends all requests as fast as possible, measuring the server's maximum throughput.
- `--rate-type constant`: Constant runs requests at a fixed rate. Specify the rate in requests per second with the `--rate` argument, for example `--rate 10`, or multiple rates with `--rate 10 --rate 20 --rate 30`.
- `--rate-type poisson`: Poisson draws request intervals from a Poisson distribution with its mean at the specified rate, adding real-world variance to the runs. Specify the rate in requests per second with the `--rate` argument, for example `--rate 10`, or multiple rates with `--rate 10 --rate 20 --rate 30`.
- `--data-type`: The data to use for the benchmark. Options include `emulated`, `transformers`, and `file`.
- `--data-type emulated`: Emulated accepts an EmulationConfig, as a string or a file, for the `--data` argument to generate synthetic data. Specify at least the number of prompt tokens, and optionally the number of output tokens and other parameters that control variance in length. For example, `--data "prompt_tokens=128"`, `--data "prompt_tokens=128,generated_tokens=128"`, or `--data "prompt_tokens=128,prompt_tokens_variance=10"`.
- `--data-type file`: File accepts a file path or URL for the `--data` argument. The file should be a CSV, JSONL, or TXT file with one prompt per line, or a JSON/YAML file containing a list of prompts. For example, `--data "data.txt"` where the contents of data.txt are `"prompt1\nprompt2\nprompt3"`.
- `--data-type transformers`: Transformers supports a dataset name or dataset file path for the `--data` argument. For example, `--data "neuralmagic/LLM_compression_calibration"`.
- `--max-seconds`: The maximum number of seconds to run each benchmark. The default is 120 seconds.
- `--max-requests`: The maximum number of requests to run in each benchmark.
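
As referenced above, these flags compose into a single run. The command below is an illustrative sketch only; the target, model, rates, and token counts are assumptions to adapt to your own deployment:

```bash
guidellm \
  --target "http://localhost:8000/v1" \
  --model "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16" \
  --rate-type constant \
  --rate 10 --rate 20 \
  --data-type emulated \
  --data "prompt_tokens=512,generated_tokens=128" \
  --max-seconds 60
```

This runs two constant-rate benchmarks (10 and 20 requests per second) for up to 60 seconds each, using emulated requests of roughly 512 prompt tokens and 128 generated tokens.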

For a full list of supported CLI arguments, run the following command:

```bash
guidellm --help
```

For a full list of configuration options, run the following command:

```bash
guidellm-config
```

For further information, see the [GuideLLM Documentation](#documentation).

## Resources

### Documentation
Expand All @@ -109,7 +133,7 @@ Our comprehensive documentation provides detailed guides and resources to help y

- [**Installation Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/install.md) - Step-by-step instructions to install GuideLLM, including prerequisites and setup tips.
- [**Architecture Overview**](https://github.com/neuralmagic/guidellm/tree/main/docs/architecture.md) - A detailed look at GuideLLM's design, components, and how they interact.
- [**CLI Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/cli_usage.md) - Comprehensive usage information for running GuideLLM via the command line, including available commands and options.
- [**CLI Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/cli.md) - Comprehensive usage information for running GuideLLM via the command line, including available commands and options.
- [**Configuration Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/configuration.md) - Instructions on configuring GuideLLM to suit various deployment needs and performance goals.

### Supporting External Documentation
Binary file removed docs/assets/sample-benchmark.gif
Binary file added docs/assets/sample-benchmarks.gif
1 change: 1 addition & 0 deletions pyproject.toml
@@ -75,6 +75,7 @@ dev = [

[project.entry-points.console_scripts]
guidellm = "guidellm.main:generate_benchmark_report_cli"
guidellm-config = "guidellm.config:print_config"


# ************************************************
58 changes: 58 additions & 0 deletions src/guidellm/backend/base.py
@@ -1,9 +1,14 @@
import asyncio
import functools
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Dict, List, Literal, Optional, Type, Union

from loguru import logger
from pydantic import BaseModel
from transformers import ( # type: ignore # noqa: PGH003
AutoTokenizer,
PreTrainedTokenizer,
)

from guidellm.core import TextGenerationRequest, TextGenerationResult

@@ -103,10 +108,21 @@ def create(cls, backend_type: BackendEngine, **kwargs) -> "Backend":
return Backend._registry[backend_type](**kwargs)

def __init__(self, type_: BackendEngine, target: str, model: str):
"""
Base constructor for the Backend class.
Calls into test_connection to ensure the backend is reachable.
Ensure all setup is done in the subclass constructor before calling super.

:param type_: The type of the backend.
:param target: The target URL for the backend.
:param model: The model used by the backend.
"""
self._type = type_
self._target = target
self._model = model

self.test_connection()

@property
def default_model(self) -> str:
"""
@@ -148,6 +164,48 @@ def model(self) -> str:
"""
return self._model

def model_tokenizer(self) -> PreTrainedTokenizer:
"""
Get the tokenizer for the backend model.

:return: The tokenizer instance.
"""
return AutoTokenizer.from_pretrained(self.model)

def test_connection(self) -> bool:
"""
Test the connection to the backend by running a short text generation request.
If successful, returns True, otherwise raises an exception.

:return: True if the connection is successful.
:rtype: bool
:raises ValueError: If the connection test fails.
"""
try:
asyncio.get_running_loop()
is_async = True
except RuntimeError:
is_async = False

if is_async:
logger.warning("Running in async mode, cannot test connection")
return True

try:
request = TextGenerationRequest(
prompt="Test connection", output_token_count=5
)

asyncio.run(self.submit(request))
return True
except Exception as err:
raise_err = RuntimeError(
f"Backend connection test failed for backend type={self.type_} "
f"with target={self.target} and model={self.model} with error: {err}"
)
logger.error(raise_err)
raise raise_err from err

async def submit(self, request: TextGenerationRequest) -> TextGenerationResult:
"""
Submit a text generation request and return the result.
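
The constructor change above establishes an ordering contract: `Backend.__init__` calls `test_connection()`, which submits a real generation request, so a subclass must finish its own setup before delegating to `super().__init__`. The following minimal sketch illustrates that contract; the `EchoBackend` name, the `"echo"` engine value, and the placeholder client are hypothetical and not part of this PR:

```python
from guidellm.backend.base import Backend


class EchoBackend(Backend):
    """Hypothetical subclass illustrating setup-before-super ordering."""

    def __init__(self, target: str, model: str):
        # Initialize everything test_connection() will rely on *before*
        # calling the base constructor, which immediately submits a short
        # "Test connection" generation request through this instance.
        self._client = object()  # placeholder for a real inference client

        # Safe only now that the instance is fully set up.
        super().__init__("echo", target, model)
```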
63 changes: 61 additions & 2 deletions src/guidellm/config.py
@@ -1,5 +1,6 @@
import json
from enum import Enum
from typing import Dict, List, Optional
from typing import Dict, List, Optional, Sequence

from pydantic import BaseModel, Field, model_validator
from pydantic_settings import BaseSettings, SettingsConfigDict
@@ -10,6 +11,7 @@
"Environment",
"LoggingSettings",
"OpenAISettings",
"print_config",
"ReportGenerationSettings",
"Settings",
"reload_settings",
@@ -70,7 +72,6 @@ class DatasetSettings(BaseModel):
preferred_data_splits: List[str] = Field(
default_factory=lambda: ["test", "tst", "validation", "val", "train"]
)
default_tokenizer: str = "neuralmagic/Meta-Llama-3.1-8B-FP8"


class EmulatedDataSettings(BaseModel):
@@ -163,6 +164,53 @@ def set_default_source(cls, values):

return values

def generate_env_file(self) -> str:
"""
Generate the .env file from the current settings
"""
return Settings._recursive_generate_env(
self,
self.model_config["env_prefix"], # type: ignore # noqa: PGH003
self.model_config["env_nested_delimiter"], # type: ignore # noqa: PGH003
)

@staticmethod
def _recursive_generate_env(model: BaseModel, prefix: str, delimiter: str) -> str:
env_file = ""
add_models = []
for key, value in model.model_dump().items():
if isinstance(value, BaseModel):
# add nested properties to be processed after the current level
add_models.append((key, value))
continue

dict_values = (
{
f"{prefix}{key.upper()}{delimiter}{sub_key.upper()}": sub_value
for sub_key, sub_value in value.items()
}
if isinstance(value, dict)
else {f"{prefix}{key.upper()}": value}
)

for tag, sub_value in dict_values.items():
if isinstance(sub_value, Sequence) and not isinstance(sub_value, str):
value_str = ",".join(f'"{item}"' for item in sub_value)
env_file += f"{tag}=[{value_str}]\n"
elif isinstance(sub_value, Dict):
value_str = json.dumps(sub_value)
env_file += f"{tag}={value_str}\n"
elif not sub_value:
env_file += f"{tag}=\n"
else:
env_file += f'{tag}="{sub_value}"\n'

for key, value in add_models:
env_file += Settings._recursive_generate_env(
value, f"{prefix}{key.upper()}{delimiter}", delimiter
)
return env_file


settings = Settings()

@@ -173,3 +221,14 @@ def reload_settings():
"""
new_settings = Settings()
settings.__dict__.update(new_settings.__dict__)


def print_config():
"""
Print the current configuration settings
"""
print(f"Settings: \n{settings.generate_env_file()}") # noqa: T201


if __name__ == "__main__":
print_config()
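
To make the flattening behavior above concrete, here is a small, hedged sketch. The `DEMO__` prefix and `__` delimiter are assumptions for illustration; guidellm's actual values live in `Settings.model_config`, which this diff does not show. It also calls the private `_recursive_generate_env` helper directly, purely to demonstrate the output shape:

```python
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

from guidellm.config import Settings


class LoggingDemo(BaseModel):
    disabled: bool = False
    console_log_level: str = "WARNING"


class DemoSettings(BaseSettings):
    # Assumed prefix/delimiter, for illustration only.
    model_config = SettingsConfigDict(env_prefix="DEMO__", env_nested_delimiter="__")

    request_timeout: int = 30
    logging: LoggingDemo = LoggingDemo()


print(Settings._recursive_generate_env(DemoSettings(), "DEMO__", "__"))
# Expected shape, given the branches above:
#   DEMO__REQUEST_TIMEOUT="30"
#   DEMO__LOGGING__DISABLED=          <- falsy values serialize as empty
#   DEMO__LOGGING__CONSOLE_LOG_LEVEL="WARNING"
```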
14 changes: 14 additions & 0 deletions src/guidellm/core/request.py
@@ -28,3 +28,17 @@ class TextGenerationRequest(Serializable):
default_factory=dict,
description="The parameters for the text generation request.",
)

def __str__(self) -> str:
prompt_short = (
self.prompt[:32] + "..."
if self.prompt and len(self.prompt) > 32 # noqa: PLR2004
else self.prompt
)

return (
f"TextGenerationRequest(id={self.id}, "
f"prompt={prompt_short}, prompt_token_count={self.prompt_token_count}, "
f"output_token_count={self.output_token_count}, "
f"params={self.params})"
)
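
As a quick, hedged usage sketch of the new `__str__` (the id and field defaults shown are illustrative; the id is generated per request):

```python
from guidellm.core import TextGenerationRequest

request = TextGenerationRequest(
    prompt="Summarize the plot of Hamlet in three sentences."
)
print(str(request))
# Prompts longer than 32 characters are truncated with "...", e.g.:
# TextGenerationRequest(id=..., prompt=Summarize the plot of Hamlet in ...,
#   prompt_token_count=None, output_token_count=None, params={})
```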