diff --git a/docs-gb/SUMMARY.md b/docs-gb/SUMMARY.md
index 5dfd5c1bb..40e545978 100644
--- a/docs-gb/SUMMARY.md
+++ b/docs-gb/SUMMARY.md
@@ -24,15 +24,19 @@
 * [Alibi-Explain](runtimes/alibi-explain.md)
 * [HuggingFace](runtimes/huggingface.md)
 * [Custom](runtimes/custom.md)
-* [Reference](reference/README.md)
-  * [MLServer Settings](reference/settings.md)
-  * [Model Settings](reference/model-settings.md)
-  * [MLServer CLI](reference/cli.md)
-  * [Python API](reference/python-api/README.md)
-    * [MLModel](reference/api/model.md)
-    * [Types](reference/api/types.md)
-    * [Codecs](reference/api/codecs.md)
-    * [Metrics](reference/api/metrics.md)
+
+* [API Reference](api/api-reference.md)
+  * [MLServer Settings](api/Settings.md)
+  * [Model Settings](api/ModelSettings.md)
+  * [Model Parameters](api/ModelParameters.md)
+  * [MLServer CLI](api/CLI.md)
+
+  * [Python API](api/PythonAPI.md)
+    * [MLModel](api/MLModel.md)
+    * [Types](api/Types.md)
+    * [Codecs](api/Codecs.md)
+    * [Metrics](api/Metrics.md)
+
 * [Examples](examples/README.md)
   * [Serving Scikit-Learn models](examples/sklearn/README.md)
   * [Serving XGBoost models](examples/xgboost/README.md)
diff --git a/docs-gb/api/CLI.md b/docs-gb/api/CLI.md
new file mode 100644
index 000000000..1f06ea6e7
--- /dev/null
+++ b/docs-gb/api/CLI.md
@@ -0,0 +1,143 @@
+# MLServer CLI
+
+The MLServer package includes an `mlserver` CLI designed to help with common tasks in a model’s lifecycle. You can see a high-level outline at any time via:
+
+```bash
+mlserver --help
+```
+
+## root
+
+Command-line interface to manage MLServer models.
+
+```bash
+root [OPTIONS] COMMAND [ARGS]...
+```
+
+### Options
+
+- `--version` (Default: `False`)
+  Show the version and exit.
+
+## build
+
+Build a Docker image for a custom MLServer runtime.
+
+```bash
+root build [OPTIONS] FOLDER
+```
+
+### Options
+
+- `-t`, `--tag` ``
+
+- `--no-cache` (Default: `False`)
+
+### Arguments
+
+- `FOLDER`
+  Required argument
+
+## dockerfile
+
+Generate a Dockerfile.
+
+```bash
+root dockerfile [OPTIONS] FOLDER
+```
+
+### Options
+
+- `-i`, `--include-dockerignore` (Default: `False`)
+
+### Arguments
+
+- `FOLDER`
+  Required argument
+
+## infer
+
+Execute batch inference requests against a V2 inference server.
+
+> Deprecated: This experimental feature will be removed in future work.
+
+```bash
+root infer [OPTIONS]
+```
+
+### Options
+
+- `--url`, `-u` `` (Default: `localhost:8080`; Env: `MLSERVER_INFER_URL`)
+  URL of the MLServer to send inference requests to. Should not contain http or https.
+
+- `--model-name`, `-m` `` (Required; Env: `MLSERVER_INFER_MODEL_NAME`)
+  Name of the model to send inference requests to.
+
+- `--input-data-path`, `-i` `` (Required; Env: `MLSERVER_INFER_INPUT_DATA_PATH`)
+  Local path to the input file containing inference requests to be processed.
+
+- `--output-data-path`, `-o` `` (Required; Env: `MLSERVER_INFER_OUTPUT_DATA_PATH`)
+  Local path to the output file for the inference responses to be written to.
+
+- `--workers`, `-w` `` (Default: `10`; Env: `MLSERVER_INFER_WORKERS`)
+
+- `--retries`, `-r` `` (Default: `3`; Env: `MLSERVER_INFER_RETRIES`)
+
+- `--batch-size`, `-s` `` (Default: `1`; Env: `MLSERVER_INFER_BATCH_SIZE`)
+  Send inference requests grouped together as micro-batches.
+
+- `--binary-data`, `-b` (Default: `False`; Env: `MLSERVER_INFER_BINARY_DATA`)
+  Send inference requests as binary data (not fully supported).
+
+- `--verbose`, `-v` (Default: `False`; Env: `MLSERVER_INFER_VERBOSE`)
+  Verbose mode.
+
+- `--extra-verbose`, `-vv` (Default: `False`; Env: `MLSERVER_INFER_EXTRA_VERBOSE`)
+  Extra verbose mode (shows detailed requests and responses).
+
+- `--transport`, `-t` `` (Options: `rest` | `grpc`; Default: `rest`; Env: `MLSERVER_INFER_TRANSPORT`)
+  Transport type to use to send inference requests. Can be 'rest' or 'grpc' (not yet supported).
+
+- `--request-headers`, `-H` `` (Env: `MLSERVER_INFER_REQUEST_HEADERS`)
+  Headers to be set on each inference request sent to the server. Multiple options are allowed as: `-H 'Header1: Val1' -H 'Header2: Val2'`. When set via the environment variable, provide them as `'Header1:Val1 Header2:Val2'`.
+
+- `--timeout` `` (Default: `60`; Env: `MLSERVER_INFER_CONNECTION_TIMEOUT`)
+  Connection timeout to be passed to tritonclient.
+
+- `--batch-interval` `` (Default: `0`; Env: `MLSERVER_INFER_BATCH_INTERVAL`)
+  Minimum time interval (in seconds) between requests made by each worker.
+
+- `--batch-jitter` `` (Default: `0`; Env: `MLSERVER_INFER_BATCH_JITTER`)
+  Maximum random jitter (in seconds) added to the batch interval between requests.
+
+- `--use-ssl` (Default: `False`; Env: `MLSERVER_INFER_USE_SSL`)
+  Use SSL in communications with the inference server.
+
+- `--insecure` (Default: `False`; Env: `MLSERVER_INFER_INSECURE`)
+  Disable SSL verification in communications. Use with caution.
+
+## init
+
+Generate a base project template.
+
+```bash
+root init [OPTIONS]
+```
+
+### Options
+
+- `-t`, `--template` `` (Default: `https://github.com/EthicalML/sml-security/`)
+
+## start
+
+Start serving a machine learning model with MLServer.
+
+```bash
+root start [OPTIONS] FOLDER
+```
+
+### Arguments
+
+- `FOLDER`
+  Required argument
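+
+## Example
+
+A typical workflow is to scaffold a project, build an image and then serve the model. This is a minimal sketch; the folder and tag names below are illustrative placeholders rather than required values:
+
+```bash
+# generate a base project template in the current directory
+mlserver init
+
+# build a Docker image from a folder containing your runtime and settings
+mlserver build ./my-model -t my-model:0.1.0
+
+# serve the model(s) defined under that folder
+mlserver start ./my-model
+```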
diff --git a/docs-gb/api/Codecs.md b/docs-gb/api/Codecs.md
new file mode 100644
index 000000000..bd59c1daf
--- /dev/null
+++ b/docs-gb/api/Codecs.md
@@ -0,0 +1,542 @@
+# Codecs
+
+## Base64Codec
+
+Codec that converts to / from a base64 input.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_input()
+
+```python
+decode_input(request_input: RequestInput) -> List[bytes]
+```
+
+Decode a request input into a high-level Python type.
+
+### decode_output()
+
+```python
+decode_output(response_output: ResponseOutput) -> List[bytes]
+```
+
+Decode a response output into a high-level Python type.
+
+### encode_input()
+
+```python
+encode_input(name: str, payload: List[bytes], use_bytes: bool = True, **kwargs) -> RequestInput
+```
+
+Encode the given payload into a `RequestInput`.
+
+### encode_output()
+
+```python
+encode_output(name: str, payload: List[bytes], use_bytes: bool = True, **kwargs) -> ResponseOutput
+```
+
+Encode the given payload into a response output.
+
+## CodecError
+
+### Methods
+
+### add_note()
+
+```python
+add_note(...)
+```
+
+Exception.add_note(note) -- add a note to the exception.
+
+### with_traceback()
+
+```python
+with_traceback(...)
+```
+
+Exception.with_traceback(tb) -- set self.__traceback__ to tb and return self.
+
+## DatetimeCodec
+
+Codec that converts to / from a datetime input.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_input()
+
+```python
+decode_input(request_input: RequestInput) -> List[datetime]
+```
+
+Decode a request input into a high-level Python type.
+
+### decode_output()
+
+```python
+decode_output(response_output: ResponseOutput) -> List[datetime]
+```
+
+Decode a response output into a high-level Python type.
+
+### encode_input()
+
+```python
+encode_input(name: str, payload: List[Union[str, datetime]], use_bytes: bool = True, **kwargs) -> RequestInput
+```
+
+Encode the given payload into a `RequestInput`.
+
+### encode_output()
+
+```python
+encode_output(name: str, payload: List[Union[str, datetime]], use_bytes: bool = True, **kwargs) -> ResponseOutput
+```
+
+Encode the given payload into a response output.
+
+## InputCodec
+
+The InputCodec interface lets you define type conversions of your raw input
+data to / from the Open Inference Protocol.
+Note that this codec applies at the individual input (output) level.
+
+For request-wide transformations (e.g. dataframes), use the
+`RequestCodec` interface instead.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_input()
+
+```python
+decode_input(request_input: RequestInput) -> Any
+```
+
+Decode a request input into a high-level Python type.
+
+### decode_output()
+
+```python
+decode_output(response_output: ResponseOutput) -> Any
+```
+
+Decode a response output into a high-level Python type.
+
+### encode_input()
+
+```python
+encode_input(name: str, payload: Any, **kwargs) -> RequestInput
+```
+
+Encode the given payload into a `RequestInput`.
+
+### encode_output()
+
+```python
+encode_output(name: str, payload: Any, **kwargs) -> ResponseOutput
+```
+
+Encode the given payload into a response output.
+
+## NumpyCodec
+
+Decodes a request input (response output) as a NumPy array.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_input()
+
+```python
+decode_input(request_input: RequestInput) -> ndarray
+```
+
+Decode a request input into a high-level Python type.
+
+### decode_output()
+
+```python
+decode_output(response_output: ResponseOutput) -> ndarray
+```
+
+Decode a response output into a high-level Python type.
+
+### encode_input()
+
+```python
+encode_input(name: str, payload: ndarray, **kwargs) -> RequestInput
+```
+
+Encode the given payload into a `RequestInput`.
+
+### encode_output()
+
+```python
+encode_output(name: str, payload: ndarray, **kwargs) -> ResponseOutput
+```
+
+Encode the given payload into a response output.
+
+## NumpyRequestCodec
+
+Decodes the first input (output) of a request (response) as a NumPy array.
+This codec can be useful for cases where the whole payload is a single
+NumPy tensor.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_request()
+
+```python
+decode_request(request: InferenceRequest) -> Any
+```
+
+Decode an inference request into a high-level Python object.
+
+### decode_response()
+
+```python
+decode_response(response: InferenceResponse) -> Any
+```
+
+Decode an inference response into a high-level Python object.
+
+### encode_request()
+
+```python
+encode_request(payload: Any, **kwargs) -> InferenceRequest
+```
+
+Encode the given payload into an inference request.
+
+### encode_response()
+
+```python
+encode_response(model_name: str, payload: Any, model_version: Optional[str] = None, **kwargs) -> InferenceResponse
+```
+
+Encode the given payload into an inference response.
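+
+### Example: NumPy codecs
+
+As a brief sketch of the two levels: `NumpyCodec` encodes a single input head, while `NumpyRequestCodec` encodes a whole request. The tensor name `"x"` is an arbitrary example:
+
+```python
+import numpy as np
+
+from mlserver.codecs import NumpyCodec, NumpyRequestCodec
+
+payload = np.array([[1, 2], [3, 4]])
+
+# input-level: encode the array as one input of a request
+request_input = NumpyCodec.encode_input(name="x", payload=payload)
+
+# request-level: encode the array as a full InferenceRequest
+inference_request = NumpyRequestCodec.encode_request(payload)
+
+# decoding reverses the conversion
+decoded = NumpyRequestCodec.decode_request(inference_request)
+assert (decoded == payload).all()
+```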
+
+## PandasCodec
+
+Decodes a request (response) into a Pandas DataFrame, assuming each input
+(output) head corresponds to a column of the DataFrame.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_request()
+
+```python
+decode_request(request: InferenceRequest) -> DataFrame
+```
+
+Decode an inference request into a high-level Python object.
+
+### decode_response()
+
+```python
+decode_response(response: InferenceResponse) -> DataFrame
+```
+
+Decode an inference response into a high-level Python object.
+
+### encode_outputs()
+
+```python
+encode_outputs(payload: DataFrame, use_bytes: bool = True) -> List[ResponseOutput]
+```
+
+### encode_request()
+
+```python
+encode_request(payload: DataFrame, use_bytes: bool = True, **kwargs) -> InferenceRequest
+```
+
+Encode the given payload into an inference request.
+
+### encode_response()
+
+```python
+encode_response(model_name: str, payload: DataFrame, model_version: Optional[str] = None, use_bytes: bool = True, **kwargs) -> InferenceResponse
+```
+
+Encode the given payload into an inference response.
+
+## RequestCodec
+
+The `RequestCodec` interface lets you define request-level conversions
+between high-level Python types and the Open Inference Protocol.
+This can be useful where the encoding of your payload encompasses multiple
+input heads (e.g. dataframes, where each column can be thought of as a
+separate input head).
+
+For individual input-level encoding / decoding, use the `InputCodec`
+interface instead.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_request()
+
+```python
+decode_request(request: InferenceRequest) -> Any
+```
+
+Decode an inference request into a high-level Python object.
+
+### decode_response()
+
+```python
+decode_response(response: InferenceResponse) -> Any
+```
+
+Decode an inference response into a high-level Python object.
+
+### encode_request()
+
+```python
+encode_request(payload: Any, **kwargs) -> InferenceRequest
+```
+
+Encode the given payload into an inference request.
+
+### encode_response()
+
+```python
+encode_response(model_name: str, payload: Any, model_version: Optional[str] = None, **kwargs) -> InferenceResponse
+```
+
+Encode the given payload into an inference response.
+
+## StringCodec
+
+Encodes a list of Python strings as a BYTES input (output).
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_input()
+
+```python
+decode_input(request_input: RequestInput) -> List[str]
+```
+
+Decode a request input into a high-level Python type.
+
+### decode_output()
+
+```python
+decode_output(response_output: ResponseOutput) -> List[str]
+```
+
+Decode a response output into a high-level Python type.
+
+### encode_input()
+
+```python
+encode_input(name: str, payload: List[str], use_bytes: bool = True, **kwargs) -> RequestInput
+```
+
+Encode the given payload into a `RequestInput`.
+
+### encode_output()
+
+```python
+encode_output(name: str, payload: List[str], use_bytes: bool = True, **kwargs) -> ResponseOutput
+```
+
+Encode the given payload into a response output.
+
+## StringRequestCodec
+
+Decodes the first input (output) of a request (response) as a list of
+strings.
+This codec can be useful for cases where the whole payload is a single
+list of strings.
+
+### Methods
+
+### can_encode()
+
+```python
+can_encode(payload: Any) -> bool
+```
+
+Evaluate whether the codec can encode (decode) the payload.
+
+### decode_request()
+
+```python
+decode_request(request: InferenceRequest) -> Any
+```
+
+Decode an inference request into a high-level Python object.
+
+### decode_response()
+
+```python
+decode_response(response: InferenceResponse) -> Any
+```
+
+Decode an inference response into a high-level Python object.
+
+### encode_request()
+
+```python
+encode_request(payload: Any, **kwargs) -> InferenceRequest
+```
+
+Encode the given payload into an inference request.
+
+### encode_response()
+
+```python
+encode_response(model_name: str, payload: Any, model_version: Optional[str] = None, **kwargs) -> InferenceResponse
+```
+
+Encode the given payload into an inference response.
+
+## decode_args()
+
+```python
+decode_args(predict: Callable) -> Callable[[ForwardRef('MLModel'), ], Coroutine[Any, Any, InferenceResponse]]
+```
+
+_No description available._
+
+## decode_inference_request()
+
+```python
+decode_inference_request(inference_request: InferenceRequest, model_settings: Optional[ModelSettings] = None, metadata_inputs: Dict[str, MetadataTensor] = {}) -> Optional[Any]
+```
+
+_No description available._
+
+## decode_request_input()
+
+```python
+decode_request_input(request_input: RequestInput, metadata_inputs: Dict[str, MetadataTensor] = {}) -> Optional[Any]
+```
+
+_No description available._
+
+## encode_inference_response()
+
+```python
+encode_inference_response(payload: Any, model_settings: ModelSettings) -> Optional[InferenceResponse]
+```
+
+_No description available._
+
+## encode_response_output()
+
+```python
+encode_response_output(payload: Any, request_output: RequestOutput, metadata_outputs: Dict[str, MetadataTensor] = {}) -> Optional[ResponseOutput]
+```
+
+_No description available._
+
+## get_decoded()
+
+```python
+get_decoded(parametrised_obj: Union[InferenceRequest, RequestInput, RequestOutput, ResponseOutput, InferenceResponse]) -> Any
+```
+
+_No description available._
+
+## get_decoded_or_raw()
+
+```python
+get_decoded_or_raw(parametrised_obj: Union[InferenceRequest, RequestInput, RequestOutput, ResponseOutput, InferenceResponse]) -> Any
+```
+
+_No description available._
+
+## has_decoded()
+
+```python
+has_decoded(parametrised_obj: Union[InferenceRequest, RequestInput, RequestOutput, ResponseOutput, InferenceResponse]) -> bool
+```
+
+_No description available._
+
+## register_input_codec()
+
+```python
+register_input_codec(CodecKlass: Union[type[InputCodec], InputCodec])
+```
+
+_No description available._
+
+## register_request_codec()
+
+```python
+register_request_codec(CodecKlass: Union[type[RequestCodec], RequestCodec])
+```
+
+_No description available._
diff --git a/docs-gb/api/MLModel.md b/docs-gb/api/MLModel.md
new file mode 100644
index 000000000..969b6efaa
--- /dev/null
+++ b/docs-gb/api/MLModel.md
@@ -0,0 +1,127 @@
+# MLModel
+
+Abstract inference runtime which exposes the main interface to interact
+with ML models.
+
+## Methods
+
+### decode()
+
+```python
+decode(request_input: RequestInput, default_codec: Union[type[ForwardRef('InputCodec')], ForwardRef('InputCodec'), None] = None) -> Any
+```
+
+Helper to decode a **request input** into its corresponding high-level
+Python object.
+This method will find the most appropriate [input codec](./Codecs.md)
+based on the model's metadata and the input's content type.
+Otherwise, it will fall back to the codec specified in the
+`default_codec` kwarg.
+
+### decode_request()
+
+```python
+decode_request(inference_request: InferenceRequest, default_codec: Union[type[ForwardRef('RequestCodec')], ForwardRef('RequestCodec'), None] = None) -> Any
+```
+
+Helper to decode an **inference request** into its corresponding
+high-level Python object.
+This method will find the most appropriate [request codec](./Codecs.md)
+based on the model's metadata and the request's content type.
+Otherwise, it will fall back to the codec specified in the
+`default_codec` kwarg.
+
+### encode()
+
+```python
+encode(payload: Any, request_output: RequestOutput, default_codec: Union[type[ForwardRef('InputCodec')], ForwardRef('InputCodec'), None] = None) -> ResponseOutput
+```
+
+Helper to encode a high-level Python object into its corresponding
+**response output**.
+This method will find the most appropriate [input codec](./Codecs.md)
+based on the model's metadata, the request output's content type or the
+payload's type.
+Otherwise, it will fall back to the codec specified in the
+`default_codec` kwarg.
+
+### encode_response()
+
+```python
+encode_response(payload: Any, default_codec: Union[type[ForwardRef('RequestCodec')], ForwardRef('RequestCodec'), None] = None) -> InferenceResponse
+```
+
+Helper to encode a high-level Python object into its corresponding
+**inference response**.
+This method will find the most appropriate [request codec](./Codecs.md)
+based on the payload's type.
+Otherwise, it will fall back to the codec specified in the
+`default_codec` kwarg.
+
+### load()
+
+```python
+load() -> bool
+```
+
+Method responsible for loading the model from a model artefact.
+This method will be called on each of the parallel workers (when
+parallel inference is enabled).
+Its return value will represent the model's readiness status.
+A return value of `True` will mean the model is ready.
+
+**This method can be overridden to implement your custom load
+logic.**
+
+### metadata()
+
+```python
+metadata() -> MetadataModelResponse
+```
+
+_No description available._
+
+### predict()
+
+```python
+predict(payload: InferenceRequest) -> InferenceResponse
+```
+
+Method responsible for running inference on the model.
+
+**This method can be overridden to implement your custom inference
+logic.**
+
+### predict_stream()
+
+```python
+predict_stream(payloads: AsyncIterator[InferenceRequest]) -> AsyncIterator[InferenceResponse]
+```
+
+Method responsible for running generation on the model, streaming a set
+of responses back to the client.
+
+**This method can be overridden to implement your custom inference
+logic.**
+
+### unload()
+
+```python
+unload() -> bool
+```
+
+Method responsible for unloading the model, freeing any resources (e.g.
+CPU memory, GPU memory, etc.).
+This method will be called on each of the parallel workers (when
+parallel inference is enabled).
+A return value of `True` will mean the model is now unloaded.
+
+**This method can be overridden to implement your custom unload
+logic.**
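+
+## Example
+
+As a minimal sketch of how these pieces fit together, the runtime below overrides `load()` and `predict()`, using the `decode_args` helper from [Codecs](./Codecs.md) to work with NumPy arrays directly. The class name and model logic are illustrative assumptions:
+
+```python
+import numpy as np
+
+from mlserver import MLModel
+from mlserver.codecs import decode_args
+
+
+class TimesTwoModel(MLModel):
+    async def load(self) -> bool:
+        # load your model artefact here; a trivial stand-in is used instead
+        self._multiplier = 2
+        return True
+
+    @decode_args
+    async def predict(self, payload: np.ndarray) -> np.ndarray:
+        # decode_args decodes the request into NumPy and encodes the result back
+        return payload * self._multiplier
+```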
diff --git a/docs-gb/api/Metrics.md b/docs-gb/api/Metrics.md
new file mode 100644
index 000000000..4f86b5a2e
--- /dev/null
+++ b/docs-gb/api/Metrics.md
@@ -0,0 +1,53 @@
+# Metrics
+
+## MetricsServer
+
+### Methods
+
+### on_worker_stop()
+
+```python
+on_worker_stop(worker: Worker) -> None
+```
+
+### start()
+
+```python
+start()
+```
+
+### stop()
+
+```python
+stop(sig: Optional[int] = None)
+```
+
+## configure_metrics()
+
+```python
+configure_metrics(settings: Settings)
+```
+
+_No description available._
+
+## log()
+
+```python
+log(**metrics)
+```
+
+Logs a new set of metric values.
+Each kwarg of this method will be treated as a separate metric / value
+pair.
+If any of the metrics does not exist, a new one will be created with a
+default description.
+
+## register()
+
+```python
+register(name: str, description: str) -> Histogram
+```
+
+Registers a new metric with its description.
+If the metric already exists, it will just return the existing one.
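+
+## Example
+
+As a brief sketch, custom metrics can be logged from inside a runtime's `predict()` method. The metric name and the empty response below are illustrative assumptions:
+
+```python
+import mlserver
+from mlserver import MLModel
+from mlserver.types import InferenceRequest, InferenceResponse
+
+
+class MonitoredModel(MLModel):
+    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
+        # each kwarg becomes a separate metric / value pair
+        mlserver.log(input_count=len(payload.inputs))
+        return InferenceResponse(model_name=self.name, outputs=[])
+```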
diff --git a/docs-gb/api/ModelParameters.md b/docs-gb/api/ModelParameters.md
new file mode 100644
index 000000000..ede0c1fee
--- /dev/null
+++ b/docs-gb/api/ModelParameters.md
@@ -0,0 +1,24 @@
+# ModelParameters
+
+### Config
+
+| Attribute | Type | Default |
+|-----------|------|---------|
+| `extra` | `str` | `"allow"` |
+| `env_prefix` | `str` | `"MLSERVER_MODEL_"` |
+| `env_file` | `str` | `".env"` |
+| `protected_namespaces` | `tuple` | `('model_', 'settings_')` |
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `autogenerate_inference_pool_gid` | `bool` | `False` | Flag to autogenerate the inference pool group id for this model. |
+| `content_type` | `Optional[str]` | `None` | Default content type to use for requests and responses. |
+| `environment_path` | `Optional[str]` | `None` | Path to a directory that contains the python environment to be used to load this model. |
+| `environment_tarball` | `Optional[str]` | `None` | Path to the environment tarball which should be used to load this model. |
+| `extra` | `Optional[dict]` | `-` | Arbitrary settings, dependent on the inference runtime implementation. |
+| `format` | `Optional[str]` | `None` | Format of the model (only available on certain runtimes). |
+| `inference_pool_gid` | `Optional[str]` | `None` | Inference pool group id to be used to serve this model. |
+| `uri` | `Optional[str]` | `None` | URI where the model artifacts can be found. This path must be either absolute or relative to where MLServer is running. |
+| `version` | `Optional[str]` | `None` | Version of the model. |
diff --git a/docs-gb/api/ModelSettings.md b/docs-gb/api/ModelSettings.md
new file mode 100644
index 000000000..f9fa46b02
--- /dev/null
+++ b/docs-gb/api/ModelSettings.md
@@ -0,0 +1,27 @@
+# ModelSettings
+
+### Config
+
+| Attribute | Type | Default |
+|-----------|------|---------|
+| `extra` | `str` | `"ignore"` |
+| `env_prefix` | `str` | `"MLSERVER_MODEL_"` |
+| `env_file` | `str` | `".env"` |
+| `protected_namespaces` | `tuple` | `('model_', 'settings_')` |
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `cache_enabled` | `bool` | `False` | Enable caching for a specific model. This parameter can be used to disable the cache for a specific model when server-level caching is enabled. If server-level caching is disabled, this parameter has no effect. |
+| `implementation_` | `str` | `-` | *Python path* to the inference runtime to use to serve this model (e.g. `mlserver_sklearn.SKLearnModel`). |
+| `inputs` | `List[MetadataTensor]` | `-` | Metadata about the inputs accepted by the model. |
+| `max_batch_size` | `int` | `0` | When adaptive batching is enabled, maximum number of requests to group together in a single batch. |
+| `max_batch_time` | `float` | `0.0` | When adaptive batching is enabled, maximum amount of time (in seconds) to wait for enough requests to build a full batch. |
+| `name` | `str` | `''` | Name of the model. |
+| `outputs` | `List[MetadataTensor]` | `-` | Metadata about the outputs returned by the model. |
+| `parallel_workers` | `Optional[int]` | `None` | Deprecated: use the `parallel_workers` field in the server-wide settings instead. |
+| `parameters` | `Optional[ModelParameters]` | `None` | Extra parameters for each instance of this model. |
+| `platform` | `str` | `''` | Framework used to train and serialise the model (e.g. sklearn). |
+| `versions` | `List[str]` | `-` | Versions of dependencies used to train the model (e.g. sklearn/0.20.1). |
+| `warm_workers` | `bool` | `False` | Deprecated: inference workers will now always be warmed up at start time. |
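+
+### Example
+
+For reference, a minimal `model-settings.json` typically combines these fields with the nested [ModelParameters](./ModelParameters.md). The names and paths below are placeholders:
+
+```json
+{
+  "name": "my-model",
+  "implementation": "mlserver_sklearn.SKLearnModel",
+  "parameters": {
+    "uri": "./model.joblib",
+    "version": "v0.1.0"
+  }
+}
+```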
diff --git a/docs-gb/api/PythonAPI.md b/docs-gb/api/PythonAPI.md
new file mode 100644
index 000000000..45984fdc4
--- /dev/null
+++ b/docs-gb/api/PythonAPI.md
@@ -0,0 +1,30 @@
+# Python API
+
+MLServer exposes a Python framework to build custom inference runtimes, define request/response types, plug in codecs for payload conversion, and emit metrics. This page provides a high-level overview and links to the API docs.
+
+- [MLModel](./MLModel.md)
+  - Base class to implement custom inference runtimes.
+  - Core lifecycle: `load()`, `predict()`, `unload()`.
+  - Helpers for encoding/decoding requests and responses.
+  - Access to model metadata and settings.
+  - Extend this class to implement your own model logic.
+- [Types](./Types.md)
+  - Data structures and enums for the V2 inference protocol.
+  - Includes Pydantic models like `InferenceRequest`, `InferenceResponse`, `RequestInput`, `ResponseOutput`.
+  - See model fields (type and default) and JSON Schemas in the docs.
+- [Codecs](./Codecs.md)
+  - Encode/decode payloads between Open Inference Protocol types and Python types.
+  - Base classes: `InputCodec` (inputs/outputs) and `RequestCodec` (requests/responses).
+  - Built-ins include codecs such as `NumpyCodec`, `Base64Codec`, `StringCodec`, etc.
+- [Metrics](./Metrics.md)
+  - Emit and configure metrics within MLServer.
+  - Use `log()` to record custom metrics; see server lifecycle hooks and utilities.
+
+{% hint style="info" %}
+When creating a custom runtime, start by subclassing `MLModel`, use the structures from [Types](./Types.md) for requests/responses, pick or implement the appropriate [Codecs](./Codecs.md), and optionally emit [Metrics](./Metrics.md) from your model code.
+{% endhint %}
diff --git a/docs-gb/api/Settings.md b/docs-gb/api/Settings.md
new file mode 100644
index 000000000..8a4d5d6d0
--- /dev/null
+++ b/docs-gb/api/Settings.md
@@ -0,0 +1,46 @@
+# Settings
+
+### Config
+
+| Attribute | Type | Default |
+|-----------|------|---------|
+| `extra` | `str` | `"ignore"` |
+| `env_prefix` | `str` | `"MLSERVER_"` |
+| `env_file` | `str` | `".env"` |
+| `protected_namespaces` | `tuple` | `()` |
+
+### Fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `cache_enabled` | `bool` | `False` | Enable caching for the model predictions. |
+| `cache_size` | `int` | `100` | Cache size to be used if caching is enabled. |
+| `cors_settings` | `Optional[CORSSettings]` | `None` | - |
+| `debug` | `bool` | `True` | - |
+| `environments_dir` | `str` | `'-'` | - |
+| `extensions` | `List[str]` | `[]` | - |
+| `grpc_max_message_length` | `Optional[int]` | `None` | - |
+| `grpc_port` | `int` | `8081` | - |
+| `gzip_enabled` | `bool` | `True` | Enable GZipMiddleware. |
+| `host` | `str` | `'0.0.0.0'` | - |
+| `http_port` | `int` | `8080` | - |
+| `kafka_enabled` | `bool` | `False` | Enable Kafka integration for the server. |
+| `kafka_servers` | `str` | `'localhost:9092'` | Comma-separated list of Kafka servers. |
+| `kafka_topic_input` | `str` | `'mlserver-input'` | Kafka topic for input messages. |
+| `kafka_topic_output` | `str` | `'mlserver-output'` | Kafka topic for output messages. |
+| `load_models_at_startup` | `bool` | `True` | - |
+| `logging_settings` | `Union[str, Dict[Any, Any], None]` | `None` | Path to logging config file or dictionary configuration. |
+| `metrics_dir` | `str` | `'-'` | Directory used to share metrics across parallel workers. Equivalent to the `PROMETHEUS_MULTIPROC_DIR` env var in `prometheus-client`. Note that this won't be used if the `parallel_workers` flag is disabled. By default, the `.metrics` folder of the current working directory will be used. |
+| `metrics_endpoint` | `Optional[str]` | `'/metrics'` | Endpoint used to expose Prometheus metrics. Alternatively, can be set to `None` to disable it. |
+| `metrics_port` | `int` | `8082` | Port used to expose the metrics endpoint. |
+| `metrics_rest_server_prefix` | `str` | `'rest_server'` | Metrics rest server string prefix to be exported. |
+| `model_repository_implementation` | `Optional[ImportString]` | `None` | - |
+| `model_repository_implementation_args` | `dict` | `{}` | - |
+| `model_repository_root` | `str` | `'.'` | - |
+| `parallel_workers` | `int` | `1` | - |
+| `parallel_workers_timeout` | `int` | `5` | - |
+| `root_path` | `str` | `''` | - |
+| `server_name` | `str` | `'mlserver'` | - |
+| `server_version` | `str` | `'1.7.0.dev0'` | - |
+| `tracing_server` | `Optional[str]` | `None` | Server name used to export OpenTelemetry tracing to a collector service. |
+| `use_structured_logging` | `bool` | `False` | Use JSON-formatted structured logging instead of the default format. |
diff --git a/docs-gb/api/Types.md b/docs-gb/api/Types.md
new file mode 100644
index 000000000..83fd55250
--- /dev/null
+++ b/docs-gb/api/Types.md
@@ -0,0 +1,1443 @@
+# Types
+
+## Datatype
+
+An enumeration.
+
+## InferenceErrorResponse
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `error` | `Optional[str]` | `None` | - |
JSON Schema + + +```json + +{ + "properties": { + "error": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Error" + } + }, + "title": "InferenceErrorResponse", + "type": "object" +} + +``` + + +
+ +## InferenceRequest + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `id` | `Optional[str]` | `None` | - | +| `inputs` | `List[RequestInput]` | `-` | - | +| `outputs` | `Optional[List[RequestOutput]]` | `None` | - | +| `parameters` | `Optional[Parameters]` | `None` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "Datatype": { + "enum": [ + "BOOL", + "UINT8", + "UINT16", + "UINT32", + "UINT64", + "INT8", + "INT16", + "INT32", + "INT64", + "FP16", + "FP32", + "FP64", + "BYTES" + ], + "title": "Datatype", + "type": "string" + }, + "Parameters": { + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" + }, + "RequestInput": { + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "shape": { + "items": { + "type": "integer" + }, + "title": "Shape", + "type": "array" + }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + }, + "data": { + "$ref": "#/$defs/TensorData" + } + }, + "required": [ + "name", + "shape", + "datatype", + "data" + ], + "title": "RequestInput", + "type": "object" + }, + "RequestOutput": { + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + } + }, + "required": [ + "name" + ], + "title": "RequestOutput", + "type": "object" + }, + "TensorData": { + "anyOf": [ + { + "items": {}, + "type": "array" + }, + {} + ], + "title": "TensorData" + } + }, + "properties": { + "id": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Id" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + }, + "inputs": { + "items": { + "$ref": "#/$defs/RequestInput" + }, + "title": "Inputs", + "type": "array" + }, + "outputs": { + "anyOf": [ + { + "items": { + "$ref": "#/$defs/RequestOutput" + }, + "type": "array" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Outputs" + } + }, + "required": [ + "inputs" + ], + "title": "InferenceRequest", + "type": "object" +} + +``` + + +
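As a short usage sketch, an `InferenceRequest` matching this schema can be built directly from the Pydantic models (the tensor name and values are arbitrary, and the serialisation call assumes Pydantic v2):

```python
from mlserver.types import InferenceRequest, RequestInput

inference_request = InferenceRequest(
    inputs=[
        RequestInput(
            name="input-0",
            shape=[1, 3],
            datatype="FP32",
            data=[1.0, 2.0, 3.0],
        )
    ]
)

print(inference_request.model_dump_json())
```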
+ +## InferenceResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `id` | `Optional[str]` | `None` | - | +| `model_name` | `str` | `-` | - | +| `model_version` | `Optional[str]` | `None` | - | +| `outputs` | `List[ResponseOutput]` | `-` | - | +| `parameters` | `Optional[Parameters]` | `None` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "Datatype": { + "enum": [ + "BOOL", + "UINT8", + "UINT16", + "UINT32", + "UINT64", + "INT8", + "INT16", + "INT32", + "INT64", + "FP16", + "FP32", + "FP64", + "BYTES" + ], + "title": "Datatype", + "type": "string" + }, + "Parameters": { + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" + }, + "ResponseOutput": { + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "shape": { + "items": { + "type": "integer" + }, + "title": "Shape", + "type": "array" + }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + }, + "data": { + "$ref": "#/$defs/TensorData" + } + }, + "required": [ + "name", + "shape", + "datatype", + "data" + ], + "title": "ResponseOutput", + "type": "object" + }, + "TensorData": { + "anyOf": [ + { + "items": {}, + "type": "array" + }, + {} + ], + "title": "TensorData" + } + }, + "properties": { + "model_name": { + "title": "Model Name", + "type": "string" + }, + "model_version": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Model Version" + }, + "id": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Id" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + }, + "outputs": { + "items": { + "$ref": "#/$defs/ResponseOutput" + }, + "title": "Outputs", + "type": "array" + } + }, + "required": [ + "model_name", + "outputs" + ], + "title": "InferenceResponse", + "type": "object" +} + +``` + + +
+ +## MetadataModelErrorResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `error` | `str` | `-` | - | +
JSON Schema + + +```json + +{ + "properties": { + "error": { + "title": "Error", + "type": "string" + } + }, + "required": [ + "error" + ], + "title": "MetadataModelErrorResponse", + "type": "object" +} + +``` + + +
+ +## MetadataModelResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `inputs` | `Optional[List[MetadataTensor]]` | `None` | - | +| `name` | `str` | `-` | - | +| `outputs` | `Optional[List[MetadataTensor]]` | `None` | - | +| `parameters` | `Optional[Parameters]` | `None` | - | +| `platform` | `str` | `-` | - | +| `versions` | `Optional[List[str]]` | `None` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "Datatype": { + "enum": [ + "BOOL", + "UINT8", + "UINT16", + "UINT32", + "UINT64", + "INT8", + "INT16", + "INT32", + "INT64", + "FP16", + "FP32", + "FP64", + "BYTES" + ], + "title": "Datatype", + "type": "string" + }, + "MetadataTensor": { + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, + "shape": { + "items": { + "type": "integer" + }, + "title": "Shape", + "type": "array" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + } + }, + "required": [ + "name", + "datatype", + "shape" + ], + "title": "MetadataTensor", + "type": "object" + }, + "Parameters": { + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" + } + }, + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "versions": { + "anyOf": [ + { + "items": { + "type": "string" + }, + "type": "array" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Versions" + }, + "platform": { + "title": "Platform", + "type": "string" + }, + "inputs": { + "anyOf": [ + { + "items": { + "$ref": "#/$defs/MetadataTensor" + }, + "type": "array" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Inputs" + }, + "outputs": { + "anyOf": [ + { + "items": { + "$ref": "#/$defs/MetadataTensor" + }, + "type": "array" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Outputs" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + } + }, + "required": [ + "name", + "platform" + ], + "title": "MetadataModelResponse", + "type": "object" +} + +``` + + +
+ +## MetadataServerErrorResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `error` | `str` | `-` | - | +
JSON Schema + + +```json + +{ + "properties": { + "error": { + "title": "Error", + "type": "string" + } + }, + "required": [ + "error" + ], + "title": "MetadataServerErrorResponse", + "type": "object" +} + +``` + + +
+ +## MetadataServerResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `extensions` | `List[str]` | `-` | - | +| `name` | `str` | `-` | - | +| `version` | `str` | `-` | - | +
JSON Schema + + +```json + +{ + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "version": { + "title": "Version", + "type": "string" + }, + "extensions": { + "items": { + "type": "string" + }, + "title": "Extensions", + "type": "array" + } + }, + "required": [ + "name", + "version", + "extensions" + ], + "title": "MetadataServerResponse", + "type": "object" +} + +``` + + +
+ +## MetadataTensor + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `datatype` | `Datatype` | `-` | - | +| `name` | `str` | `-` | - | +| `parameters` | `Optional[Parameters]` | `None` | - | +| `shape` | `List[int]` | `-` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "Datatype": { + "enum": [ + "BOOL", + "UINT8", + "UINT16", + "UINT32", + "UINT64", + "INT8", + "INT16", + "INT32", + "INT64", + "FP16", + "FP32", + "FP64", + "BYTES" + ], + "title": "Datatype", + "type": "string" + }, + "Parameters": { + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" + } + }, + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, + "shape": { + "items": { + "type": "integer" + }, + "title": "Shape", + "type": "array" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + } + }, + "required": [ + "name", + "datatype", + "shape" + ], + "title": "MetadataTensor", + "type": "object" +} + +``` + + +
+ +## Parameters + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `content_type` | `Optional[str]` | `None` | - | +| `headers` | `Optional[Dict[str, Any]]` | `None` | - | +
JSON Schema + + +```json + +{ + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" +} + +``` + + +
+ +## RepositoryIndexRequest + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `ready` | `Optional[bool]` | `None` | - | +
JSON Schema + + +```json + +{ + "properties": { + "ready": { + "anyOf": [ + { + "type": "boolean" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Ready" + } + }, + "title": "RepositoryIndexRequest", + "type": "object" +} + +``` + + +
+ +## RepositoryIndexResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `root` | `List[RepositoryIndexResponseItem]` | `-` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "RepositoryIndexResponseItem": { + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "version": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Version" + }, + "state": { + "$ref": "#/$defs/State" + }, + "reason": { + "title": "Reason", + "type": "string" + } + }, + "required": [ + "name", + "state", + "reason" + ], + "title": "RepositoryIndexResponseItem", + "type": "object" + }, + "State": { + "enum": [ + "UNKNOWN", + "READY", + "UNAVAILABLE", + "LOADING", + "UNLOADING" + ], + "title": "State", + "type": "string" + } + }, + "items": { + "$ref": "#/$defs/RepositoryIndexResponseItem" + }, + "title": "RepositoryIndexResponse", + "type": "array" +} + +``` + + +
+ +## RepositoryIndexResponseItem + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `name` | `str` | `-` | - | +| `reason` | `str` | `-` | - | +| `state` | `State` | `-` | - | +| `version` | `Optional[str]` | `None` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "State": { + "enum": [ + "UNKNOWN", + "READY", + "UNAVAILABLE", + "LOADING", + "UNLOADING" + ], + "title": "State", + "type": "string" + } + }, + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "version": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Version" + }, + "state": { + "$ref": "#/$defs/State" + }, + "reason": { + "title": "Reason", + "type": "string" + } + }, + "required": [ + "name", + "state", + "reason" + ], + "title": "RepositoryIndexResponseItem", + "type": "object" +} + +``` + + +
+ +## RepositoryLoadErrorResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `error` | `Optional[str]` | `None` | - | +
JSON Schema + + +```json + +{ + "properties": { + "error": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Error" + } + }, + "title": "RepositoryLoadErrorResponse", + "type": "object" +} + +``` + + +
+ +## RepositoryUnloadErrorResponse + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `error` | `Optional[str]` | `None` | - | +
JSON Schema + + +```json + +{ + "properties": { + "error": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Error" + } + }, + "title": "RepositoryUnloadErrorResponse", + "type": "object" +} + +``` + + +
+ +## RequestInput + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `data` | `TensorData` | `-` | - | +| `datatype` | `Datatype` | `-` | - | +| `name` | `str` | `-` | - | +| `parameters` | `Optional[Parameters]` | `None` | - | +| `shape` | `List[int]` | `-` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "Datatype": { + "enum": [ + "BOOL", + "UINT8", + "UINT16", + "UINT32", + "UINT64", + "INT8", + "INT16", + "INT32", + "INT64", + "FP16", + "FP32", + "FP64", + "BYTES" + ], + "title": "Datatype", + "type": "string" + }, + "Parameters": { + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" + }, + "TensorData": { + "anyOf": [ + { + "items": {}, + "type": "array" + }, + {} + ], + "title": "TensorData" + } + }, + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "shape": { + "items": { + "type": "integer" + }, + "title": "Shape", + "type": "array" + }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + }, + "data": { + "$ref": "#/$defs/TensorData" + } + }, + "required": [ + "name", + "shape", + "datatype", + "data" + ], + "title": "RequestInput", + "type": "object" +} + +``` + + +
+ +## RequestOutput + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `name` | `str` | `-` | - | +| `parameters` | `Optional[Parameters]` | `None` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "Parameters": { + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" + } + }, + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + } + }, + "required": [ + "name" + ], + "title": "RequestOutput", + "type": "object" +} + +``` + + +
+ +## ResponseOutput + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `data` | `TensorData` | `-` | - | +| `datatype` | `Datatype` | `-` | - | +| `name` | `str` | `-` | - | +| `parameters` | `Optional[Parameters]` | `None` | - | +| `shape` | `List[int]` | `-` | - | +
JSON Schema + + +```json + +{ + "$defs": { + "Datatype": { + "enum": [ + "BOOL", + "UINT8", + "UINT16", + "UINT32", + "UINT64", + "INT8", + "INT16", + "INT32", + "INT64", + "FP16", + "FP32", + "FP64", + "BYTES" + ], + "title": "Datatype", + "type": "string" + }, + "Parameters": { + "additionalProperties": true, + "properties": { + "content_type": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Content Type" + }, + "headers": { + "anyOf": [ + { + "type": "object" + }, + { + "type": "null" + } + ], + "default": null, + "title": "Headers" + } + }, + "title": "Parameters", + "type": "object" + }, + "TensorData": { + "anyOf": [ + { + "items": {}, + "type": "array" + }, + {} + ], + "title": "TensorData" + } + }, + "properties": { + "name": { + "title": "Name", + "type": "string" + }, + "shape": { + "items": { + "type": "integer" + }, + "title": "Shape", + "type": "array" + }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, + "parameters": { + "anyOf": [ + { + "$ref": "#/$defs/Parameters" + }, + { + "type": "null" + } + ], + "default": null + }, + "data": { + "$ref": "#/$defs/TensorData" + } + }, + "required": [ + "name", + "shape", + "datatype", + "data" + ], + "title": "ResponseOutput", + "type": "object" +} + +``` + + +
+ +## State + +An enumeration. + +## TensorData + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `root` | `Union[List[Any], Any]` | `-` | - | +
JSON Schema + + +```json + +{ + "anyOf": [ + { + "items": {}, + "type": "array" + }, + {} + ], + "title": "TensorData" +} + +``` + + +
diff --git a/docs-gb/api/api-reference.md b/docs-gb/api/api-reference.md
new file mode 100644
index 000000000..a24874cb5
--- /dev/null
+++ b/docs-gb/api/api-reference.md
@@ -0,0 +1,45 @@
+# API Reference Overview
+
+This page links to the key reference docs for configuring and using MLServer.
+
+## MLServer Settings
+
+Server-wide configuration (e.g., HTTP/gRPC ports) loaded from a `settings.json` in the working directory. Settings can also be provided via environment variables prefixed with `MLSERVER_` (e.g., `MLSERVER_GRPC_PORT`).
+
+- Scope: server-wide (independent from model-specific settings)
+- Sources: `settings.json` or env vars `MLSERVER_*`
+
+[Read the full reference →](./Settings.md)
+
+## Model Settings
+
+Each model has its own configuration (metadata, parallelism, etc.). Typically provided via a `model-settings.json` next to the model artifacts. Alternatively, use env vars prefixed with `MLSERVER_MODEL_` (e.g., `MLSERVER_MODEL_IMPLEMENTATION`). If no `model-settings.json` is found, MLServer will try to load a default model from these env vars. Note: these env vars are shared across models unless overridden by a `model-settings.json`.
+
+- Scope: per-model
+- Sources: `model-settings.json` or env vars `MLSERVER_MODEL_*`
+
+[Read the full reference →](./ModelSettings.md)
+
+## MLServer CLI
+
+The `mlserver` CLI helps with common model lifecycle tasks (build images, init projects, start serving, etc.). For a quick overview:
+
+```bash
+mlserver --help
+```
+
+- Commands include: `build`, `dockerfile`, `infer` (deprecated), `init`, `start`
+- Each command lists its options, arguments, and examples
+
+[Read the full CLI reference →](./CLI.md)
+
+## Python API
+
+Build custom runtimes and integrate with MLServer using Python:
+
+- MLModel: base class for custom inference runtimes
+- Types: request/response schemas and enums (Pydantic)
+- Codecs: payload conversions between protocol types and Python types
+- Metrics: emit and configure metrics
+
+[Browse the Python API →](./PythonAPI.md)
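+
+## Putting it together
+
+As a quick, illustrative sketch (ports and paths are placeholders, not required values), server-wide settings can come from a `settings.json` or from `MLSERVER_*` env vars, and a model folder is then served with the CLI:
+
+```json
+{
+  "http_port": 8080,
+  "grpc_port": 8081,
+  "parallel_workers": 1
+}
+```
+
+```bash
+# equivalent override via environment variables
+MLSERVER_HTTP_PORT=9090 MLSERVER_GRPC_PORT=9091 mlserver start ./my-model
+```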