# IO Processor Plugins

IO Processor plugins enable pre- and post-processing of model input and output for pooling models. The idea is that users can pass a custom input to vLLM, which the plugin converts into one or more model prompts and feeds to the model's `encode` method. One potential use case for such plugins is generating multi-modal data with vLLM: for example, users feed an image to vLLM and receive an image as output.

When performing inference with IO Processor plugins, the prompt type is defined by the plugin, and the same holds for the final request output. vLLM does not perform any validation of input/output data; it is up to the plugin to ensure that the correct data is fed to the model and returned to the user. As of now, these plugins support only pooling models and can be triggered via the `encode` method in `LLM` and `AsyncLLM`, or in online serving mode via the `/pooling` endpoint.

## Writing an IO Processor Plugin

IO Processor plugins implement the `IOProcessor` interface (<gh-file:vllm/plugins/io_processors/interface.py>):

```python
from abc import ABC, abstractmethod
from collections.abc import AsyncGenerator, Sequence
from typing import Any, Generic, Optional, TypeVar, Union

# PromptType, PoolingRequestOutput, VllmConfig and IOProcessorResponse are
# vLLM types, imported from the corresponding vLLM modules (omitted here).

IOProcessorInput = TypeVar('IOProcessorInput')
IOProcessorOutput = TypeVar('IOProcessorOutput')

class IOProcessor(ABC, Generic[IOProcessorInput, IOProcessorOutput]):

    def __init__(self, vllm_config: VllmConfig):
        self.vllm_config = vllm_config

    @abstractmethod
    def pre_process(
        self,
        prompt: IOProcessorInput,
        request_id: Optional[str] = None,
        **kwargs,
    ) -> Union[PromptType, Sequence[PromptType]]:
        raise NotImplementedError

    async def pre_process_async(
        self,
        prompt: IOProcessorInput,
        request_id: Optional[str] = None,
        **kwargs,
    ) -> Union[PromptType, Sequence[PromptType]]:
        return self.pre_process(prompt, request_id, **kwargs)

    @abstractmethod
    def post_process(self,
                     model_output: Sequence[PoolingRequestOutput],
                     request_id: Optional[str] = None,
                     **kwargs) -> IOProcessorOutput:
        raise NotImplementedError

    async def post_process_async(
        self,
        model_output: AsyncGenerator[tuple[int, PoolingRequestOutput]],
        request_id: Optional[str] = None,
        **kwargs,
    ) -> IOProcessorOutput:
        collected_output = [item async for i, item in model_output]
        return self.post_process(collected_output, request_id, **kwargs)

    @abstractmethod
    def parse_request(self, request: Any) -> IOProcessorInput:
        raise NotImplementedError

    @abstractmethod
    def output_to_response(
            self, plugin_output: IOProcessorOutput) -> IOProcessorResponse:
        raise NotImplementedError
```

The `parse_request` method validates the user prompt and converts it into the input expected by the `pre_process`/`pre_process_async` methods.
The `pre_process*` methods take the validated plugin input and generate vLLM model prompts for regular inference.
The `post_process*` methods take `PoolingRequestOutput` objects as input and generate a custom plugin output.
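
To make this concrete, below is a minimal sketch of a plugin built on the interface above. All plugin-specific names (`MyPluginInput`, `MyPluginOutput`, `MyPlugin`, the `{"text": ...}` request shape) are hypothetical, and the `IOProcessorResponse` construction is schematic; a real plugin defines its own input/output types and should follow the example implementation linked below.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Union

from vllm.inputs import PromptType
from vllm.outputs import PoolingRequestOutput
# Assumption: IOProcessorResponse is importable alongside the interface;
# adjust the import to match your vLLM version.
from vllm.plugins.io_processors.interface import (IOProcessor,
                                                  IOProcessorResponse)


@dataclass
class MyPluginInput:
    text: str


@dataclass
class MyPluginOutput:
    embedding: list[float]


class MyPlugin(IOProcessor[MyPluginInput, MyPluginOutput]):

    def parse_request(self, request: Any) -> MyPluginInput:
        # Validate the raw user request before pre-processing.
        if isinstance(request, dict) and "text" in request:
            return MyPluginInput(text=request["text"])
        raise ValueError("Unsupported request format")

    def pre_process(
        self,
        prompt: MyPluginInput,
        request_id: Optional[str] = None,
        **kwargs,
    ) -> Union[PromptType, Sequence[PromptType]]:
        # A plain string is a valid vLLM prompt; a plugin may also
        # return a sequence of prompts here.
        return prompt.text

    def post_process(
        self,
        model_output: Sequence[PoolingRequestOutput],
        request_id: Optional[str] = None,
        **kwargs,
    ) -> MyPluginOutput:
        # Collapse the pooled tensors into the plugin's output type.
        data = [float(v) for out in model_output
                for v in out.outputs.data.tolist()]
        return MyPluginOutput(embedding=data)

    def output_to_response(
            self, plugin_output: MyPluginOutput) -> IOProcessorResponse:
        # Only used for online serving. The exact IOProcessorResponse
        # fields are an assumption; check vLLM's definition.
        return IOProcessorResponse(data=plugin_output)
```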

The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/io_processor_pooling` serving endpoint is [here](../../vllm/entrypoints/openai/serving_pooling_with_io_plugin.py).
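
The payload such an endpoint accepts is entirely plugin-defined. As a hedged illustration only, assuming a server started with the hypothetical plugin sketched above and that the plugin input travels in the request body, a client call might look like this:

```python
import requests

# Assumptions: the endpoint path, the model name, and the {"text": ...}
# field all depend on the deployment and on what the plugin's
# parse_request accepts; treat this as a shape, not a schema.
response = requests.post(
    "http://localhost:8000/pooling",
    json={
        "model": "my/pooling-model",
        "text": "some plugin-specific input",
    },
)
print(response.json())
```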

An example implementation of a plugin that enables generating GeoTIFF images with the PrithviGeospatialMAE model is available [here](https://github.com/christian-pinto/prithvi_io_processor_plugin). Please also refer to our [online](../../examples/online_serving/prithvi_geospatial_mae.py) and [offline](../../examples/offline_inference/prithvi_geospatial_mae_io_processor.py) inference examples.

## Using an IO Processor plugin

IO Processor plugins are loaded at engine startup. There are two ways to specify which plugin to load:

1. Via vLLM's `EngineArgs`: set the `io_processor_plugin` argument in the `EngineArgs` used to initialize the `AsyncLLM`. The same can be achieved by passing the `io_processor_plugin` argument to `LLM` in offline mode (see the sketch after this list), or by passing the `--io-processor-plugin` argument in serving mode.
2. Via the model's HF configuration: add an `io_processor_plugin` field to the model config (`config.json`).
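
For instance, with the first method in offline mode, selecting the plugin looks like the sketch below. The model and plugin names are placeholders: the plugin must be installed in the environment, and the prompt format is whatever its `parse_request` accepts.

```python
from vllm import LLM

# Placeholders: substitute a real pooling model and the name of an
# installed IO Processor plugin.
llm = LLM(
    model="my/pooling-model",
    io_processor_plugin="my_io_processor",
)

# The input here is the plugin-defined format, not a regular vLLM prompt.
output = llm.encode({"text": "plugin-specific input"})
```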

This order also reflects priority: a plugin name set via `EngineArgs` overrides any plugin name specified in the model's HF config (`config.json`).