# AI Remote Inference App - Spleen Segmentation

This example application demonstrates how to perform medical image segmentation using **Triton Inference Server** for remote inference. The app processes a DICOM CT series to segment the spleen using a deep learning model hosted on a remote Triton server.

## Overview

This application showcases:

- **Remote inference using Triton Inference Server**: The app connects to a Triton server to perform model inference remotely rather than loading the model locally. It communicates by sending and receiving input/output tensors whose shapes match the model's dimensions, including channels.
- **Triton client integration**: The built-in `TritonRemoteModel` class is provided in the [triton_model.py](https://github.com/Project-MONAI/monai-deploy-app-sdk/blob/137ac32d647843579f52060c8f72f9d9e8b51c38/monai/deploy/core/models/triton_model.py) module. This class acts as a Triton inference client, communicating with a model network already loaded on the server, and it supports the same API as the in-process model class (e.g., a loaded TorchScript model network). As a result, the application's inference operator does not need to change when switching between in-process and remote inference.
- **Model metadata parsing**: Uses Triton's model folder structure, which contains the `config.pbtxt` configuration file, to extract model specifications including the name, input/output dimensions, and other metadata.
- **Model path requirement**: The parent folder of the Triton model folder must be used as the application's model path.

## Architecture

The application follows a pipeline architecture:

1. **DICOM Data Loading**: Loads the DICOM study from the input directory
2. **Series Selection**: Selects the appropriate CT series based on configurable rules
3. **Volume Conversion**: Converts the DICOM series to a 3D volume
4. **Remote Inference**: Performs spleen segmentation via the Triton Inference Server
5. **Output Generation**: Creates DICOM Segmentation and STL mesh outputs

## Key Components

### Triton Integration

The `SpleenSegOperator` leverages the `MonaiSegInferenceOperator`, which:

- Uses the loaded model network, which in turn acts as a **Triton inference client** and connects to the remote Triton Inference Server that actually serves the named model
- Handles the preprocessing and postprocessing transforms
- Requires no explicit remote inference logic in either of these two operators

### Model Configuration Requirements

The application requires a Triton model folder, containing a **Triton model configuration file** (`config.pbtxt`), to be present on the application side; the parent path of this model folder is used as the application's model path. This example application has the following model folder structure:

```
models_client_side/spleen_ct/config.pbtxt
```

The path to `models_client_side` is the application's model path, while `spleen_ct` is the folder of the named model, with the folder name matching the model name. The model name attribute in the `config.pbtxt` file is therefore intentionally omitted.

This configuration file (`config.pbtxt`) contains essential model metadata (a minimal sketch follows this list):
- **Model name**: `spleen_ct` (derived from the model folder name and used for server communication)
- **Input dimensions**: `[1, 96, 96, 96]` (channels, width, height, depth)
- **Output dimensions**: `[2, 96, 96, 96]` (2-class segmentation output)
- **Data types**: `TYPE_FP32` for both input and output
- **Batching configuration**: Dynamic batching with preferred batch sizes
- **Hardware requirements**: GPU-based inference

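Such a file might look like the following sketch; the platform, tensor names, and preferred batch sizes here are illustrative assumptions, not values taken from the actual example app:

```
# Illustrative config.pbtxt sketch. The model name is omitted on purpose,
# since Triton derives it from the enclosing folder name (spleen_ct).
platform: "pytorch_libtorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 1, 96, 96, 96 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 2, 96, 96, 96 ]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
}
instance_group [
  {
    kind: KIND_GPU
  }
]
```
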
**Important**: The `config.pbtxt` file is used **in lieu of the actual model file** (e.g., a TorchScript `.ts` file) that would be present in an in-process inference scenario. For remote inference, the physical model file (`model_spleen_ct_segmentation_v1.ts`) resides on the Triton server, while the client only needs the configuration metadata to understand the model's interface.

### API Compatibility Between In-Process and Remote Inference

The `TritonRemoteModel` class in the `triton_model.py` module contains the actual Triton client instance and provides the **same API as in-process model instances**. This design ensures that:

- **Application inference operators remain unchanged** whether using in-process or remote inference
- **Switching is seamless** between local and remote models, with no code modifications
- **The interface is unified** through the `__call__` method, which handles PyTorch tensors locally and Triton HTTP requests remotely
- **Model loading is transparent**: `MonaiSegInferenceOperator` uses the same `predictor` interface regardless of model location (sketched below)

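A minimal sketch of this interchangeability (the helper function is hypothetical, and model construction details are omitted):

```python
import torch

def run_inference(predictor, volume: torch.Tensor) -> torch.Tensor:
    # "predictor" may be a locally loaded TorchScript module or a
    # TritonRemoteModel instance; both are called the same way, so this
    # code is identical for in-process and remote inference.
    with torch.no_grad():
        return predictor(volume)
```
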
## Setup and Configuration

### Environment Variables

Configure the following environment variables (see `env_settings_example.sh`):

```bash
export HOLOSCAN_INPUT_PATH="inputs/spleen_ct_tcia"  # Input DICOM directory
export HOLOSCAN_MODEL_PATH="examples/apps/ai_remote_infer_app/models_client_side"  # Client-side model config path
export HOLOSCAN_OUTPUT_PATH="output_spleen"  # Output directory
export HOLOSCAN_LOG_LEVEL=DEBUG  # Logging level
export TRITON_SERVER_NETLOC="localhost:8000"  # Triton server address
```


### Triton Server Setup

1. **Server side**: Deploy the actual model file (`model_spleen_ct_segmentation_v1.ts`) to your Triton server
2. **Client side**: Ensure the `config.pbtxt` file is available locally for metadata parsing
3. **Network**: Ensure connectivity between the client and the Triton server on the specified port; a readiness check is sketched below

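A hypothetical server-side setup, assuming Triton's standard model-repository layout (a numeric version folder holding the model file) and its standard HTTP readiness endpoint:

```bash
# Server-side model repository (layout assumed per Triton conventions):
#
#   models/
#   └── spleen_ct/
#       ├── config.pbtxt
#       └── 1/
#           └── model_spleen_ct_segmentation_v1.ts
#
# Start the server, then verify readiness from the client side:
tritonserver --model-repository=/path/to/models &
curl -sf http://localhost:8000/v2/health/ready && echo "Triton is ready"
```
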
### Directory Structure

```
ai_remote_infer_app/
├── app.py                   # Main application logic
├── spleen_seg_operator.py   # Custom segmentation operator
├── __main__.py              # Application entry point
├── env_settings_example.sh  # Environment configuration
├── models_client_side/      # Client-side model configurations
│   └── spleen_ct/
│       └── config.pbtxt     # Triton model configuration (no model file)
└── README.md                # This file
```

## Usage

1. **Set up the Triton server** with the spleen segmentation model, listening at `localhost:8000` in this example
2. **Configure the environment variables** to point to your Triton server
3. **Prepare input data** in DICOM format
4. **Run the application** (a combined example follows this list):

   ```bash
   python ai_remote_infer_app
   ```

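For example, assuming the commands are run from the repository root with the settings shown earlier (paths are illustrative):

```bash
source examples/apps/ai_remote_infer_app/env_settings_example.sh
python examples/apps/ai_remote_infer_app
```
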
## Input Requirements

- **DICOM CT series** containing abdominal scans
- **Series selection criteria**: PRIMARY/ORIGINAL CT images
- **Image preprocessing**: Automatic resampling to 1.5 × 1.5 × 2.9 mm spacing (see the sketch below)

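A sketch of that resampling step using MONAI's `Spacingd` transform; the dictionary key and interpolation mode are assumptions, not values taken from the app's code:

```python
from monai.transforms import Spacingd

# Resample the CT volume to 1.5 x 1.5 x 2.9 mm spacing.
resample = Spacingd(keys=["image"], pixdim=(1.5, 1.5, 2.9), mode="bilinear")
```
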
## Output

The application generates:

- **DICOM Segmentation** files with spleen masks
- **STL mesh** files for 3D visualization
- **Intermediate NIfTI** files for debugging (optional)

## Model Specifications

- **Architecture**: 3D PyTorch model optimized for spleen segmentation
- **Input size**: 96 × 96 × 96 voxels
- **Output**: 2-class segmentation (background + spleen)
- **Inference method**: Sliding window with 60% overlap (see the sketch below)
- **Batch size**: Configurable (default: 4)

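A sketch of these inference settings with MONAI's `sliding_window_inference`; the helper function is hypothetical, and `predictor` is the in-process or remote model callable:

```python
import torch
from monai.inferers import sliding_window_inference

def segment(volume: torch.Tensor, predictor) -> torch.Tensor:
    # volume shape: (batch, channels, H, W, D), e.g., (1, 1, H, W, D)
    return sliding_window_inference(
        inputs=volume,
        roi_size=(96, 96, 96),  # model input size
        sw_batch_size=4,        # default batch size
        predictor=predictor,
        overlap=0.6,            # 60% window overlap
    )
```
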
## Notes

- The application demonstrates **remote inference patterns** suitable for production deployments
- **Model versioning** is handled server-side through Triton's version policies
- **Dynamic batching** optimizes throughput for multiple concurrent requests
- **GPU acceleration** is configured but can be adjusted based on available hardware

## Dependencies

- MONAI Deploy App SDK
- Triton inference client libraries
- PyDICOM for DICOM handling
- MONAI transforms for preprocessing
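
A hypothetical installation of these dependencies, using their PyPI package names (pin versions as appropriate):

```bash
pip install monai-deploy-app-sdk "tritonclient[http]" pydicom monai
```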
