Commit b11b8f6
Add readme
1 parent d7686d4 commit b11b8f6

File tree: 2 files changed (+117, −249 lines)


extension/llm/runner/README.md

Lines changed: 117 additions & 0 deletions
@@ -164,6 +164,123 @@ int main() {
}
```

## Python API

The LLM Runner framework also provides Python bindings for easy integration with Python applications. The Python API mirrors the C++ interface while providing Pythonic convenience features.

### Installation

Build the Python bindings as part of the ExecuTorch build:
174+
175+
```bash
176+
# Build with Python bindings enabled
177+
cmake -DPYTHON_EXECUTABLE=$(which python3) \
178+
-DEXECUTORCH_BUILD_EXTENSION_LLM=ON \
179+
-DEXECUTORCH_BUILD_PYTHON_BINDINGS=ON \
180+
..
181+
make -j8 _llm_runner
182+
```
183+
184+
### Quick Start - Python

```python
import _llm_runner
import numpy as np

# Create a multimodal runner
runner = _llm_runner.MultimodalRunner(
    model_path="/path/to/model.pte",
    tokenizer_path="/path/to/tokenizer.bin"
)

# Create multimodal inputs
inputs = []

# Add text input
inputs.append(_llm_runner.make_text_input("Describe this image:"))

# Add image input from a numpy array (high bound 256 so all values 0-255 can occur)
image_array = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
inputs.append(_llm_runner.make_image_input(image_array))

# Configure generation
config = _llm_runner.GenerationConfig()
config.max_new_tokens = 100
config.temperature = 0.7
config.echo = False

# Generate text with streaming callbacks
def token_callback(token: str):
    print(token, end='', flush=True)

def stats_callback(stats):
    elapsed_ms = stats.inference_end_ms - stats.inference_start_ms
    print(f"\nGenerated {stats.num_generated_tokens} tokens")
    print(f"Tokens/sec: {stats.num_generated_tokens * 1000 / elapsed_ms:.1f}")

# Run generation
runner.generate(inputs, config, token_callback, stats_callback)

# Or get the complete text result in one call
result = runner.generate_text(inputs, config)
print(f"Generated text: {result}")
```

227+
228+
### Python API Features
229+
230+
- **Type hints**: Full type annotations with `.pyi` stub files for IDE support
231+
- **NumPy integration**: Direct support for numpy arrays as image inputs
232+
- **Callbacks**: Optional token and statistics callbacks for streaming generation
233+
- **Exception handling**: Pythonic error handling with RuntimeError for failures
234+
- **Memory management**: Automatic resource cleanup with Python garbage collection
235+
236+
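When you want both live streaming output and the final string, the token callback can be wrapped in a small accumulator. A minimal sketch in plain Python (the `TokenAccumulator` class is illustrative, not part of the bindings):

```python
class TokenAccumulator:
    """Collects streamed tokens while optionally echoing them as they arrive."""

    def __init__(self, echo=False):
        self.tokens = []
        self.echo = echo

    def __call__(self, token: str):
        # Matches the token-callback signature used in the streaming example
        self.tokens.append(token)
        if self.echo:
            print(token, end="", flush=True)

    def text(self) -> str:
        return "".join(self.tokens)

acc = TokenAccumulator()
for tok in ["Hello", ",", " ", "world"]:  # stand-in for streamed tokens
    acc(tok)
print(acc.text())  # -> Hello, world
```

In a real run you would pass the accumulator where the token callback goes, e.g. `runner.generate(inputs, config, acc, stats_callback)`.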
### Python API Classes

#### GenerationConfig

```python
config = _llm_runner.GenerationConfig()
config.max_new_tokens = 50   # Maximum tokens to generate
config.temperature = 0.8     # Sampling temperature
config.echo = True           # Echo the input prompt in the output
config.seq_len = 512         # Maximum sequence length
config.num_bos = 1           # Number of BOS tokens
config.num_eos = 1           # Number of EOS tokens
```
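For intuition on what the `temperature` field controls, here is a standalone sketch of temperature-scaled softmax over logits (pure NumPy, not part of the bindings): lower values concentrate probability on the top token, higher values flatten the distribution.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert logits to a probability distribution, scaled by temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.1)  # near-greedy: top token dominates
warm = softmax_with_temperature(logits, 2.0)  # flatter: more diverse sampling
print(cold.round(3), warm.round(3))
```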
#### MultimodalInput

```python
# Text input
text_input = _llm_runner.MultimodalInput("Hello, world!")
# Or using the helper
text_input = _llm_runner.make_text_input("Hello, world!")

# Image input
image = _llm_runner.Image()
image.data = [255] * (224 * 224 * 3)  # Flat interleaved RGB data
image.width = 224
image.height = 224
image.channels = 3
image_input = _llm_runner.MultimodalInput(image)

# Or from a numpy array
img_array = np.ones((224, 224, 3), dtype=np.uint8) * 128
image_input = _llm_runner.make_image_input(img_array)
```
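The manual `Image` example above fills `data` with a flat list. Assuming the same interleaved row-major RGB layout (an assumption worth verifying against the bindings), a NumPy HWC array converts like this:

```python
import numpy as np

def array_to_flat_rgb(img):
    """Flatten an (H, W, C) uint8 array into a flat interleaved list plus dimensions."""
    assert img.dtype == np.uint8 and img.ndim == 3
    height, width, channels = img.shape
    return img.reshape(-1).tolist(), width, height, channels

img = np.full((2, 2, 3), 128, dtype=np.uint8)
data, w, h, c = array_to_flat_rgb(img)
print(len(data), w, h, c)  # -> 12 2 2 3
```

In practice `make_image_input` does the conversion for you; this only shows the layout the manual path expects.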
#### Stats

```python
# Access timing and performance statistics (the fields are populated after
# generation, e.g. on the object passed to the stats callback)
stats = _llm_runner.Stats()
print(f"Model load time: {stats.model_load_end_ms - stats.model_load_start_ms}ms")
print(f"Inference time: {stats.inference_end_ms - stats.inference_start_ms}ms")
print(f"Tokens generated: {stats.num_generated_tokens}")
print(f"Prompt tokens: {stats.num_prompt_tokens}")

# JSON export
json_str = stats.to_json_string()
```
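The raw millisecond fields combine into the usual derived metrics. A small helper using only the field names shown above, in plain Python (the `SimpleNamespace` stand-in is for illustration only; a real `Stats` object would come from the runner):

```python
from types import SimpleNamespace

def summarize_stats(stats):
    """Derive load time, inference time, and decode throughput from Stats fields."""
    load_ms = stats.model_load_end_ms - stats.model_load_start_ms
    infer_ms = stats.inference_end_ms - stats.inference_start_ms
    tok_per_s = stats.num_generated_tokens * 1000 / infer_ms if infer_ms else 0.0
    return {"load_ms": load_ms, "inference_ms": infer_ms, "tokens_per_sec": tok_per_s}

# Stand-in object exposing the same attributes as Stats
fake = SimpleNamespace(model_load_start_ms=0, model_load_end_ms=350,
                       inference_start_ms=400, inference_end_ms=2400,
                       num_generated_tokens=100, num_prompt_tokens=12)
print(summarize_stats(fake))  # -> {'load_ms': 350, 'inference_ms': 2000, 'tokens_per_sec': 50.0}
```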
281+
282+
For detailed Python API documentation and examples, see [README_PYTHON_BINDINGS.md](README_PYTHON_BINDINGS.md).
283+
167284
## Core Components
168285

169286
### Component Architecture

extension/llm/runner/README_PYTHON_BINDINGS.md

Lines changed: 0 additions & 249 deletions
This file was deleted.
