Commit ef6f203

Update
1 parent 43b50ff commit ef6f203

6 files changed: +1301 -0 lines changed

extension/llm/runner/CMakeLists.txt

Lines changed: 43 additions & 0 deletions
@@ -79,3 +79,46 @@ install(
if(BUILD_TESTING)
  add_subdirectory(test)
endif()

# Python bindings for MultimodalRunner
if(EXECUTORCH_BUILD_PYBIND)
  # Find pybind11
  find_package(pybind11 REQUIRED)

  # Create the Python extension module for LLM runners
  pybind11_add_module(
    _llm_runner
    ${CMAKE_CURRENT_SOURCE_DIR}/pybindings.cpp
  )

  # Link with the extension_llm_runner library and its dependencies
  target_link_libraries(
    _llm_runner
    PRIVATE
      extension_llm_runner
      executorch_core
      extension_module
      extension_tensor
      tokenizers::tokenizers
  )

  # Set properties for the Python extension
  set_target_properties(
    _llm_runner
    PROPERTIES
      POSITION_INDEPENDENT_CODE ON
      CXX_VISIBILITY_PRESET "hidden"
      INTERPROCEDURAL_OPTIMIZATION TRUE
      PREFIX "${PYTHON_MODULE_PREFIX}"
      SUFFIX "${PYTHON_MODULE_SUFFIX}"
  )

  # Add include directories
  target_include_directories(
    _llm_runner
    PRIVATE
      ${_common_include_directories}
      ${CMAKE_CURRENT_SOURCE_DIR}
      ${CMAKE_CURRENT_SOURCE_DIR}/../sampler
  )
endif()
Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
# Python Bindings for MultimodalRunner

## Overview

This project provides Python bindings for the ExecuTorch MultimodalRunner so that Python developers can run the multimodal LLM runner on mixed inputs (text, images, audio) and generate text output.

## Architecture

The MultimodalRunner is designed for large language models that process multimodal inputs and generate text outputs. It targets models such as:
- LLaVA (vision-language models)
- CLIP-based models
- Speech-to-text models
- Other multimodal transformers

### Key Components

1. **MultimodalRunner** - Main runner class for multimodal inference
2. **MultimodalInput** - Handles different input modalities (text, image, audio)
3. **GenerationConfig** - Configuration for text generation parameters
4. **Stats** - Performance monitoring and statistics
5. **Tokenizer** - Text tokenization and decoding

## Project Structure

```
extension/llm/runner/
├── multimodal_runner_pybindings.cpp  # Python bindings implementation (NEW)
├── __init__.py                       # Python package initialization (NEW)
├── multimodal_runner.py              # Python wrapper classes (NEW)
├── utils.py                          # Utility functions (NEW)
├── CMakeLists.txt                    # Existing - update to include Python bindings
└── test/
    ├── test_multimodal_runner.py     # Unit tests for Python bindings (NEW)
    ├── test_generation.py            # Generation tests (NEW)
    └── [existing test files]         # Existing C++ tests remain here
```

Note: We'll reuse the root-level `setup.py` and update the existing `CMakeLists.txt` rather than creating new ones.

## Action Items

### 1. Core Implementation Tasks

#### High Priority
- [x] ~~**Create Python bindings file** (`multimodal_runner_pybindings.cpp`)~~
  - [x] ~~Bind MultimodalRunner class~~
  - [x] ~~Bind MultimodalInput and helper functions~~
  - [x] ~~Bind GenerationConfig struct~~
  - [x] ~~Bind Stats class for performance monitoring~~
  - [x] ~~Implement error handling and exception translation~~

#### Medium Priority
- [x] ~~**Update existing CMakeLists.txt** in `extension/llm/runner/`~~
  - [x] ~~Add Python bindings target when EXECUTORCH_BUILD_PYBIND is enabled~~
  - [x] ~~Configure pybind11 integration~~
  - [x] ~~Link with extension_llm_runner library~~
  - [x] ~~Handle tokenizers dependency~~
  - [x] ~~Set up proper include paths~~

- [x] ~~**Update root-level setup.py**~~
  - [x] ~~Add multimodal_runner to the extensions list~~
  - [x] ~~Ensure proper build configuration~~
  - [x] ~~Handle platform-specific configurations~~

#### Low Priority
- [x] ~~**Create Python wrapper files** in `extension/llm/runner/`~~
  - [x] ~~`__init__.py` - Package initialization~~
  - [x] ~~`multimodal_runner.py` - High-level Python API~~
  - [x] ~~`utils.py` - Utility functions for input preprocessing~~

### 2. Build System Integration

- [ ] **Integrate with main CMake build**
  - [ ] Add Python bindings compilation when EXECUTORCH_BUILD_PYBIND is enabled
  - [ ] Update extension/llm/runner/CMakeLists.txt to build multimodal_runner_pybindings.cpp
  - [ ] Ensure proper dependency resolution

- [ ] **Handle dependencies**
  - [ ] Link against existing tokenizers Python bindings
  - [ ] Ensure Module and other dependencies are available
  - [ ] Handle pybind11 version requirements

### 3. Input/Output Handling

- [ ] **Implement MultimodalInput Python bindings**
  - [ ] Support for text inputs
  - [ ] Support for image inputs (numpy arrays, PIL Images)
  - [ ] Support for audio inputs (if applicable)
  - [ ] Mixed input ordering support

- [ ] **Implement callbacks**
  - [ ] Token generation callback
  - [ ] Statistics callback
  - [ ] Progress reporting
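
The input items above amount to an ordered list of tagged values in which text and image segments keep the position the caller gave them. A minimal Python sketch of that idea follows. `TextInput`, `ImageInput`, and `describe_inputs` are hypothetical names used only for illustration and are not the bound `MultimodalInput` API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TextInput:
    """Hypothetical text segment of a prompt."""
    text: str


@dataclass(frozen=True)
class ImageInput:
    """Hypothetical image segment: raw bytes plus the shape they encode."""
    data: bytes
    width: int
    height: int
    channels: int = 3

    def __post_init__(self) -> None:
        # Reject buffers whose size does not match the declared shape.
        expected = self.width * self.height * self.channels
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} bytes, got {len(self.data)}")


PromptInput = Union[TextInput, ImageInput]


def describe_inputs(inputs: List[PromptInput]) -> List[str]:
    """Return one label per input, preserving the caller's ordering."""
    labels = []
    for item in inputs:
        if isinstance(item, TextInput):
            labels.append(f"text:{len(item.text)}")
        elif isinstance(item, ImageInput):
            labels.append(f"image:{item.width}x{item.height}x{item.channels}")
        else:
            raise TypeError(f"unsupported input type: {type(item).__name__}")
    return labels
```

Validating the buffer size at construction time is one possible design choice: it surfaces shape mistakes in Python before any data crosses into C++.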

### 4. Testing and Documentation

- [ ] **Create comprehensive tests**
  - [ ] Unit tests for bindings
  - [ ] Integration tests with sample models
  - [ ] Performance benchmarks
  - [ ] Memory leak tests

- [ ] **Write documentation**
  - [ ] API documentation with examples
  - [ ] Installation guide
  - [ ] Usage tutorials
  - [ ] Model compatibility guide

### 5. Example Scripts

- [ ] **Create example scripts**
  - [ ] Basic text generation
  - [ ] Image + text (vision-language) example
  - [ ] Batch processing example
  - [ ] Streaming generation example

## Installation Instructions

### Prerequisites

- Python >= 3.8
- CMake >= 3.18
- C++17 compatible compiler
- PyTorch (for tensor operations)
- pybind11 >= 2.6.0

### Building from Source

```bash
# Clone the repository
git clone https://github.com/pytorch/executorch.git
cd executorch

# Install dependencies
pip install -r requirements.txt

# Build with Python bindings enabled
python setup.py install --cmake-args="-DEXECUTORCH_BUILD_PYBIND=ON"

# Or for development
pip install -e . --config-settings editable_mode=compat
```

### Running Tests

```bash
# Run the multimodal runner Python tests
python -m pytest extension/llm/runner/test/test_multimodal_runner.py -v
```
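
Because the extension is only built when `EXECUTORCH_BUILD_PYBIND` is enabled, Python tests may want to skip cleanly when it is absent. The sketch below shows one way to do that with the standard library. The dotted path in `EXTENSION` is an assumption about where `_llm_runner` ends up being installed, and `extension_available` is a hypothetical helper.

```python
import importlib.util
import unittest


def extension_available(module_name: str) -> bool:
    """Return True if `module_name` can be located.

    The module itself is not imported, but its parent packages are.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


# Assumed import path for the compiled extension; adjust to the real layout.
EXTENSION = "executorch.extension.llm.runner._llm_runner"


@unittest.skipUnless(extension_available(EXTENSION), "Python bindings not built")
class BindingsSmokeTest(unittest.TestCase):
    def test_import(self) -> None:
        # Importing the extension is the cheapest end-to-end check of the build.
        module = importlib.import_module(EXTENSION)
        self.assertIsNotNone(module)
```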

## Usage Example

```python
from executorch.extension.llm.runner import MultimodalRunner, GenerationConfig
from executorch.extension.llm.runner.utils import make_text_input, make_image_input
import numpy as np

# Initialize the runner
runner = MultimodalRunner(
    model_path="path/to/model.pte",
    tokenizer_path="path/to/tokenizer.bin"
)

# Create multimodal inputs
image_array = np.random.rand(224, 224, 3)  # Example image (random values as a stand-in)
inputs = [
    make_text_input("Describe this image:"),
    make_image_input(image_array)  # numpy array or PIL Image
]

# Configure generation
config = GenerationConfig(
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9
)

# Generate text with callbacks
def on_token(token):
    # Print each token as soon as it is produced
    print(token, end='', flush=True)

def on_stats(stats):
    # Report throughput after generation
    print(f"\nTokens/sec: {stats.tokens_per_second:.2f}")

runner.generate(inputs, config, token_callback=on_token, stats_callback=on_stats)

# Or simpler usage without callbacks
response = runner.generate_text(inputs, config)
print(response)
```
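
The callback-free `generate_text` call in the example can be layered on top of the callback form by collecting tokens into a list. The sketch below illustrates that relationship under stated assumptions: `FakeRunner` is a hypothetical stand-in that emits canned tokens, and the real binding may implement `generate_text` differently.

```python
from typing import Callable, Iterable, List, Optional


class FakeRunner:
    """Hypothetical stand-in for the bound runner; emits canned tokens."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._tokens = list(tokens)

    def generate(
        self,
        inputs,
        config=None,
        token_callback: Optional[Callable[[str], None]] = None,
        stats_callback=None,
    ) -> None:
        # Deliver each token to the callback, mirroring streaming generation.
        for token in self._tokens:
            if token_callback is not None:
                token_callback(token)


def generate_text(runner, inputs, config=None) -> str:
    """Collect streamed tokens into a single string."""
    pieces: List[str] = []
    runner.generate(inputs, config, token_callback=pieces.append)
    return "".join(pieces)
```

Keeping the convenience wrapper in Python means only the streaming entry point has to be bound from C++.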

## Technical Considerations

### Memory Management
- Python bindings should properly handle memory ownership
- Use shared_ptr/unique_ptr appropriately
- Implement proper cleanup in destructors

### Threading and GIL
- Consider GIL release during long-running operations
- Ensure thread safety for callbacks
- Handle Python exceptions in C++ code
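
One way to keep callbacks thread-safe on the Python side is to have the callback do nothing but push tokens onto a `queue.Queue`, and let the caller consume them from another thread. The following is a minimal sketch of that pattern. `stream_tokens` is a hypothetical helper, and `generate` stands for any blocking callable that accepts a token callback. How much the consumer can overlap with native work depends on whether the binding releases the GIL, which the list above leaves open.

```python
import queue
import threading
from typing import Callable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def stream_tokens(generate: Callable[[Callable[[str], None]], None]) -> Iterator[str]:
    """Run `generate` in a worker thread and yield its tokens in order.

    `generate` is any callable that accepts a token callback and blocks
    until generation finishes.
    """
    tokens: "queue.Queue[object]" = queue.Queue()

    def worker() -> None:
        try:
            generate(tokens.put)  # Queue.put is safe to call from any thread
        except Exception as exc:
            tokens.put(exc)       # forward failures to the consumer
        finally:
            tokens.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = tokens.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

Errors raised inside the worker are re-raised in the consuming thread, so a failed generation does not end silently.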

### Performance
- Minimize data copying between Python and C++
- Use move semantics where possible
- Consider zero-copy tensor operations
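
On the Python side, the usual basis for avoiding copies is the buffer protocol, which pybind11 can consume through `py::buffer`. The sketch below uses only the standard library to show what a zero-copy view looks like. `as_byte_view` is a hypothetical helper, and the actual bindings may take tensors or numpy arrays instead.

```python
def as_byte_view(buffer) -> memoryview:
    """Return a flat, read-only byte view over `buffer` without copying."""
    view = memoryview(buffer)
    if not view.c_contiguous:
        raise ValueError("buffer must be C-contiguous for a zero-copy view")
    return view.cast("B").toreadonly()


pixels = bytearray(2 * 2 * 3)   # stand-in for a 2x2 RGB image
view = as_byte_view(pixels)
pixels[0] = 255                 # mutate the original buffer
first_byte = view[0]            # the view shares memory, so it sees the change
```

Returning a read-only view is a deliberate choice here: the consumer can read the pixels in place but cannot modify the caller's data.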

## Dependencies

### Required
- executorch core libraries
- extension_llm_runner
- tokenizers library
- pybind11

### Optional
- numpy (for array handling)
- PIL/Pillow (for image processing)
- torch (for tensor operations)

## Contributing

Please follow the ExecuTorch contribution guidelines. Key points:
- Code should be formatted with clang-format
- Python code should follow PEP 8
- Add comprehensive tests for new features
- Update documentation as needed

## License

This project is licensed under the BSD-style license found in the LICENSE file in the root directory of the ExecuTorch repository.

## Next Steps

1. **Review and approve this plan** with the team
2. **Start with core bindings** implementation
3. **Test with existing models** (LLaVA, etc.)
4. **Gather feedback** from early users
5. **Iterate and improve** based on usage patterns

## Questions for Discussion

1. Should we support async generation?
2. What level of integration with PyTorch tensors is needed?
3. Should we provide pre-built wheels or source-only distribution?
4. How should we handle model loading and caching?
5. What additional utilities would be helpful for users?
