tychedelia
diff --git a/‎.gitignore‎
Lines changed: 1 addition & 0 deletions b/‎.gitignore‎
diff --git a/‎CUDA_ARCHITECTURE.md‎
Lines changed: 297 additions & 0 deletions b/‎CUDA_ARCHITECTURE.md‎
@@ -13,3 +13,4 @@ cmake-build*
 output.plist
 
 venv/
+vendor/
@@ -0,0 +1,297 @@
+# TouchDesigner CUDA Integration Architecture
+
+## Overview
+
+This document outlines the architecture for adding CUDA support to the td-rs Rust framework for TouchDesigner plugins, based on analysis of the existing C++ CudaTOP sample and the cudarc Rust CUDA library.
+
+## TouchDesigner CUDA Integration Pattern
+
+### Core Execution Flow
+
+1. **Plugin Declaration**: The plugin must declare `TOP_ExecuteMode::CUDA` in its plugin info
+2. **Resource Acquisition**: Obtain an `OP_CUDAArrayInfo*` from TouchDesigner before any CUDA operations
+3. **Critical Sequence**: Calls must occur in this order:
+   ```
+   createCUDAArray() → beginCUDAOperations() → kernel work → endCUDAOperations()
+   ```
+4. **Memory Management**: TouchDesigner owns the `cudaArray*`; the plugin only accesses it through surface objects
+
+### Key TouchDesigner Types
+
+```cpp
+// Resource info - initially cudaArray is nullptr
+class OP_CUDAArrayInfo {
+    OP_TextureDesc textureDesc;        // Resolution, format, dimension
+    cudaArray* cudaArray = nullptr;    // Filled by beginCUDAOperations()
+};
+
+// Stream specification for async operations  
+class OP_CUDAAcquireInfo {
+    cudaStream_t stream;
+};
+
+// Output specification
+class TOP_CUDAOutputInfo {
+    cudaStream_t stream;
+    OP_TextureDesc textureDesc;
+    uint32_t colorBufferIndex = 0;     // Multi-output support
+};
+```
+
+### Surface Object Pattern
+
+The CudaTOP sample accesses textures in kernels through CUDA surface objects, recreating a surface only when the underlying array changes:
+
+```cpp
+// Dynamic surface management - reuse if array matches
+static void setupCudaSurface(cudaSurfaceObject_t* surface, cudaArray_t array) {
+    if (*surface) {
+        cudaResourceDesc desc;
+        cudaGetSurfaceObjectResourceDesc(&desc, *surface);
+        if (desc.res.array.array != array) {
+            cudaDestroySurfaceObject(*surface);
+            *surface = 0;
+        }
+    }
+    
+    if (!*surface) {
+        cudaResourceDesc desc;
+        desc.resType = cudaResourceTypeArray;
+        desc.res.array.array = array;
+        cudaCreateSurfaceObject(surface, &desc);
+    }
+}
+```
+
+## cudarc API Analysis
+
+### Available APIs ✅
+
+cudarc provides complete FFI bindings for surface objects:
+
+```rust
+// Core types
+pub type cudaSurfaceObject_t = ::core::ffi::c_ulonglong;
+
+// Functions  
+pub unsafe fn cudaCreateSurfaceObject(
+    pSurfObject: *mut cudaSurfaceObject_t,
+    pResDesc: *const cudaResourceDesc,
+) -> cudaError_t;
+
+pub unsafe fn cudaDestroySurfaceObject(
+    surfObject: cudaSurfaceObject_t
+) -> cudaError_t;
+
+// Resource management
+pub struct cudaResourceDesc { /* ... */ }
+```
+
+### Missing Safe Abstractions ❌
+
+- No high-level wrappers in `driver::safe` module
+- No integration with `CudaStream`, `CudaContext`
+- Manual memory/lifecycle management required
+
+## Rust Implementation Strategy
+
+### 1. Safe Surface Object Wrapper
+
+Create a safe abstraction over surface objects:
+
+```rust
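+use std::sync::Arc;
+
+use cudarc::driver::CudaContext;
+use cudarc::runtime::sys;
+
+// `CudaError` is the error type this design assumes td-rs will define.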
+
+pub struct CudaSurface {
+    surface: sys::cudaSurfaceObject_t,
+    _ctx: Arc<CudaContext>,
+}
+
+impl CudaSurface {
+    /// Create surface from external cudaArray* (TouchDesigner-owned)
+    pub unsafe fn from_external_array(
+        ctx: Arc<CudaContext>, 
+        array: *mut sys::cudaArray
+    ) -> Result<Self, CudaError> {
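+        let mut surface: sys::cudaSurfaceObject_t = 0;
+        let desc = sys::cudaResourceDesc {
+            resType: sys::cudaResourceType::cudaResourceTypeArray,
+            res: sys::cudaResourceDesc__bindgen_ty_1 {
+                array: sys::cudaResourceDesc__bindgen_ty_1__bindgen_ty_1 {
+                    array,
+                }
+            },
+        };
+        
+        // `cudaError_t` is a plain status enum, so `?` cannot be applied to it
+        // directly. Assumes `CudaError` implements `From<sys::cudaError_t>`.
+        let status = sys::cudaCreateSurfaceObject(&mut surface, &desc);
+        if status != sys::cudaError_t::cudaSuccess {
+            return Err(CudaError::from(status));
+        }
+        Ok(CudaSurface { surface, _ctx: ctx })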
+    }
+    
+    pub fn handle(&self) -> sys::cudaSurfaceObject_t {
+        self.surface
+    }
+}
+
+impl Drop for CudaSurface {
+    fn drop(&mut self) {
+        unsafe { sys::cudaDestroySurfaceObject(self.surface); }
+    }
+}
+```
+
+### 2. Surface Cache for Performance
+
+Implement the dynamic surface-reuse pattern:
+
+```rust
+use std::collections::HashMap;
+
+pub struct SurfaceCache {
+    surfaces: HashMap<*mut sys::cudaArray, CudaSurface>,
+    ctx: Arc<CudaContext>,
+}
+
+impl SurfaceCache {
+    pub fn get_or_create(&mut self, array: *mut sys::cudaArray) -> Result<&CudaSurface, CudaError> {
+        if !self.surfaces.contains_key(&array) {
+            let surface = unsafe { CudaSurface::from_external_array(self.ctx.clone(), array)? };
+            self.surfaces.insert(array, surface);
+        }
+        Ok(self.surfaces.get(&array).unwrap())
+    }
+    
+    pub fn cleanup_invalid(&mut self, valid_arrays: &[*mut sys::cudaArray]) {
+        self.surfaces.retain(|&k, _| valid_arrays.contains(&k));
+    }
+}
+```
+
+### 3. CUDA TOP Trait Extension
+
+Extend the TOP trait to support CUDA execution:
+
+```rust
+pub trait CudaTop: Top {
+    /// Execute CUDA kernel operations
+    fn execute_cuda(
+        &mut self,
+        output: &CudaTopOutput,
+        inputs: &CudaTopInputs, 
+        params: &Self::Params,
+    ) -> Result<(), CudaError>;
+    
+    /// Get required CUDA stream (default: context default stream)
+    fn cuda_stream(&self) -> Option<&CudaStream> { None }
+}
+
+pub struct CudaTopOutput {
+    pub primary: CudaSurface,
+    pub auxiliary: Vec<CudaSurface>,
+    pub stream: Arc<CudaStream>,
+}
+
+pub struct CudaTopInputs {
+    pub inputs: Vec<Option<CudaSurface>>,
+    pub stream: Arc<CudaStream>,
+}
+```
+
+### 4. Integration with td-rs Framework
+
+Modify the existing TOP infrastructure:
+
+```rust
+// In td-rs-top/src/lib.rs
+impl OpInfo for MyCudaPlugin {
+    fn op_type(&self) -> &'static str { "Mycudasample" }
+    fn execute_mode(&self) -> TopExecuteMode {
+        TopExecuteMode::CUDA  // ← Declares CUDA capability
+    }
+}
+
+impl Top for MyCudaPlugin {
+    fn execute(&mut self, output: &TopOutput, inputs: &OpInputs) -> Result<(), OpError> {
+        // Bridge to CUDA execution
+        let cuda_output = self.setup_cuda_output(output)?;
+        let cuda_inputs = self.setup_cuda_inputs(inputs)?;
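+        // Clone the params first: `execute_cuda` takes `&mut self`, so it cannot
+        // also borrow `self.params`. Assumes `Params: Clone` and
+        // `OpError: From<CudaError>` (see Error Propagation below).
+        let params = self.params.clone();
+        self.execute_cuda(&cuda_output, &cuda_inputs, &params)
+            .map_err(OpError::from)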
+    }
+}
+
+impl CudaTop for MyCudaPlugin {
+    fn execute_cuda(
+        &mut self,
+        output: &CudaTopOutput,
+        inputs: &CudaTopInputs,
+        params: &Self::Params,
+    ) -> Result<(), CudaError> {
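+        // Launch kernels using cudarc. `width` and `height` are assumed to be
+        // read from the output's texture description (lookup elided).
+        let config = LaunchConfig::for_num_elems(width * height);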
+        unsafe {
+            my_kernel_launch(
+                output.primary.handle(),
+                inputs.inputs[0].as_ref().map(|s| s.handle()).unwrap_or(0),
+                width, height,
+                &output.stream,
+                config
+            )?;
+        }
+        Ok(())
+    }
+}
+```
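+
+`LaunchConfig::for_num_elems` describes a one-dimensional launch, whereas texture kernels are often indexed in 2D. The following is a minimal sketch of the ceiling-division arithmetic for a 2D grid, assuming a 16×16 thread block. `grid_dim_2d` and `BLOCK` are hypothetical names for illustration and are not part of cudarc or td-rs:
+
+```rust
+/// Threads per block along x and y (an assumed tile size, not a td-rs constant).
+pub const BLOCK: (u32, u32) = (16, 16);
+
+/// Number of blocks needed to cover a `width` x `height` texture.
+/// Hypothetical helper: rounds up so edge pixels are covered, which means the
+/// kernel must still bounds-check `x < width` and `y < height`.
+pub fn grid_dim_2d(width: u32, height: u32) -> (u32, u32) {
+    (width.div_ceil(BLOCK.0), height.div_ceil(BLOCK.1))
+}
+```
+
+The resulting pair could populate the x and y components of `LaunchConfig::grid_dim`, with `BLOCK` going into `block_dim`.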
+
+### 5. Build System Integration
+
+Add CUDA compilation support to the td-rs build system:
+
+```toml
+# Plugin Cargo.toml
+[dependencies]
+cudarc = { path = "../../vendor/cudarc", features = ["runtime"] }
+td-rs-top = { path = "../../td-rs-top", features = ["cuda"] }
+
+[package.metadata.td-rs]
+type = "top"
+cuda = true  # Enable CUDA compilation
+
+[[package.metadata.td-rs.kernels]]
+source = "src/kernels.cu"
+include_dirs = ["src/include"]
+```
+
+## Implementation Challenges & Solutions
+
+### 1. External Memory Safety
+**Challenge**: TouchDesigner owns the `cudaArray*`, so Rust must never free it  
+**Solution**: Use `CudaSurface::from_external_array()`, whose `Drop` destroys only the surface object and leaves the array untouched
+
+### 2. Synchronization
+**Challenge**: Keep every `beginCUDAOperations()` call paired with a matching `endCUDAOperations()`  
+**Solution**: Integrate the pair into the TOP trait execution flow and handle it in the bridge layer
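+
+One way the bridge layer might keep the begin/end pair balanced is a scope guard that ends CUDA operations on drop. This is only a sketch under assumptions: `CudaOps`, `CudaScope`, `Recorder`, and `run_scoped` are hypothetical names, and the trait stands in for whatever td-rs type ends up wrapping the TouchDesigner calls:
+
+```rust
+/// Hypothetical stand-in for the context exposing
+/// beginCUDAOperations()/endCUDAOperations(); not a real td-rs trait.
+pub trait CudaOps {
+    fn begin_cuda_operations(&mut self);
+    fn end_cuda_operations(&mut self);
+}
+
+/// Guard that calls begin on construction and end on drop, so the pair
+/// stays balanced even when kernel work returns early with an error.
+pub struct CudaScope<'a, C: CudaOps> {
+    ctx: &'a mut C,
+}
+
+impl<'a, C: CudaOps> CudaScope<'a, C> {
+    pub fn begin(ctx: &'a mut C) -> Self {
+        ctx.begin_cuda_operations();
+        CudaScope { ctx }
+    }
+}
+
+impl<C: CudaOps> Drop for CudaScope<'_, C> {
+    fn drop(&mut self) {
+        self.ctx.end_cuda_operations();
+    }
+}
+
+/// Test double that records the order of calls.
+#[derive(Default)]
+pub struct Recorder {
+    pub calls: Vec<&'static str>,
+}
+
+impl CudaOps for Recorder {
+    fn begin_cuda_operations(&mut self) {
+        self.calls.push("begin");
+    }
+    fn end_cuda_operations(&mut self) {
+        self.calls.push("end");
+    }
+}
+
+/// Runs fallible kernel work inside a begin/end scope.
+pub fn run_scoped<C: CudaOps>(
+    ctx: &mut C,
+    work: impl FnOnce() -> Result<(), String>,
+) -> Result<(), String> {
+    let _scope = CudaScope::begin(ctx);
+    work() // `_scope` is dropped after `work` returns, on both Ok and Err
+}
+```
+
+With this shape, an early `Err` from the kernel path still reaches `end_cuda_operations`, which is the property the bridge layer needs.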
+
+### 3. Multi-Stream Coordination  
+**Challenge**: TouchDesigner provides specific streams for async operations  
+**Solution**: Accept external streams in `CudaTopOutput`, use cudarc stream sync primitives
+
+### 4. Error Propagation
+**Challenge**: Map CUDA errors to TouchDesigner error system  
+**Solution**: Convert `CudaError` to `OpError` in bridge layer
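+
+The conversion could be expressed as a `From` impl so that `?` performs the mapping in the bridge layer. The sketch below assumes shapes for both error types: the fields of `CudaError` and `OpError`, and the helper `launch`, are hypothetical and will differ from the real td-rs definitions:
+
+```rust
+use std::fmt;
+
+/// Hypothetical CUDA-side error carrying the raw status code.
+#[derive(Debug, PartialEq)]
+pub struct CudaError {
+    pub code: i32,
+}
+
+/// Hypothetical TouchDesigner-side error carrying a message for the operator.
+#[derive(Debug, PartialEq)]
+pub struct OpError {
+    pub message: String,
+}
+
+impl fmt::Display for CudaError {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        write!(f, "CUDA error code {}", self.code)
+    }
+}
+
+impl From<CudaError> for OpError {
+    fn from(err: CudaError) -> Self {
+        OpError { message: err.to_string() }
+    }
+}
+
+/// Stand-in for a kernel launch that may fail (illustrative only).
+fn launch(fail: bool) -> Result<(), CudaError> {
+    if fail { Err(CudaError { code: 700 }) } else { Ok(()) }
+}
+
+/// Bridge-layer entry point: `?` applies `From<CudaError> for OpError`.
+pub fn execute(fail: bool) -> Result<(), OpError> {
+    launch(fail)?;
+    Ok(())
+}
+```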
+
+### 5. Dimension Support
+**Challenge**: Support 2D/3D/Cube/Array texture types  
+**Solution**: Pass `OP_TexDim` to kernels, dispatch appropriate kernel variants
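+
+Host-side dispatch on the texture dimension might look like the sketch below. `TexDim`, the kernel names, and `slice_count` are all hypothetical; the enum only mirrors the four dimensions named above and does not reproduce the actual `OP_TexDim` values:
+
+```rust
+/// Hypothetical Rust mirror of the texture dimensions listed above.
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum TexDim {
+    D2,
+    D3,
+    Cube,
+    Array,
+}
+
+/// Kernel variant to launch for a given dimension (names are illustrative).
+pub fn kernel_name(dim: TexDim) -> &'static str {
+    match dim {
+        TexDim::D2 => "process_2d",
+        TexDim::D3 => "process_3d",
+        TexDim::Cube => "process_cube",
+        TexDim::Array => "process_layered",
+    }
+}
+
+/// Number of z-slices the launch must cover. `depth` is the texture depth
+/// or layer count; it is ignored for 2D and cube textures.
+pub fn slice_count(dim: TexDim, depth: u32) -> u32 {
+    match dim {
+        TexDim::D2 => 1,
+        TexDim::Cube => 6, // a cube map always has six faces
+        TexDim::D3 | TexDim::Array => depth.max(1),
+    }
+}
+```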
+
+## Performance Considerations
+
+1. **Surface Reuse**: Cache surface objects to avoid recreation overhead
+2. **Stream Management**: Launch kernels on the TouchDesigner-provided streams rather than a private stream, to avoid extra cross-stream synchronization
+3. **Kernel Compilation**: Pre-compile kernels at build time when possible
+4. **Memory Access**: Prefer surface objects over direct array access for texture ops
+
+## Future Extensions
+
+1. **Graph API**: Support CUDA graphs for complex operation chains
+2. **Texture Arrays**: Enhanced support for layered textures
+3. **Interop**: OpenGL-CUDA interop for mixed rendering
+4. **Compute Shaders**: Alternative compute backends for portability
+
+This architecture provides a safe, efficient foundation for CUDA integration while maintaining compatibility with TouchDesigner's execution model and the existing td-rs framework patterns.