feat(compute): Extract OpenCL compute infrastructure from ART #3
Merged
Hellblazer merged 16 commits into main on Dec 29, 2025
Conversation
- Add .pm/ project management infrastructure
  - CONTINUATION.md for session resumption
  - METHODOLOGY.md for TDD workflow
  - CONTEXT_PROTOCOL.md for agent handoffs
- Add .beads/ for issue tracking with beads
- Add AGENTS.md with bd commands reference
- Create bead hierarchy for 4-phase OpenCL extraction:
  - Phase 1: Core interfaces (GPUBuffer, ComputeKernel)
  - Phase 2: OpenCL implementation
  - Phase 3: Utilities and stubs
  - Phase 4: ART migration

Plan v2 audited and approved (92% confidence GO).
See ChromaDB: plan::gpu-support::art-opencl-extraction::v2
Extract the portable compute API layer (Layer 2 per the architecture decision):
- GPUBuffer: host-device memory transfer interface
- ComputeKernel: unified kernel compilation/execution interface
- BufferAccess enum for kernel argument modes
- KernelCompilationException, KernelExecutionException
- GPUBackend: enum for METAL, OPENCL, CPU_FALLBACK with priorities
- GPUErrorClassifier: programming vs. recoverable error classification
- OpenCL error code extraction from exception messages
- Fixed self-referencing cause infinite loop (improvement over ART)

All interfaces are backend-agnostic. OpenCL implementations (Layer 3) will wrap the existing CLKernelHandle/CLBufferHandle (Layer 1) in Phase 2.

Beads closed: e63, ad2, kdp, ipz, 6e9
See: plan::gpu-support::art-opencl-extraction::v2
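The two classifier behaviors mentioned above (error-code extraction from exception messages and the self-referencing-cause fix) can be sketched in plain Java. This is an illustrative stand-in, not the real GPUErrorClassifier API; the message format and method names are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two behaviors described above; the actual
// GPUErrorClassifier in gpu-support may differ in names and message format.
public class ErrorClassifierSketch {
    // OpenCL failures often surface as messages like
    // "clBuildProgram failed: CL_BUILD_PROGRAM_FAILURE (-11)".
    private static final Pattern CL_CODE = Pattern.compile("\\((-\\d+)\\)");

    static Optional<Integer> extractErrorCode(String message) {
        if (message == null) return Optional.empty();
        Matcher m = CL_CODE.matcher(message);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1))) : Optional.empty();
    }

    // Walk the cause chain while guarding against a self-referencing cause,
    // which would otherwise loop forever (the bug fixed in this commit).
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }
}
```

The `getCause() != current` guard is the whole fix: a single extra comparison turns a potential infinite loop into a terminating walk.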
- Extract OpenCLContext singleton with reference counting and testReset()
- Extract OpenCLBuffer implementing the GPUBuffer interface
- Add GPUBackend.isAvailable() with cached Metal/OpenCL detection
- Remove CL.create() calls to avoid a macOS SIGSEGV in forked JVMs
- Add dual property name support (gpu.disable, luciferase.gpu.disable)

Beads: gij, 9go closed; ilr in progress
Note: OpenCLBufferTest has a macOS driver crash - needs debugging
Replace the JUnit Assumptions.assumeTrue() pattern with simple early-return checks in each test method. The Assumptions pattern caused OpenCL driver state issues in Maven Surefire forked JVM processes.
- Simplified @BeforeAll to just detect OpenCL availability
- Use `if (!openCLAvailable) return;` instead of @BeforeEach assumptions
- Fix IndexOutOfBounds in FloatBuffer test (remove unnecessary flip())
- All 10 OpenCLBuffer tests pass

Closes: gpu-support-ilr
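The early-return pattern can be sketched without JUnit. All names here are illustrative (the real suite probes the OpenCL runtime in @BeforeAll; this stand-in fakes detection with a system property):

```java
// Hedged sketch of the guard pattern described above; field and method
// names are illustrative, not the actual OpenCLBufferTest.
public class GuardPatternSketch {
    static boolean openCLAvailable;   // set once, analogous to @BeforeAll
    static int testsActuallyRun = 0;

    static void detectOnce() {
        // The real suite probes the OpenCL driver here; we fake it via a property.
        openCLAvailable = Boolean.getBoolean("fake.opencl.available");
    }

    static void testBufferRoundTrip() {
        if (!openCLAvailable) return; // early return instead of assumeTrue()
        testsActuallyRun++;           // ...real buffer assertions would go here
    }
}
```

The early return keeps skipped tests from ever touching driver state, which is what the Assumptions machinery failed to guarantee in forked JVMs.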
Implement OpenCLKernel as the ComputeKernel interface for GPU compute:
- Kernel compilation with build log on failure
- Buffer, float, int, and local memory argument binding
- 1D/2D/3D execution with optional local work sizes
- Async execution with event-based synchronization
- Uses the OpenCLContext singleton pattern

16 tests covering:
- Compilation lifecycle (compile, double-compile, invalid source)
- Argument setting (buffer, scalar, before compile)
- Execution (vectorAdd, scale, 2D/3D work sizes)
- Resource lifecycle (close, double-close, ops after close)

Closes: gpu-support-6pw
Automatic GPU backend selection with priority-based fallback:
- Metal (priority 100, macOS only)
- OpenCL (priority 90, cross-platform)
- CPU fallback (priority 10, always available)

Environment variable support:
- GPU_BACKEND / GPU_DISABLE (new generic names)
- ART_GPU_BACKEND / ART_GPU_DISABLE (legacy, deprecated)

CI environment auto-detection (GitHub Actions, Jenkins, etc.).
17 tests covering selection logic, caching, and environment info.

Closes: gpu-support-cbr
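The priority scheme above can be sketched in a few lines. This is a simplified stand-in for GPUBackend/BackendSelector: the availability probe is faked with a list, and the real classes cache their detection results.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hedged sketch of priority-based backend selection; enum constants mirror
// the priorities in the commit message, the isAvailable probe is a stand-in.
public class BackendSelectionSketch {
    enum Backend {
        METAL(100), OPENCL(90), CPU_FALLBACK(10);
        final int priority;
        Backend(int p) { this.priority = p; }
    }

    // Pick the highest-priority backend that reports itself available.
    // CPU_FALLBACK always qualifies, so selection can never come up empty.
    static Backend select(List<Backend> available) {
        return Arrays.stream(Backend.values())
                .filter(b -> b == Backend.CPU_FALLBACK || available.contains(b))
                .max(Comparator.comparingInt(b -> b.priority))
                .orElseThrow();
    }
}
```

With Metal present the selector returns METAL; with nothing detected (e.g. in CI, or when GPU_DISABLE is set) it degrades to CPU_FALLBACK.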
Full compute workflow tests:
- vectorAdd: context → buffers → kernel → execute → read
- SAXPY: scalar float arguments (result = a*x + y)
- 2D execution: proper 2D kernel indexing
- Large data: 64K elements
- Multiple executions: iterative kernel runs
- Resource cleanup: try-with-resources pattern

9 integration tests verifying the complete OpenCL compute pipeline.

Closes: gpu-support-97u
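The SAXPY operation the integration tests exercise (result = a*x + y) has a one-loop CPU reference, useful for checking GPU output element-by-element. Names are illustrative:

```java
// CPU reference for SAXPY: result[i] = a * x[i] + y[i].
// Handy as an oracle when verifying the GPU kernel's output.
public class SaxpyReference {
    static float[] saxpy(float a, float[] x, float[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("length mismatch");
        float[] result = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            result[i] = a * x[i] + y[i];
        }
        return result;
    }
}
```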
Kernel loading utility with caching and convention support:
- loadOpenCLKernel(name) → kernels/opencl/{name}.cl
- loadMetalKernel(name) → kernels/metal/{name}.metal
- loadTestKernel(name) → kernels/{name}.cl (flat structure)
- ConcurrentHashMap caching for repeated loads
- kernelExists() for resource checking
Package documentation with usage examples and conventions.
12 tests covering loading, caching, and error handling.
Closes: gpu-support-0y1
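The caching convention above can be sketched with `ConcurrentHashMap.computeIfAbsent`. This is an illustrative stand-in for KernelLoader: the real class reads classpath resources, which this sketch replaces with an injected reader.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hedged sketch of the loading convention described above; the actual
// KernelLoader reads classpath resources, faked here for testability.
public class KernelLoaderSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resourceReader; // injected stand-in
    int loads = 0; // counts actual (non-cached) reads

    KernelLoaderSketch(Function<String, String> resourceReader) {
        this.resourceReader = resourceReader;
    }

    // loadOpenCLKernel(name) resolves to kernels/opencl/{name}.cl and caches
    // the source, so repeated loads hit the map instead of the classpath.
    String loadOpenCLKernel(String name) {
        return cache.computeIfAbsent("kernels/opencl/" + name + ".cl", path -> {
            loads++;
            return resourceReader.apply(path);
        });
    }
}
```

`computeIfAbsent` gives atomic load-once semantics per key, which is what makes repeated loads of the same kernel cheap and thread-safe.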
Add ComputeService facade providing simplified GPU compute with automatic CPU fallback. Includes built-in operations for vector math (vectorAdd, saxpy, scale) and reductions (sum, min, max), plus custom operation support via createOperation().

New resources:
- kernels/opencl/vector_add.cl - element-wise vector addition
- kernels/opencl/saxpy.cl - SAXPY operations
- kernels/opencl/reduce.cl - parallel sum/min/max reductions
- kernels/opencl/transform.cl - scale, clamp, abs, square, sqrt

Tests:
- ComputeServiceTest: 16 tests demonstrating API usage
- ComputeServiceStressTest: 23 tests for edge cases, large arrays, concurrent access, and memory pressure

Total: 302 tests pass in resource module
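The automatic-fallback idea behind the facade can be sketched as try-GPU-then-CPU. This is not the ComputeService API; the GPU path here is a stand-in that always throws, and all names are illustrative.

```java
// Hedged sketch of automatic CPU fallback as described above; in the real
// facade the GPU path dispatches through the selected backend.
public class FallbackSketch {
    static float[] vectorAdd(float[] a, float[] b, boolean gpuAvailable) {
        if (gpuAvailable) {
            try {
                return gpuVectorAdd(a, b);          // would run the OpenCL kernel
            } catch (RuntimeException recoverable) {
                // recoverable GPU error: fall through to the CPU path
            }
        }
        float[] out = new float[a.length];          // CPU fallback
        for (int i = 0; i < a.length; i++) out[i] = a[i] + b[i];
        return out;
    }

    static float[] gpuVectorAdd(float[] a, float[] b) {
        throw new RuntimeException("no GPU in this sketch");
    }
}
```

Callers get the same answer either way; only programming errors (per GPUErrorClassifier) should propagate instead of triggering fallback.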
COMPUTE.md covers:
- Basic operations (vectorAdd, saxpy, scale, sum, min, max)
- Custom kernel writing
- Low-level API usage
- Configuration (env vars, backend selection)
- Error handling
- Performance notes
- Thread safety

Examples in the examples/ package:
- VectorMathExample: built-in operations
- CustomKernelExample: writing custom kernels
- PerformanceExample: GPU vs CPU timing
- LowLevelExample: direct buffer/kernel control
…yTest

Bug: stack.ints(N) allocates N zero-filled ints, not an int containing N. The kernel received size=0, producing all zeros.
Fix: Use clSetKernelArg1i/clSetKernelArg1p for scalar and pointer args.
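The allocate-N vs. contain-N confusion described in this commit mirrors a classic java.nio distinction, shown here with plain IntBuffer (an analogy only; per the commit, the LWJGL call in question behaved like the allocate case):

```java
import java.nio.IntBuffer;

// Analogy for the bug above: "a buffer of n ints" vs. "a buffer containing n".
public class BufferAllocVsWrap {
    static int firstOfAllocated(int n) {
        IntBuffer zeroFilled = IntBuffer.allocate(n); // n ints, all zero
        return zeroFilled.get(0);                     // 0 - how the kernel saw size=0
    }

    static int firstOfWrapped(int n) {
        IntBuffer containingN = IntBuffer.wrap(new int[] { n }); // one int valued n
        return containingN.get(0);
    }
}
```

Passing the scalar directly (clSetKernelArg1i) sidesteps the ambiguity entirely, which is why the fix avoids hand-built buffers for scalar and pointer arguments.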
Summary
Extract mature, production-ready GPU compute infrastructure from the ART repository into the gpu-support framework for reuse by ART, Luciferase, and future projects.
Components Extracted
Core Interfaces (com.hellblazer.luciferase.resource.compute):
- GPUBackend - backend enum with availability detection (Metal, OpenCL, CPU)
- BackendSelector - automatic backend selection with CI detection
- ComputeKernel - unified kernel interface
- GPUBuffer - buffer interface
- GPUErrorClassifier - programming vs. recoverable error classification
- KernelLoader - kernel source loading with caching

OpenCL Implementation (com.hellblazer.luciferase.resource.compute.opencl):
- OpenCLContext - singleton context manager with reference counting
- OpenCLBuffer - GPU buffer with RAII lifecycle
- OpenCLKernel - kernel compilation and execution

Key Features
- Environment variable configuration (GPU_BACKEND / ART_GPU_BACKEND)

Test Coverage
Test plan