
feat(compute): Extract OpenCL compute infrastructure from ART#3

Merged
Hellblazer merged 16 commits into main from feature/opencl-compute-infrastructure
Dec 29, 2025

Conversation

@Hellblazer

Summary

Extract mature, production-ready GPU compute infrastructure from the ART repository into the gpu-support framework for reuse by ART, Luciferase, and future projects.

Components Extracted

Core Interfaces (com.hellblazer.luciferase.resource.compute):

  • GPUBackend - Backend enum with availability detection (Metal, OpenCL, CPU)
  • BackendSelector - Automatic backend selection with CI detection
  • ComputeKernel - Unified kernel interface
  • GPUBuffer - Buffer interface
  • GPUErrorClassifier - Programming vs recoverable error classification
  • KernelLoader - Kernel source loading with caching

OpenCL Implementation (com.hellblazer.luciferase.resource.compute.opencl):

  • OpenCLContext - Singleton context manager with reference counting
  • OpenCLBuffer - GPU buffer with RAII lifecycle
  • OpenCLKernel - Kernel compilation and execution
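The reference-counted singleton pattern behind OpenCLContext can be sketched as follows. This is an illustrative model only (class and method names are assumptions); the real class manages an actual `cl_context` and releases driver resources when the count hits zero.

```java
// Sketch of a reference-counted singleton context. Real code would
// create the OpenCL context in the constructor and call
// clReleaseContext when the last reference is released.
import java.util.concurrent.atomic.AtomicInteger;

public final class RefCountedContext {
    private static RefCountedContext instance;
    private final AtomicInteger refCount = new AtomicInteger(0);

    private RefCountedContext() {
        // Expensive one-time driver initialization happens here.
    }

    /** Acquire the shared context, creating it on first use. */
    public static synchronized RefCountedContext acquire() {
        if (instance == null) {
            instance = new RefCountedContext();
        }
        instance.refCount.incrementAndGet();
        return instance;
    }

    /** Release one reference; tear down when the last holder releases. */
    public static synchronized void release() {
        if (instance != null && instance.refCount.decrementAndGet() == 0) {
            instance = null;  // real code would release the cl_context here
        }
    }

    public static synchronized int activeReferences() {
        return instance == null ? 0 : instance.refCount.get();
    }
}
```

Every acquirer gets the same instance, which is what avoids creating multiple OpenCL contexts per process.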

Key Features

  • Singleton OpenCL context (avoids macOS driver crashes)
  • Threshold-based GPU/CPU execution selection
  • Graceful CI environment handling (tests skip without OpenCL)
  • Dual env var support (GPU_BACKEND/ART_GPU_BACKEND)
  • Event-based async kernel execution
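The threshold-based dispatch and dual env var lookup can be modeled like this. The threshold value and helper names are illustrative assumptions, not the framework's actual API.

```java
// Illustrative model of threshold-based GPU/CPU selection and the
// GPU_BACKEND / ART_GPU_BACKEND fallback. Values and names are assumed.
public final class DispatchSketch {
    /** Below this many elements, launch overhead outweighs the GPU win. */
    static final int GPU_THRESHOLD = 8_192;  // illustrative value

    static boolean useGpu(int elementCount, boolean gpuAvailable) {
        return gpuAvailable && elementCount >= GPU_THRESHOLD;
    }

    /** Prefer the new generic variable, fall back to the legacy ART_ name. */
    static String backendOverride(java.util.Map<String, String> env) {
        String v = env.get("GPU_BACKEND");
        return v != null ? v : env.get("ART_GPU_BACKEND");
    }
}
```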

Test Coverage

  • 263 tests in resource module
  • Integration tests for full compute workflow
  • All tests skip gracefully in CI without OpenCL

Test plan

  • All resource module tests pass locally (263 tests)
  • OpenCL tests skip in CI environment
  • macOS SIGSEGV crash fixed in test pattern
  • CI workflow passes on PR

- Add .pm/ project management infrastructure
  - CONTINUATION.md for session resumption
  - METHODOLOGY.md for TDD workflow
  - CONTEXT_PROTOCOL.md for agent handoffs
- Add .beads/ for issue tracking with beads
- Add AGENTS.md with bd commands reference
- Create bead hierarchy for 4-phase OpenCL extraction:
  - Phase 1: Core interfaces (GPUBuffer, ComputeKernel)
  - Phase 2: OpenCL implementation
  - Phase 3: Utilities and stubs
  - Phase 4: ART migration

Plan v2 audited and approved (92% confidence GO).
See ChromaDB: plan::gpu-support::art-opencl-extraction::v2
Extract portable compute API layer (Layer 2 per architecture decision):

- GPUBuffer: Host-device memory transfer interface
- ComputeKernel: Unified kernel compilation/execution interface
  - BufferAccess enum for kernel argument modes
  - KernelCompilationException, KernelExecutionException
- GPUBackend: Enum for METAL, OPENCL, CPU_FALLBACK with priorities
- GPUErrorClassifier: Programming vs recoverable error classification
  - OpenCL error code extraction from exception messages
  - Fixed self-referencing cause infinite loop (improvement over ART)
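The cause-chain fix mentioned above amounts to walking `getCause()` with a guard so a self-referencing or cyclic cause cannot loop forever. A minimal sketch (the hop cap and method name are assumptions):

```java
// Walk an exception's cause chain without looping forever when a cause
// points back at itself or forms a cycle.
public final class CauseWalkSketch {
    /** Deepest cause, bounded so a cyclic chain terminates. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        int hops = 0;
        while (current.getCause() != null
                && current.getCause() != current
                && hops++ < 50) {  // cap guards against longer cycles
            current = current.getCause();
        }
        return current;
    }
}
```

The real classifier would then match OpenCL error codes (e.g. `CL_INVALID_KERNEL`) in the root cause's message.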

All interfaces are backend-agnostic. OpenCL implementations (Layer 3)
will wrap existing CLKernelHandle/CLBufferHandle (Layer 1) in Phase 2.

Beads closed: e63, ad2, kdp, ipz, 6e9
See: plan::gpu-support::art-opencl-extraction::v2
- Extract OpenCLContext singleton with reference counting and testReset()
- Extract OpenCLBuffer implementing GPUBuffer interface
- Add GPUBackend.isAvailable() with cached Metal/OpenCL detection
- Remove CL.create() calls to avoid macOS SIGSEGV in forked JVMs
- Add dual property name support (gpu.disable, luciferase.gpu.disable)
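The cached-detection idea behind `GPUBackend.isAvailable()` is probe once, remember the answer. A stand-alone sketch (the probe supplier stands in for real Metal/OpenCL detection):

```java
// Cache an expensive availability probe so the driver is only queried once.
import java.util.function.Supplier;

public final class CachedProbe {
    private final Supplier<Boolean> probe;
    private volatile Boolean cached;  // null until the first query
    private int probeCalls = 0;

    public CachedProbe(Supplier<Boolean> probe) {
        this.probe = probe;
    }

    public synchronized boolean isAvailable() {
        if (cached == null) {
            probeCalls++;
            cached = probe.get();  // expensive driver call happens once
        }
        return cached;
    }

    public synchronized int probeCalls() { return probeCalls; }
}
```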

Beads: gij, 9go closed; ilr in progress

Note: OpenCLBufferTest has macOS driver crash - needs debugging
Replace JUnit Assumptions.assumeTrue() pattern with simple early-return
checks in each test method. The Assumptions pattern caused OpenCL driver
state issues in Maven Surefire forked JVM processes.

- Simplified @BeforeAll to just detect OpenCL availability
- Use `if (!openCLAvailable) return;` instead of @BeforeEach assumptions
- Fix IndexOutOfBounds in FloatBuffer test (remove unnecessary flip())
- All 10 OpenCLBuffer tests pass
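The shape of the replacement pattern, shown without JUnit so it stands alone (in the real tests, `openCLAvailable` is set once in `@BeforeAll`):

```java
// Early-return guard replacing Assumptions.assumeTrue() in each test method.
public final class EarlyReturnPattern {
    static boolean openCLAvailable = false;  // detected once at startup
    static int bodiesRun = 0;

    static void testBufferRoundTrip() {
        if (!openCLAvailable) return;  // skip silently instead of assumeTrue()
        bodiesRun++;  // real test body: allocate, write, read back, assert
    }
}
```

The early return never touches JUnit's assumption machinery, so no driver state leaks through Surefire's forked JVMs.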

Closes: gpu-support-ilr
Implement OpenCLKernel as ComputeKernel interface for GPU compute:
- Kernel compilation with build log on failure
- Buffer, float, int, and local memory argument binding
- 1D/2D/3D execution with optional local work sizes
- Async execution with event-based synchronization
- Uses OpenCLContext singleton pattern

16 tests covering:
- Compilation lifecycle (compile, double-compile, invalid source)
- Argument setting (buffer, scalar, before compile)
- Execution (vectorAdd, scale, 2D/3D work sizes)
- Resource lifecycle (close, double-close, ops after close)

Closes: gpu-support-6pw
Automatic GPU backend selection with priority-based fallback:
- Metal (priority 100, macOS only)
- OpenCL (priority 90, cross-platform)
- CPU fallback (priority 10, always available)

Environment variable support:
- GPU_BACKEND / GPU_DISABLE (new generic names)
- ART_GPU_BACKEND / ART_GPU_DISABLE (legacy, deprecated)

CI environment auto-detection (GitHub Actions, Jenkins, etc.).
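The selection rule above can be modeled as "highest-priority available backend, unless explicitly overridden". Enum names follow the PR; the helper itself is a sketch, not the real BackendSelector API.

```java
// Priority-based backend selection with explicit override and a CPU
// fallback that always qualifies.
import java.util.List;

public enum BackendSketch {
    METAL(100), OPENCL(90), CPU_FALLBACK(10);

    final int priority;
    BackendSketch(int priority) { this.priority = priority; }

    static BackendSketch select(String override, List<BackendSketch> available) {
        if (override != null) return BackendSketch.valueOf(override);
        return available.stream()
                .max(java.util.Comparator.comparingInt(b -> b.priority))
                .orElse(CPU_FALLBACK);  // always available
    }
}
```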

17 tests covering selection logic, caching, and environment info.

Closes: gpu-support-cbr
Full compute workflow tests:
- vectorAdd: context → buffers → kernel → execute → read
- SAXPY: scalar float arguments (result = a*x + y)
- 2D execution: proper 2D kernel indexing
- Large data: 64K elements
- Multiple executions: iterative kernel runs
- Resource cleanup: try-with-resources pattern
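For reference, the arithmetic these integration tests verify on the GPU side has a simple CPU oracle (these are expected-value references, not the GPU path itself):

```java
// CPU oracles for the integration tests: vectorAdd (c = a + b) and
// SAXPY (r = alpha * x + y).
public final class CpuReference {
    static float[] vectorAdd(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) c[i] = a[i] + b[i];
        return c;
    }

    static float[] saxpy(float alpha, float[] x, float[] y) {
        float[] r = new float[x.length];
        for (int i = 0; i < x.length; i++) r[i] = alpha * x[i] + y[i];
        return r;
    }
}
```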

9 integration tests verifying complete OpenCL compute pipeline.

Closes: gpu-support-97u
Kernel loading utility with caching and convention support:
- loadOpenCLKernel(name) → kernels/opencl/{name}.cl
- loadMetalKernel(name) → kernels/metal/{name}.metal
- loadTestKernel(name) → kernels/{name}.cl (flat structure)
- ConcurrentHashMap caching for repeated loads
- kernelExists() for resource checking
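The convention-plus-cache behavior can be sketched like this. The pluggable reader is an assumption so the example runs without classpath resources; the real loader reads from the classpath.

```java
// Naming-convention resolution with ConcurrentHashMap caching, in the
// spirit of KernelLoader. The reader function is a stand-in for
// classpath resource loading.
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public final class LoaderSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> reader;  // path -> source text
    int reads = 0;

    public LoaderSketch(Function<String, String> reader) {
        this.reader = reader;
    }

    /** loadOpenCLKernel("vector_add") resolves kernels/opencl/vector_add.cl. */
    public String loadOpenCLKernel(String name) {
        String path = "kernels/opencl/" + name + ".cl";
        return cache.computeIfAbsent(path, p -> { reads++; return reader.apply(p); });
    }
}
```

Repeated loads of the same kernel hit the cache, so the underlying resource is read once.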

Package documentation with usage examples and conventions.

12 tests covering loading, caching, and error handling.

Closes: gpu-support-0y1
Add ComputeService facade providing simplified GPU compute with automatic
CPU fallback. Includes built-in operations for vector math (vectorAdd,
saxpy, scale) and reductions (sum, min, max), plus custom operation
support via createOperation().
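A toy model of the facade's automatic CPU fallback for the built-in reductions. The GPU branch is a placeholder; the real ComputeService dispatches to OpenCL when a backend is available.

```java
// ComputeService-style facade sketch: try GPU, fall back to a plain
// CPU loop for sum and max.
public final class FacadeSketch {
    private final boolean gpuAvailable;

    public FacadeSketch(boolean gpuAvailable) { this.gpuAvailable = gpuAvailable; }

    public float sum(float[] data) {
        if (gpuAvailable) {
            // real service would launch kernels/opencl/reduce.cl here
        }
        float s = 0f;               // CPU fallback path
        for (float v : data) s += v;
        return s;
    }

    public float max(float[] data) {
        float m = Float.NEGATIVE_INFINITY;
        for (float v : data) m = Math.max(m, v);
        return m;
    }
}
```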

New resources:
- kernels/opencl/vector_add.cl - element-wise vector addition
- kernels/opencl/saxpy.cl - SAXPY operations
- kernels/opencl/reduce.cl - parallel sum/min/max reductions
- kernels/opencl/transform.cl - scale, clamp, abs, square, sqrt

Tests:
- ComputeServiceTest: 16 tests demonstrating API usage
- ComputeServiceStressTest: 23 tests for edge cases, large arrays,
  concurrent access, and memory pressure

Total: 302 tests pass in resource module
COMPUTE.md covers:
- Basic operations (vectorAdd, saxpy, scale, sum, min, max)
- Custom kernel writing
- Low-level API usage
- Configuration (env vars, backend selection)
- Error handling
- Performance notes
- Thread safety

Examples in examples/ package:
- VectorMathExample: built-in operations
- CustomKernelExample: writing custom kernels
- PerformanceExample: GPU vs CPU timing
- LowLevelExample: direct buffer/kernel control
…yTest

Bug: stack.ints(N) allocates N zero-filled ints, not an int containing N.
The kernel received size=0, producing all zeros.

Fix: Use clSetKernelArg1i/clSetKernelArg1p for scalar and pointer args.
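The class of bug described here can be reproduced with plain `java.nio` buffers: an N-slot zero-filled buffer is not a buffer containing the value N. (The PR's fix sidesteps buffer allocation entirely by passing scalars via `clSetKernelArg1i`/`clSetKernelArg1p`.)

```java
// Zero-filled allocation vs. a buffer containing a value: the kernel in
// the bug read the first element of a zero-filled buffer and saw size=0.
import java.nio.IntBuffer;

public final class BufferBugSketch {
    static IntBuffer zeroFilled(int n) {
        return IntBuffer.allocate(n);            // n slots, all zero
    }

    static IntBuffer containing(int n) {
        return IntBuffer.wrap(new int[]{n});     // one slot holding n
    }
}
```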
Hellblazer merged commit f107a71 into main Dec 29, 2025
1 check passed
Hellblazer deleted the feature/opencl-compute-infrastructure branch December 29, 2025 09:46
