6 Bug fixes for ILGPU right now #1545

@LostBeard

Thank you again for ILGPU. My SpawnDev.ILGPU library is now working quite well on desktop and in the browser with Blazor WASM. Since my library is nearing completion, I spent some time yesterday working on upstream issues in this repo. I would normally submit pull requests, but because past PRs have gone ignored, I don't want to over-invest effort. Instead, I’ll post my findings here so you can use them as you see fit. 🖖

The following is taken from an internal SpawnDev.ILGPU report. The findings and fixes were verified with unit tests.

Upstream ILGPU Issues Analysis

Analysis of open issues in the original ILGPU repo to determine which bugs are inherited and fixable in SpawnDev.ILGPU v3.3.0.

Actionable Bugs (Testable & Potentially Fixable)

1. ✅ #1361 - MathF.CopySign argument order swapped on GPU - FIXED in v3.3.0

Severity: High - silent wrong results
Affected: All GPU backends (CUDA, OpenCL, WebGPU, WebGL)
Reproducible? Yes - simple kernel: CopySign(x, -1) should return -|x| (that is, -x for positive x)
Root cause: XMath.CopySign intrinsic passed (sign, magnitude) instead of (magnitude, sign) to the backend copysign instruction
Fix complexity: Low - swapped the two arguments in the intrinsic mapping
Testable: ✅ CopySignTest passes on all backends
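To make the expected behavior concrete, here is a minimal sketch of the reference semantics next to a model of the swapped-argument bug. It uses Python's `math.copysign` purely as a stand-in for the backend instruction; the helper names `copysign_reference` and `copysign_swapped` are hypothetical and are not part of ILGPU.

```python
import math


def copysign_reference(magnitude: float, sign: float) -> float:
    """Return `magnitude` with the sign of `sign` (the expected behavior)."""
    return math.copysign(magnitude, sign)


def copysign_swapped(magnitude: float, sign: float) -> float:
    """Model of the bug: operands reach the instruction in reverse order."""
    return math.copysign(sign, magnitude)


for x in (3.5, -2.0, 0.25):
    # Correct order: the magnitude of x is kept and the sign comes from -1.
    assert copysign_reference(x, -1.0) == -abs(x)
    # Swapped order: the magnitude of -1 is kept, so the result is +1 or -1.
    assert abs(copysign_swapped(x, -1.0)) == 1.0
```

A test along these lines catches the swap as long as the inputs have a magnitude other than 1; with inputs such as 1.0 and -1.0 alone, both orderings can produce the same value.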

2. ✅ #1309 - uint to float cast goes through double - FIXED in v3.3.0

Severity: Medium - crashes on devices without fp64 support
Affected: OpenCL devices without double precision (Intel integrated GPUs)
Reproducible? Yes - any (float)someUint cast in a kernel
Root cause: IL conv.r.un + conv.r4 treated as uint→double→float instead of direct uint→float, emitting fp64 ops on devices that don't support them
Fix complexity: Low - added direct uint→float conversion path in the IL-to-IR converter
Testable: ✅ UintToFloatCastTest passes on all backends

3. ✅ #1479 - Infinite compilation with large local arrays - FIXED in v3.3.0

Severity: High - 10+ minute compile, 10+ GB RAM for new int[1_000_000]
Affected: All backends
Reproducible? Yes - any kernel with new int[N] where N is large
Root cause: LowerArrays.Lower unrolled zero-initialization into N individual store IR nodes regardless of array size
Fix complexity: Medium - added threshold (32 elements); small arrays keep unrolled stores, large arrays emit a proper IR loop
Testable: ✅ All 366 existing tests pass across CUDA/OpenCL/CPU
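The threshold idea can be sketched with a toy model. This is not the ILGPU `LowerArrays` code: the tuple-based "IR", the op names `store` and `loop`, and the choice to treat the 32-element threshold as inclusive are all assumptions made for illustration.

```python
UNROLL_THRESHOLD = 32  # element count named in the report; inclusivity is assumed


def lower_zero_init(length: int) -> list:
    """Return toy IR ops that zero-initialize an array of `length` elements."""
    if length <= UNROLL_THRESHOLD:
        # Small array: one store node per element, as before the fix.
        return [("store", index, 0) for index in range(length)]
    # Large array: a single loop node whose body stores 0 at the loop index "i".
    return [("loop", 0, length, ("store", "i", 0))]


small = lower_zero_init(8)
large = lower_zero_init(1_000_000)
assert len(small) == 8   # node count grows with the array size
assert len(large) == 1   # node count stays constant for large arrays
```

The point of the design is that the number of IR nodes no longer scales with N for large arrays, which is what drove the compile time and memory use described above.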

4. ✅ #1538 - Internal Compiler Error with nested struct properties - FIXED in v3.3.0

Severity: Medium - prevents kernel compilation
Affected: All backends
Reproducible? Yes - deeply nested record struct parameters + static struct member access
Root cause: StructureType.Slice used SliceRecursive/DirectFields to extract sub-spans, but type unification could merge types with different field orderings (e.g., {float, Vec3} vs {Vec3, float}), causing wrong slices
Fix complexity: Medium - changed Slice to use flat Fields directly instead of SliceRecursive
Testable: ✅ NestedStructICETest passes on all backends (CPU, CUDA, OpenCL, WebGPU, WebGL, Wasm)

5. ✅ #1540 - H100/H200 not working - Already fixed in our fork

Severity: High - H100/H200 GPUs crash immediately
Affected: CUDA (Hopper architecture SM_90)
Reproducible? Only on H100/H200 hardware (we don't have this)
Root cause: Original ILGPU 1.5.x didn't include SM_90 in architecture tables
Fix complexity: Already fixed - our fork's CudaArchitecture.Generated.cs includes SM_90, SM_100, SM_101, SM_120
Testable: Would need H100/H200 to verify, but code is clearly present

6. ✅ #1539 - OpenCL produces wrong results for complex kernels - FIXED in v3.3.0

Severity: High - silent wrong results
Affected: All OpenCL backends (not just AMD; reproduces on NVIDIA too)
Reproducible? Yes - BVH ray traversal kernel with while-loop stack-based traversal
Root cause: intermediatePhiVariables dictionary in CLCodeGenerator.GenerateCodeInternal() persisted across blocks, causing stale intermediate phi variables from one block's phi swap to be incorrectly used as source values in a different block's phi bindings
Fix complexity: Low (one-line fix) - added intermediatePhiVariables.Clear() at the start of each block's phi binding processing
Testable: ✅ BVHRayTraversalTest passes on all backends (CPU, CUDA, OpenCL, WebGPU, Wasm)
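The stale-variable mechanism can be illustrated with a small model. This is a simplified sketch, not the actual `CLCodeGenerator` logic: the function names, the `(target, source)` binding representation, and the temporary naming scheme are all hypothetical. The first block swaps `a` and `b` through temporaries, and the second block reads `a`.

```python
def emit_phi_bindings(blocks, clear_per_block):
    """Emit (dest, src) copies for each block's phi bindings (toy model)."""
    intermediates = {}  # phi variable -> temp holding its value before the swap
    code = []
    for index, bindings in enumerate(blocks):
        if clear_per_block:
            intermediates.clear()  # models the one-line fix
        targets = {target for target, _ in bindings}
        # Save any source that is overwritten by another binding in this block.
        for _, source in bindings:
            if source in targets:
                temp = f"{source}_tmp{index}"
                code.append((temp, source))
                intermediates[source] = temp
        # Emit the copies, reading from a temp whenever one is recorded.
        for target, source in bindings:
            code.append((target, intermediates.get(source, source)))
    return code


def run(code, env):
    """Execute the emitted copies on a copy of `env` and return the result."""
    env = dict(env)
    for dest, src in code:
        env[dest] = env[src]
    return env


blocks = [[("a", "b"), ("b", "a")], [("c", "a")]]  # block 0 swaps, block 1 reads a
start = {"a": 1, "b": 2, "c": 0}
buggy = run(emit_phi_bindings(blocks, clear_per_block=False), start)
fixed = run(emit_phi_bindings(blocks, clear_per_block=True), start)
assert fixed["c"] == 2  # c receives the current (post-swap) value of a
assert buggy["c"] == 1  # c receives the stale pre-swap value saved in block 0
```

In this model the swap itself is correct in both variants; only the later block's read goes wrong, which matches the "silent wrong results" symptom rather than a crash.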

Not Actionable (For Us)

#1535 - .NET Standard 2.1: We target .NET 10, so this is not relevant.
#1359 - DebugInformationManager: PDB loading exception; very environment-specific.
#1263 - Radix sort too many resources: Algorithm-specific, configuration-dependent. Likely Debug vs Release IL verbosity.
#1542, #1476, #1508: Questions, not bugs.

Fix Summary

  1. [BUG]: float/MathF.CopySign argument order is swapped when using GPU #1361 (CopySign) ✅ Fixed - swapped argument order in PTX/OpenCL intrinsic
  2. [BUG]: Type cast uint to float #1309 (uint→float) ✅ Fixed - direct uint→float conversion without double intermediate
  3. [BUG]: Infinite compilation with local arrays inside kernels! #1479 (local array unrolling) ✅ Fixed - threshold-based loop for large arrays
  4. Internal Compiler Error when Accessing Struct Property #1538 (struct ICE) ✅ Fixed - StructureType.Slice uses flat fields instead of SliceRecursive
  5. [BUG]: Doesnt work on Nvidia H200 or H100 #1540 (H100/H200) ✅ Already fixed - SM_90+ architecture tables present in fork
  6. [BUG]: AMD Integrated GPU OpenCL Translation Produces Code That Yields Wrong Results #1539 (OpenCL wrong results) ✅ Fixed - intermediatePhiVariables.Clear() per-block in OpenCL code generator
