Thank you again for ILGPU. My SpawnDev.ILGPU library is now working quite well on desktop and in the browser with Blazor WASM. Since my library is nearing completion, I spent some time yesterday working on upstream issues in this repo. I would normally submit pull requests, but because past PRs have been ignored, I don't want to over-invest effort. Instead, I'll post my findings here so you can use them as you see fit.
The following is taken from an internal SpawnDev.ILGPU report. Each finding and fix was verified with unit tests.
Upstream ILGPU Issues Analysis
Analysis of open issues in the original ILGPU repo to determine which bugs are inherited and fixable in SpawnDev.ILGPU v3.3.0.
Actionable Bugs (Testable & Potentially Fixable)
1. #1361: MathF.CopySign argument order swapped on GPU (FIXED in v3.3.0)
Severity: High (silent wrong results)
Affected: All GPU backends (CUDA, OpenCL, WebGPU, WebGL)
Reproducible: Yes, with a simple kernel: CopySign(x, -1) should return -|x|
Root cause: The XMath.CopySign intrinsic passed (sign, magnitude) instead of (magnitude, sign) to the backend copysign instruction
Fix complexity: Low (swapped the two arguments in the intrinsic mapping)
Testable: CopySignTest passes on all backends
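For reference, the expected values can be checked on the CPU with plain .NET, without ILGPU. This is only a minimal sketch: the class `CopySignReference` and its `Swapped` method (which mimics the reported argument swap) are hypothetical names made up for illustration, not ILGPU code.

```csharp
using System;

// Hypothetical helper (not part of ILGPU): CPU-side reference values that a
// GPU CopySign test could compare against.
public static class CopySignReference
{
    // Correct order: the magnitude comes from x, the sign comes from y.
    public static float Expected(float x, float y) => MathF.CopySign(x, y);

    // Mimics the reported bug: the two arguments are passed in swapped order.
    public static float Swapped(float x, float y) => MathF.CopySign(y, x);
}
```

With these helpers, `CopySignReference.Expected(3f, -1f)` is `-3`, while `CopySignReference.Swapped(3f, -1f)` is `1`, which is the kind of silent wrong result the issue describes.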
2. #1309: uint to float cast goes through double (FIXED in v3.3.0)
Severity: Medium (crashes on devices without fp64 support)
Affected: OpenCL devices without double precision (Intel integrated GPUs)
Reproducible: Yes, with any (float)someUint cast in a kernel
Root cause: IL conv.r.un + conv.r4 was treated as uint→double→float instead of a direct uint→float conversion, emitting fp64 ops on devices that don't support them
Fix complexity: Low (added a direct uint→float conversion path in the IL-to-IR converter)
Testable: UintToFloatCastTest passes on all backends
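The direct path should not change any results, because every 32-bit uint is exactly representable as a double, so both paths round the same value to float exactly once. A small CPU-side sketch of that check in plain .NET follows; `UintToFloatCheck` is a hypothetical helper for illustration and is not taken from ILGPU or its test suite.

```csharp
using System;

// Hypothetical CPU-side check (not ILGPU code): compares a direct
// uint -> float cast with the uint -> double -> float detour.
public static class UintToFloatCheck
{
    public static float Direct(uint value) => (float)value;

    public static float ViaDouble(uint value) => (float)(double)value;

    // Returns true when both conversion paths agree for every sample.
    public static bool PathsAgree(params uint[] samples)
    {
        foreach (uint s in samples)
        {
            if (Direct(s) != ViaDouble(s))
                return false;
        }
        return true;
    }
}
```

Useful samples are the edge cases around float precision, for example `UintToFloatCheck.PathsAgree(0u, 1u, 16777217u, uint.MaxValue)`.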
3. #1479: Infinite compilation with large local arrays (FIXED in v3.3.0)
Severity: High (10+ minute compile and 10+ GB RAM for new int[1_000_000])
Affected: All backends
Reproducible: Yes, with any kernel containing new int[N] where N is large
Root cause: LowerArrays.Lower unrolled zero-initialization into N individual store IR nodes regardless of array size
Fix complexity: Medium (added a threshold of 32 elements; small arrays keep unrolled stores, large arrays emit a proper IR loop)
Testable: All 366 existing tests pass across CUDA/OpenCL/CPU
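The idea behind the threshold can be modelled in a few lines. The sketch below is a toy model only: `ZeroInitLowering` and its textual "IR" are invented for illustration and do not match ILGPU's internal types, and treating exactly 32 elements as still unrolled is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the lowering decision (not ILGPU's real IR).
public static class ZeroInitLowering
{
    // Threshold from the report; whether 32 itself is unrolled is assumed.
    public const int UnrollThreshold = 32;

    public static List<string> Lower(int length)
    {
        var ir = new List<string>();
        if (length <= UnrollThreshold)
        {
            // Small array: one store node per element.
            for (int i = 0; i < length; i++)
                ir.Add($"store arr[{i}], 0");
        }
        else
        {
            // Large array: a constant number of nodes forming a loop.
            ir.Add("i = 0");
            ir.Add("loop:");
            ir.Add("  store arr[i], 0");
            ir.Add("  i = i + 1");
            ir.Add($"  branch i < {length}, loop");
        }
        return ir;
    }
}
```

In this model `Lower(4)` produces four store nodes, while `Lower(1_000_000)` produces five nodes in total, so the node count no longer grows with the array size.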
4. #1538: Internal Compiler Error with nested struct properties (FIXED in v3.3.0)
Severity: Medium (prevents kernel compilation)
Affected: All backends
Reproducible: Yes, with deeply nested record struct parameters + static struct member access
Root cause: StructureType.Slice used SliceRecursive/DirectFields to extract sub-spans, but type unification could merge types with different field orderings (e.g., {float, Vec3} vs {Vec3, float}), causing wrong slices
Fix complexity: Medium (changed Slice to use flat Fields directly instead of SliceRecursive)
Testable: NestedStructICETest passes on all 8 backends (CPU, CUDA, OpenCL, WebGPU, WebGL, Wasm)
5. #1540: H100/H200 not working (already fixed in our fork)
Severity: High (H100/H200 GPUs crash immediately)
Affected: CUDA (Hopper architecture SM_90)
Reproducible: Only on H100/H200 hardware (we don't have this)
Root cause: Original ILGPU 1.5.x didn't include SM_90 in architecture tables
Verification would need H100/H200 hardware, but the SM_90 code is clearly present in our fork.
6. #1539: OpenCL produces wrong results for complex kernels (FIXED in v3.3.0)
Severity: High (silent wrong results)
Affected: All OpenCL backends (not just AMD; reproduces on NVIDIA too)
Reproducible: Yes, with a BVH ray traversal kernel using while-loop stack-based traversal
Root cause: The intermediatePhiVariables dictionary in CLCodeGenerator.GenerateCodeInternal() persisted across blocks, causing stale intermediate phi variables from one block's phi swap to be incorrectly used as source values in a different block's phi bindings
Fix complexity: Low (one-line fix); added intermediatePhiVariables.Clear() at the start of each block's phi binding processing
Testable: BVHRayTraversalTest passes on all backends (CPU, CUDA, OpenCL, WebGPU, WASM)
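To make the stale-state problem concrete, here is a toy model of phi binding emission. It is a simplified sketch under my own assumptions: `PhiBindingModel`, the string-based variables, and the temporary naming are hypothetical and are not the actual CLCodeGenerator code. Only the role of the shared dictionary and the per-block `Clear()` follow the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model (not ILGPU code) of emitting phi bindings per block.
public static class PhiBindingModel
{
    // Each block is a list of bindings "Target <- Source".
    // clearPerBlock = false reproduces the stale-dictionary behaviour.
    public static List<string> Emit(
        IReadOnlyList<(string Target, string Source)[]> blocks,
        bool clearPerBlock)
    {
        var code = new List<string>();
        var intermediates = new Dictionary<string, string>();
        int tempId = 0;

        foreach (var bindings in blocks)
        {
            if (clearPerBlock)
                intermediates.Clear(); // the one-line fix, in this model

            // Collect every variable that is read as a source in this block.
            var sources = new HashSet<string>();
            foreach (var b in bindings)
                sources.Add(b.Source);

            // A target that is also read must be saved to a temporary first.
            foreach (var b in bindings)
            {
                if (!sources.Contains(b.Target))
                    continue;
                string temp = $"t{tempId++}";
                intermediates[b.Target] = temp;
                code.Add($"{temp} = {b.Target}");
            }

            // Assign targets, reading from the temporary when one is recorded.
            foreach (var b in bindings)
            {
                string src = intermediates.TryGetValue(b.Source, out var t)
                    ? t
                    : b.Source;
                code.Add($"{b.Target} = {src}");
            }
        }
        return code;
    }
}
```

With a first block that swaps `a` and `b` and a second block that binds `c <- a`, the model emits `c = t0` (the pre-swap value of `a`) when the dictionary is never cleared, and `c = a` when it is cleared per block.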