Description
Summary
I have (quite complex) Python code that nests multiple symbolic loops and on some occasions (more details below) Dr.Jit produces invalid PTX that leads to the following error:
COMPILE ERROR: Invalid PTX input: ptx2llvm-module-003: error: Failed to parse input PTX string
ptx2llvm-module-003, line 6238; error : State space mismatch between instruction and address in instruction 'ld'
ptx2llvm-module-003, line 6242; error : State space mismatch between instruction and address in instruction 'ld'
ptx2llvm-module-003, line 6358; error : State space mismatch between instruction and address in instruction 'ld'
ptx2llvm-module-003, line 6361; error : State space mismatch between instruction and address in instruction 'ld'
Cannot parse input PTX string
The offending line(s) in PTX read like this:
.visible .func (.param .align 4 .b8 result[36]) __direct_callable__c7d274ed0e138d86d000b2ea00bb0bcc(.reg .u64 data) {
// Call: mitsuba::Emitter::sample_ray()
.reg.b8 %b <177>; .reg.b16 %w<177>; .reg.b32 %r<177>;
.reg.b64 %rd<177>; .reg.f16 %h<177>; .reg.f32 %f<177>;
.reg.f64 %d <177>; .reg.pred %p<177>;
[...]
ld.param.b32 %f22, [params+65535]; // <--------- invalid load
mov.b32 %f23, 0xbf800000;
fma.rn.ftz.f32 %f24, %f21, %f22, %f23;
abs.f32 %f25, %f24;
ld.param.b32 %f26, [params+65535]; // <--------- invalid load
[...]
The function body loads from params, although the function signature declares no params argument (only data), so it seems no parameters are passed.
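To locate every affected callable in a large PTX dump, a small script along these lines may help. It is only a sketch: the function name `find_suspicious_param_loads` and the regular expressions are my own, written against the two PTX excerpts in this issue, and they assume each callable header sits on a single line.

```python
import re

SENTINEL = 0xFFFF  # 65535, the offset seen in the invalid loads

# Matches a callable header and captures its name and argument list.
# Assumption: the header is on one line, as in the excerpts in this issue.
HEADER_RE = re.compile(r"\.func\b.*?(__direct_callable__\w+)\s*\((.*?)\)\s*\{")

# Matches e.g. "ld.param.b32 %f22, [params+65535];" and captures the offset.
LOAD_RE = re.compile(r"ld\.param\.\w+\s+%\w+,\s*\[params(?:\+(\d+))?\]")


def find_suspicious_param_loads(ptx):
    """Return (line number, callable name, offset) for each questionable load.

    A load is flagged if its offset equals the sentinel, or if the enclosing
    callable does not declare a `params` argument at all.
    """
    findings = []
    current = None
    declares_params = False
    for lineno, line in enumerate(ptx.splitlines(), start=1):
        header = HEADER_RE.search(line)
        if header:
            current = header.group(1)
            declares_params = "params[" in header.group(2)
            continue
        load = LOAD_RE.search(line)
        if load and current is not None:
            offset = int(load.group(1) or 0)
            if offset == SENTINEL or not declares_params:
                findings.append((lineno, current, offset))
    return findings
```

Calling `find_suspicious_param_loads(open("kernel.ptx").read())` on a dumped kernel (the file name is a placeholder) returns an empty list for a correctly assembled callable and one entry per invalid load otherwise.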
Details
The code is intersection logic for a custom Mitsuba shape that is invoked inside the active integrator (I have wrapped the scene and minimally adapted the Python integrators to support the new shape in Python). This works great for prb and other custom integrators, for both primal rendering and forward/backward-mode differentiation. However, there is one custom integrator (basically a Python port of the particle tracer) where primal rendering works as expected but backward-mode differentiation, triggered by calling dr.backward_from(dr.sum(img)), produces the above PTX code.
I stepped through the PTX generation and found that a CallInput variable v depends on a variable a that has neither a valid reg_index nor a valid param_offset assigned. Basically, the following happens:
- a is assigned 0xFFFF (65535) as param_offset at eval.cpp#L351.
- reg_index of a remains unassigned and therefore it is deemed unused in call.cpp#L480, which leaves param_offset at 0xFFFF.
- The invalid load is generated at cuda_eval.cpp#L301.
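The sequence above can be mimicked with a toy model in Python. This is not Dr.Jit's actual code: `ToyVar`, `assign_offsets`, `emit_load`, and `emit_load_checked` are hypothetical names, and the convention that `reg_index == 0` means "no register" is an assumption made only for this sketch. It illustrates how a sentinel offset can survive into code generation and where a check could surface the inconsistent state earlier.

```python
from dataclasses import dataclass

UNASSIGNED_OFFSET = 0xFFFF  # sentinel, matches the 65535 seen in the PTX


@dataclass
class ToyVar:
    name: str
    reg_index: int = 0  # toy convention: 0 means "no register assigned"
    param_offset: int = UNASSIGNED_OFFSET  # step 1: starts at the sentinel


def assign_offsets(variables):
    """Step 2: give each variable that has a register a 4-byte slot.

    Variables without a register are treated as unused and skipped,
    so their param_offset keeps the sentinel value.
    Returns the total size of the parameter block in bytes.
    """
    offset = 0
    for v in variables:
        if v.reg_index == 0:
            continue
        v.param_offset = offset
        offset += 4
    return offset


def emit_load(v, dst):
    """Step 3: emit the load without validating the offset."""
    return f"ld.param.b32 {dst}, [params+{v.param_offset}];"


def emit_load_checked(v, dst):
    """Same as emit_load, but fail loudly if the offset was never assigned."""
    if v.param_offset == UNASSIGNED_OFFSET:
        raise ValueError(f"variable {v.name!r} has no parameter offset")
    return emit_load(v, dst)
```

With `a = ToyVar("a")` and `b = ToyVar("b", reg_index=5)`, `assign_offsets([a, b])` leaves `a.param_offset` at the sentinel, so `emit_load(a, "%f22")` reproduces the invalid load while `emit_load_checked(a, "%f22")` raises at the point where the bad state is first consumed.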
I am not 100% sure at which point the state becomes invalid. Unfortunately, I could not create a self-contained reproducer and I cannot (yet) share the code. Interestingly, before the invalid compilation, the same call launches a kernel with a very similar function that is correctly assembled:
.visible .func (.param .align 4 .b8 result[36]) __direct_callable__7e02206b58dbcab2fbe0b1545125e371(.reg .u64 data, .param .align 4 .b8 params[16]) {
// Call: mitsuba::Emitter::sample_ray()
.reg.b8 %b <177>; .reg.b16 %w<177>; .reg.b32 %r<177>;
.reg.b64 %rd<177>; .reg.f16 %h<177>; .reg.f32 %f<177>;
.reg.f64 %d <177>; .reg.pred %p<177>;
[...]
ld.param.b32 %f22, [params+8];
mov.b32 %f23, 0xbf800000;
fma.rn.ftz.f32 %f24, %f21, %f22, %f23;
abs.f32 %f25, %f24;
ld.param.b32 %f26, [params+12];
[...]
I think it would be helpful to get some pointers on where I should keep digging. Thanks!