-
I want to implement a 64x64 matrix-vector product in the megakernel of drjit. Here is how I do it:

```python
import drjit as dr
import mitsuba as mi
from typing import List

class Linear:
    def __init__(self, input_dims, output_dims):
        self.weight = dr.ones(mi.TensorXf, shape=(output_dims, input_dims))

    def __call__(self, x: List[mi.Float]) -> List[mi.Float]:
        res = []
        for i in range(self.weight.shape[0]):
            row_vec = dr.ravel(self.weight[i])  # flatten the i-th row
            v = mi.Float(0.)
            for j in range(len(row_vec)):
                # row_vec[j] reads a single entry back to Python, so it
                # gets traced as a literal constant in the generated kernel
                v = dr.fma(row_vec[j], x[j], v)
            res.append(v)
        return res
```

However, it seems that each time the weights change, a new kernel has to be compiled. My question is: how can I implement the matrix-vector product so that it achieves the fastest performance, while producing kernel code that can be reused across different weight values?
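For context on the recompilation issue, here is an illustrative sketch (not part of the original snippet): Dr.Jit traces plain Python scalars as literal constants baked into the generated kernel, whereas `dr.opaque` creates a variable that is bound as a kernel parameter, so the same kernel can be reused when its value changes:

```python
import drjit as dr
import mitsuba as mi

mi.set_variant('cuda_ad_rgb')  # or 'llvm_ad_rgb'

x = dr.arange(mi.Float, 4)

y1 = x * 2.0                  # 2.0 is baked into the kernel as a literal;
                              # switching to 3.0 compiles a second kernel

w = dr.opaque(mi.Float, 2.0)  # opaque scalar, passed as a kernel argument
y2 = x * w                    # the compiled kernel is reused when w changes
```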
-
Hi @gerwang

I'm confused by what you're currently trying to achieve. Could you share an example snippet using this `Linear` class?

One issue I see is that the tensor support in Dr.Jit is limited. I typically recommend directly accessing its underlying (flat) buffer with `.array`.
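For illustration, a minimal sketch of what that looks like, assuming an initialized CUDA or LLVM variant (the 64x64 shape is just an example):

```python
import drjit as dr
import mitsuba as mi

mi.set_variant('cuda_ad_rgb')  # or 'llvm_ad_rgb'

w = dr.ones(mi.TensorXf, shape=(64, 64))
buf = w.array  # underlying flat mi.Float buffer of length 64 * 64

# Read entry (i, j) via a gather on the flat buffer instead of tensor
# indexing; the gather is executed inside the kernel at runtime
i, j = 3, 5
w_ij = dr.gather(mi.Float, buf, i * w.shape[1] + j)
```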
-
I want to perform inference on a small (tiny-cuda-nn scale) MLP network within the megakernel of drjit. The specific invocation of the network happens during rendering: from an external perspective, the entire rendering operation is initiated by calling `mi.render()`. Some considerations for my case:

Could you please comment on whether this goal is feasible in mitsuba3, and share some tips to improve my current implementation? Thank you!
Aha, I see!
This is definitely feasible.
I've modified the snippet you sent:
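A minimal sketch of one way such a modification could look, assuming the flat-buffer approach from above (weights in a flat `mi.Float` buffer, made opaque, fetched with `dr.gather` inside the loop); this is an illustration rather than the exact modified snippet:

```python
import drjit as dr
import mitsuba as mi
from typing import List

mi.set_variant('cuda_ad_rgb')  # or 'llvm_ad_rgb'

class Linear:
    def __init__(self, input_dims: int, output_dims: int):
        self.input_dims = input_dims
        self.output_dims = output_dims
        # Row-major flat weight buffer instead of a TensorXf
        self.weight = dr.ones(mi.Float, output_dims * input_dims)
        # Opaque: the values are bound as a kernel argument rather than
        # baked in as literals, so the compiled kernel stays reusable
        dr.make_opaque(self.weight)

    def __call__(self, x: List[mi.Float]) -> List[mi.Float]:
        res = []
        for i in range(self.output_dims):
            v = mi.Float(0.)
            for j in range(self.input_dims):
                w_ij = dr.gather(mi.Float, self.weight,
                                 i * self.input_dims + j)
                v = dr.fma(w_ij, x[j], v)
            res.append(v)
        return res

# Example usage: a 64-dimensional input broadcast over the wavefront
layer = Linear(64, 64)
y = layer([mi.Float(1.0)] * 64)
```

Since the gathered values live in a buffer rather than in the generated code, re-running with new weights of the same shape should produce an identical trace and hit Dr.Jit's kernel cache.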