Conversation
Sample "flat" version template <int P_1D, int Q_1D>
inline __device__ void WeightTensor2dFlattened(SharedData_Cuda &data, const CeedScalar *__restrict__ q_weight_1d, CeedScalar *w) {
const int max = P_1D < Q_1D ? P_1D : Q_1D;
WeightTensor2d_Core<Q_1D>(data, data.t_id_x % max, data.t_id_x / max, q_weight_1d, w);
}
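For reference, a minimal sketch of what the `WeightTensor2d_Core` helper this calls might look like; the actual signature and bounds handling in the branch may differ, so treat the details here as assumptions:

```cpp
// Hypothetical sketch of the core helper, assuming it receives the decomposed
// 2D coordinates and masks off threads outside the Q_1D x Q_1D quadrature grid
template <int Q_1D>
inline __device__ void WeightTensor2d_Core(SharedData_Cuda &data, const int t_id_x, const int t_id_y,
                                           const CeedScalar *__restrict__ q_weight_1d, CeedScalar *w) {
  if (t_id_x < Q_1D && t_id_y < Q_1D) {
    // 2D tensor-product quadrature weight: w(i, j) = w_1d(i) * w_1d(j)
    w[0] = q_weight_1d[t_id_x] * q_weight_1d[t_id_y];
  }
}
```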
Ok, I need to check on the memory in the... Also, I'm not sure how 3D will be tackled.
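One way 3D might fall out of the same pattern is a three-way decomposition of the flat index. This is purely a sketch; `WeightTensor3d_Core` and its signature are assumptions:

```cpp
// Hypothetical 3D analogue: split the flat thread index into (x, y, z).
// Assumes a 1D thread block with at least max^3 threads, with the (assumed)
// core helper masking threads outside the Q_1D^3 quadrature grid
template <int P_1D, int Q_1D>
inline __device__ void WeightTensor3dFlattened(SharedData_Cuda &data, const CeedScalar *__restrict__ q_weight_1d, CeedScalar *w) {
  const int max = P_1D > Q_1D ? P_1D : Q_1D;

  WeightTensor3d_Core<Q_1D>(data, data.t_id_x % max, (data.t_id_x / max) % max, data.t_id_x / (max * max), q_weight_1d, w);
}
```

Note that a fully flat block needs max^3 threads in 3D, which blows past typical CUDA block limits even at moderate orders, so the real 3D strategy may need a loop over z slices instead.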
Ok, there's going to need to be more disentangling, as the operator I'm trying to target has different dims for different nodal spaces.
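For concreteness, a hypothetical sketch of what per-field dims could look like in the kernel template parameters (all names are illustrative, not the actual codegen output):

```cpp
// Hypothetical sketch: each field carries its own dimension in the template
// parameters instead of sharing one operator-wide DIM. Names are illustrative.
template <int DIM_U, int P_1D_U,  // e.g. input field on a tensor-product nodal space
          int DIM_V, int Q_V>     // e.g. output field on a non-tensor nodal space
inline __device__ void ApplyPerFieldDims(SharedData_Cuda &data, const CeedScalar *__restrict__ d_u, CeedScalar *__restrict__ d_v) {
  // Each basis action would be selected from its own DIM_* parameter here,
  // rather than assuming every nodal space shares the operator dimension
}
```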
Force-pushed from 9e04d4c to af7851a
Ok, separate dims for each field now, but there's some bug that's giving wrong results.
Force-pushed from 474fba4 to d334e61
Ugh, T_1D is wrong for this strategy. Pondering the options: move the slice, or stand up fully separate versions? Fully separate is probably the way to go at this point.
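For context on why T_1D bites here: assuming T_1D is the shared 1D thread-block extent (the max of every P_1D and Q_1D in the operator), the square block that serves the tensor strategy does not give the flattened strategy enough threads along x. A hypothetical host-side sizing sketch, with all names assumed:

```cpp
// Hypothetical sizing sketch; T_1D definition and names are assumptions.
// The square (T_1D, T_1D) block suits 2D tensor indexing, but the flattened
// decomposition wants all T_1D * T_1D threads along x.
const int P_1D           = 3, Q_1D = 4;                // example orders
const int T_1D           = (P_1D > Q_1D) ? P_1D : Q_1D;
const int elem_per_block = 1;                          // illustrative only
dim3      block_tensor(T_1D, T_1D, elem_per_block);    // classic tensor strategy
dim3      block_flat(T_1D * T_1D, 1, elem_per_block);  // flattened strategy
```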
Force-pushed from d334e61 to 336ccc0
Getting closer. 2D Tensor + 3D NonTensor is working now. Need 3D Tensor + 3D NonTensor next.
I'm starting to think this wasn't worth the time spent here - I think the restrictions also need to be hacked around. I'm going to abandon this effort for now and switch to AtPoints assembly.
Force-pushed from 292b18f to 8e1cf66
Holy crap, we've got non-tensor with 2D tensor working perfectly now. Just need non-tensor with 3D tensor.
Force-pushed from 4d404ab to 9a751b7
Force-pushed from abac4c0 to f33ffa5
Ok, ready for review. Plan is to squash + merge.
Force-pushed from f33ffa5 to b8245c6
Last step in the *gen refactor. This will allow us to run operators that have a mix of tensor product and non-tensor bases. 2D is easier; 3D will take more thought.
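To make the mixing concrete: the point of the flattened variants above is that tensor and non-tensor fields in one operator can share a single 1D thread layout over the quadrature points. A hedged sketch of the index mapping, with names assumed in the style of the helper above:

```cpp
// Hypothetical sketch of one flat thread layout serving both basis kinds.
// Assumes both fields share the same quadrature space, with the non-tensor
// field holding Q_1D * Q_1D points in 2D
template <int Q_1D>
inline __device__ void MixedQuadratureIndices(SharedData_Cuda &data, int &x_tensor, int &y_tensor, int &q_nontensor) {
  // Tensor-product field: decompose the flat index into 2D coordinates
  x_tensor = data.t_id_x % Q_1D;
  y_tensor = data.t_id_x / Q_1D;
  // Non-tensor field: the flat index is already the quadrature point index
  q_nontensor = data.t_id_x;
}
```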