Hi
I want to implement BERT in Timeloop with a systolic array architecture. I have found examples for matrix multiplication layers, but I haven't found any for elementwise layers (such as LayerNorm, softmax, and embedding).
Where can I find templates or a guide for modeling these layers?
Thanks