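Using a higher-dimensional embedding of the rotation matrix seems to perform considerably better than quaternion or Euler-angle representations. Neural networks fit continuous mappings most easily, and 3D rotations only have a continuous representation in Euclidean spaces of five or more dimensions, so quaternions (4D) and Euler angles (3D) are necessarily discontinuous somewhere.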
Empirically, we can see discontinuities in our quaternion states, which is very likely making it hard for the model to learn.
https://zhouyisjtu.github.io/project_rotation/rotation.htm
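
As a rough illustration of the idea, here is a minimal sketch of one such embedding: the 6D representation described on the linked page, which keeps the first two columns of the rotation matrix and recovers the third with Gram-Schmidt plus a cross product. The function names (`rotation_from_6d`, `rotation_to_6d`, `quat_to_matrix`) are hypothetical and not taken from our codebase, quaternions are assumed to be unit length in `(w, x, y, z)` order, and the code uses plain Python lists rather than a tensor library to stay self-contained.

```python
import math


def _normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(d6):
    """Map a 6D vector (a1, a2) to a 3x3 rotation matrix (row-major lists).

    Gram-Schmidt: b1 = normalize(a1), b2 = normalize(a2 - (b1 . a2) b1),
    b3 = b1 x b2. The vectors b1, b2, b3 become the matrix columns.
    Assumes a1 and a2 are non-zero and not parallel.
    """
    a1, a2 = d6[:3], d6[3:]
    b1 = _normalize(a1)
    proj = _dot(b1, a2)
    b2 = _normalize([x - proj * y for x, y in zip(a2, b1)])
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def rotation_to_6d(R):
    """Flatten the first two columns of a rotation matrix into a 6D vector."""
    return [R[0][0], R[1][0], R[2][0], R[0][1], R[1][1], R[2][1]]


def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# q and -q describe the same rotation, so they share one 6D target
# even though the two quaternions are far apart as 4D vectors.
q = [math.cos(0.4), 0.0, math.sin(0.4), 0.0]
neg_q = [-c for c in q]
target_from_q = rotation_to_6d(quat_to_matrix(q))
target_from_neg_q = rotation_to_6d(quat_to_matrix(neg_q))
```

The design point is that `q` and `neg_q` sit at Euclidean distance 2 from each other yet encode the same rotation, so a quaternion regression target can jump between them, whereas the 6D target is unique for each rotation. The network would emit six unconstrained numbers and `rotation_from_6d` would turn them into a valid rotation.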