https://github.com/intel/intel-xpu-backend-for-triton/pull/2951 implemented `UpcastMXFPOp` for the blocked layout. We want to also support it for the dot operand layout to improve performance.