opencl: optimize mxfp4 kernels (#16037)

- flatten mxfp4 and packed fp4->fp16 bit-wise convert function (replace lut)
- MoE kernel optimizations

---------

Co-authored-by: Li He <[email protected]>