sycl: addressing non-contiguous src1 mul_mats (nc and batched) #13343
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
#13308 disabled the mul_mats for non-contiguous matrices since we were producing wrong results.
This PR fixes the non-contiguous mulmats based on #13155.
I had to disable the path for
ggml_sycl_mul_mat_vec_ncsince that doesn't deal with non-contiguous src1.27aa259 broken mul_mat performance (Before disabling non-contiguous src1)
5215b91 the current performance
e317791 this PR
We have lost a little bit of performance due to disabling
ggml_sycl_mul_mat_vec_nc, but I think it's better to rethink the whole mul_mat dispatch and refactor matrix multiplications.I'd prefer to address non-critical concerns in a different PR due to the massive performance regression we have (assuming I didn't mess up something along the way).
I am not entirely sure if the performance loss in qwen2 tg128 is noise or caused by this PR