I would like to run routines such as sgemm in a multi-threaded fashion. I am calling them through cblas, but they appear to run single-threaded. I tried setting the OPENBLAS_NUM_THREADS environment variable, but the routines still run on a single thread. Is there any way to enable multi-threading?