spatial_autocorr limit the numba threads as n_jobs temporarily#984

Closed
selmanozleyen wants to merge 2 commits into scverse:main from selmanozleyen:fix/set_numba_threads_on_spatial_autocorr

Conversation

@selmanozleyen
Member

Description

I am not sure if this is a bug, but it makes sense for the user to expect numba to use at most n_jobs of their cores. I made one solution like this one, but I think any code that uses numba would have to be modified this way if we treat this as a bug, right @ilan-gold? Or am I missing something?

Closes

#957 (comment)

@codecov-commenter

codecov-commenter commented Apr 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.60%. Comparing base (3771a0a) to head (e459751).
⚠️ Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #984      +/-   ##
==========================================
+ Coverage   66.58%   66.60%   +0.02%     
==========================================
  Files          40       40              
  Lines        6057     6061       +4     
  Branches     1014     1014              
==========================================
+ Hits         4033     4037       +4     
  Misses       1663     1663              
  Partials      361      361              
Files with missing lines Coverage Δ
src/squidpy/gr/_ppatterns.py 80.78% <100.00%> (+0.30%) ⬆️

... and 1 file with indirect coverage changes


@ilan-gold
Contributor

@selmanozleyen I think this issue is conflating two things. The function you're wrapping doesn't appear to have anything to do with numba, am I right? So why does setting numba help? If it does, could you explain? Would there be any way for you to confirm that your fix works (if not by a test, then by posting results)?

@selmanozleyen
Member Author

I assumed it is numba-related: it uses score_helper, which uses Moran's I, which is implemented with numba in scanpy.

func = _morans_i if mode == SpatialAutocorr.MORAN else _gearys_c

moran helper in scanpy:
https://github.com/scverse/scanpy/blob/15c5434ad0382614a16df612745c183807675d04/src/scanpy/metrics/_morans_i.py#L131

I checked locally with htop, and without the changes I made this runs on all cores:

import anndata as ad
import numpy as np
import pandas as pd
import scanpy as sc
import squidpy as sq

# load the pre-processed dataset
adata = sq.datasets.visium_hne_adata()
sq.gr.spatial_neighbors(adata)
sq.gr.spatial_autocorr(adata, n_jobs=1, n_perms=10000000, mode="moran")

@ilan-gold
Contributor

Awesome, thanks! And with the change, it works? I wonder whether this problem applies everywhere parallelize appears, in which case it might make sense to make this a decorator on parallelize or the like.

@selmanozleyen
Member Author

Yes, it works when I set it to 1, but it doesn't work for 2, because there is no guarantee that numba and joblib will use the same cores. So 2*n_jobs cores could be utilized. I couldn't observe this very clearly because I only have 8 cores locally atm.

But do you think this is a bug? I think n_jobs was only meant for the parallelize function, and setting a global variable like this doesn't feel right. What happens if a program runs this method and then another part of the program expects more cores from numba? I think it is just a matter of communicating what n_jobs means; otherwise the user should set numba's global configuration themselves, imo.

@ilan-gold
Contributor

ilan-gold commented Apr 7, 2025

Right @selmanozleyen, yes, I got lost in the sauce. I think I understand it better now. So:

  1. The n_jobs parameter is meant for parallelize, not numba
  2. Separately, numba has its own setting, the environment variable NUMBA_NUM_THREADS
  3. Setting the former does not interact with the latter, so limiting n_jobs means numba may still max out your CPU (or similar behavior)

If so, then I think this issue is one of documentation, you're right.

@timtreis
Copy link
Copy Markdown
Member

timtreis commented Apr 30, 2025

Based on the original comment it seems that numba just grabs cores regardless, though. I think we should limit numba here so it acts only within the n_jobs budget, since, as Selman said, the user expects only n_jobs cores to be busy. Especially in pipeline use this could otherwise cause nasty issues.

@timtreis timtreis linked an issue Apr 30, 2025 that may be closed by this pull request
@selmanozleyen
Member Author

I will close this, as we decided this is a documentation issue.
