@timothymillar (Collaborator) commented on Jan 7, 2025

I've tested this with a real pedigree of ~55,000 individuals on a 4-core laptop running WSL2. Using chunks of 5000 samples, I can calculate the pedigree matrix and save the chunked matrix to a Zarr store in under 25 s in total, using ~2.5GB of RAM. The full matrix would be ~22.5GB, which exceeds this machine's memory.
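
As a rough check on these figures, here is a small sketch of the arithmetic. It assumes each kinship value is an 8-byte float64 (the comment doesn't state the dtype) and that "GB" here means GiB; under those assumptions the dense 55,000 x 55,000 matrix comes to about 22.5 GiB, while one 5000 x 5000 chunk is about 191 MiB. The helper names are hypothetical, and the ~2.5GB peak RAM above is the author's measurement, not something this code models.

```python
import math


def dense_matrix_bytes(n_samples, itemsize=8):
    """Bytes for a dense n x n matrix, assuming 8-byte (float64) entries."""
    return n_samples * n_samples * itemsize


def chunk_grid(n_samples, chunk_size):
    """Chunks per axis and in total for a square matrix split into square chunks."""
    per_axis = math.ceil(n_samples / chunk_size)
    return per_axis, per_axis * per_axis


n, chunk = 55_000, 5_000
full_gib = dense_matrix_bytes(n) / 2**30
chunk_mib = dense_matrix_bytes(chunk) / 2**20
per_axis, total = chunk_grid(n, chunk)

print(f"full matrix: {full_gib:.1f} GiB")                  # full matrix: 22.5 GiB
print(f"one {chunk}x{chunk} chunk: {chunk_mib:.0f} MiB")   # one 5000x5000 chunk: 191 MiB
print(f"chunk grid: {per_axis} x {per_axis} = {total}")    # chunk grid: 11 x 11 = 121
```

Because only a few chunks need to be in memory at once, peak RAM is governed by the chunk size rather than by the full matrix.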

I've used Numba's experimental @jitclass feature to implement a simple triangular matrix class. Storing only the triangle roughly halves the RAM needed for the intermediate matrices. @jitclass isn't strictly necessary, but it allows greater code reuse via custom __setitem__/__getitem__ methods. If this is an issue, the code could be reworked to avoid @jitclass.
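
A minimal plain-Python sketch of the packed-triangle idea, assuming a symmetric matrix (kinship satisfies phi(i, j) = phi(j, i)) stored as its lower triangle in a flat array of n(n + 1)/2 values. The class name SymmetricPackedMatrix and its layout are hypothetical, not the PR's actual code. In the PR, a class like this is compiled with numba.experimental.jitclass; the decorator and its type spec are omitted here so the sketch runs without Numba.

```python
import numpy as np


class SymmetricPackedMatrix:
    """Hypothetical symmetric n x n matrix that stores only its lower triangle.

    Element (i, j) and element (j, i) share one slot in a flat array of
    length n * (n + 1) // 2.
    """

    def __init__(self, n, dtype=np.float64):
        self.n = n
        self.data = np.zeros(n * (n + 1) // 2, dtype=dtype)

    def _offset(self, i, j):
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError("index out of bounds")
        # Fold the upper triangle onto the lower triangle.
        if i < j:
            i, j = j, i
        # Rows 0..i-1 of the lower triangle hold i * (i + 1) // 2 entries.
        return i * (i + 1) // 2 + j

    def __getitem__(self, key):
        i, j = key
        return self.data[self._offset(i, j)]

    def __setitem__(self, key, value):
        i, j = key
        self.data[self._offset(i, j)] = value


m = SymmetricPackedMatrix(4)
m[3, 1] = 0.25
print(m[1, 3])      # 0.25, read back through the shared slot
print(m.data.size)  # 10 stored values instead of 16
```

Routing all reads and writes through __getitem__/__setitem__ keeps the index folding in one place, which is the kind of code reuse the comment above refers to.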

I've also removed the test runs with NUMBA_DISABLE_JIT: 1, because this PR makes pedigree.py depend on both @guvectorize and @jitclass.

@jeromekelleher (Collaborator) left a comment

LGTM

@timothymillar (Collaborator, Author) commented

Thanks @jeromekelleher. I assume we're not worried about the failing Cubed and Zarr 3 test runs for now?

@jeromekelleher (Collaborator) commented

I didn't see those. @tomwhite, thoughts here?

@tomwhite (Collaborator) commented

They are not related to this PR, so OK to merge this if it's ready. I'll be looking at the Zarr 3 changes today.

@jeromekelleher (Collaborator) commented

Happy to merge when you are, @timothymillar.

@timothymillar added the auto-merge label on Jan 13, 2025
@mergify (bot) merged commit 5b96476 into sgkit-dev:main on Jan 13, 2025 (10 of 12 checks passed)

Linked issue: Chunked pedigree kinship MemoryError: Allocation failed (probably too large) with ped.compute()