Modernize NumPy Random Functions and Fix Mypy Errors for #756 - Shadow PR #771

jonbrenas · 2025-05-12T13:34:45Z

Addresses #756.

Shadows #760.

Thanks @mohamed-laarej. I think there are some troubles with int casting, though.

mohamed-laarej · 2025-05-12T13:57:35Z

Hi @jonbrenas, thanks for creating this PR and incorporating my recent random generation updates!

Regarding your comment about "troubles with int casting," I believe the test failures are due to rng.integers(...) returning NumPy integer types (np.int64), which aren't always accepted where a Python int is expected. Explicitly casting these with int() (e.g., int(rng.integers(...))) should resolve these TypeErrors.
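
A minimal sketch of the cast described above, assuming a seeded generator named rng; the seed and the bounds are illustrative and not taken from the test suite:

```python
import numpy as np

# Seeded generator; the seed value here is arbitrary.
rng = np.random.default_rng(seed=42)

# Generator.integers returns a NumPy scalar (np.int64 by default), not a Python int.
raw = rng.integers(1, 10)

# Explicit cast for call sites or type checks that insist on a built-in int.
n = int(raw)
```

The cast is cheap and leaves the drawn value unchanged; it only swaps the NumPy scalar type for the built-in one.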

Happy to help apply these int() casting fixes to this branch if that would be useful!

jonbrenas · 2025-05-12T15:25:09Z

Thanks @mohamed-laarej. It would be great if you could address these errors but the idea, at least for now, is that you would make the change in your own PR and I would shadow them here (you can ping me when you think your changes are ready and I'll do the shadowing then). It is a very cumbersome process that should be made simpler once you are added to the team.

mohamed-laarej · 2025-05-12T22:42:15Z

Hi @jonbrenas,

Thanks for the clarification on the workflow and for your feedback on the int casting issues! I also wanted to thank @ahernank for the recent team invitation; I appreciate it.

Understood that for these current changes, the plan is for me to address the int() casting issues (where rng.integers results need to be Python ints to resolve the TypeErrors) in my original PR (#760). I will proceed with those fixes there.

I'll be sure to ping you on PR #760 once it's updated. Please let me know if you'd still prefer to shadow the changes, or if I should proceed more directly now that I have team access.

Thanks!

jonbrenas · 2025-05-13T10:55:01Z

Thanks @mohamed-laarej. As I said in the email I just sent you, @ahernank added you to this repo so you should be able to push to this PR from now on if you so choose.

I think it would be a good idea, for consistency's sake more than correctness of code, if all our random calls used the new numpy generator methods (instead of only integers). For instance, in test_g123.py, the test test_g123_gwss_with_default_sites uses random.choice(...), which has been replaced with rng.choice(...) in other tests (e.g., test_g123_calibration). Could you do a scan and update all the random functions in this PR?

mohamed-laarej · 2025-05-13T13:36:08Z

Hi @jonbrenas,

Thanks for the update and for confirming I can push directly to this PR branch! That will definitely make things easier.

I understand the request for consistency with the random calls. I will do a thorough scan of the tests and update all remaining instances of Python's random module functions (like the random.choice in test_g123.py you pointed out).

I'll also ensure the int() casting issues for rng.integers are fully addressed.

I'll push the changes to this branch once I've completed the updates and tested them locally. I'll let you know when it's ready for another look.

- Replaced all Python random.choice() with rng.choice() for consistency
- Replaced random.sample() with rng.choice(..., replace=False)
- Added .tolist() to convert NumPy arrays to Python lists where needed
- Added str() casting for np.str_ values to ensure Python string compatibility
- Fixed 'low >= high' errors in rng.integers() calls by ensuring high > low
- Specifically fixed tests/anoph/test_frq.py by changing rng.integers(1, len(cohorts_areas)) to rng.integers(1, len(cohorts_areas)+1) to avoid invalid ranges
- Applied int() casting to NumPy integer types where Python int was expected
- Fixed site_mask selection to ensure only valid masks are used for each test context

Addresses feedback from PR #760 and resolves test failures.
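
The replacements listed in that commit message can be sketched roughly as follows. This is an illustration under assumptions, not code from the PR: the contigs list and the value of n are made up, and only the NumPy calls themselves are real.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

contigs = ["2L", "2R", "3L", "3R", "X"]  # illustrative values only

# random.choice(seq) -> rng.choice(seq); the result is np.str_, so cast to str.
contig = str(rng.choice(contigs))

# random.sample(seq, k) -> rng.choice(seq, size=k, replace=False);
# .tolist() converts the array into a list of Python str.
subset = rng.choice(contigs, size=2, replace=False).tolist()

# rng.integers excludes `high`, so integers(1, n) raises ValueError when n == 1.
# Using n + 1 as the bound keeps the range valid and makes n itself reachable.
n = 1
k = int(rng.integers(1, n + 1))
```

With n == 1, the original form rng.integers(1, n) would fail with a 'low >= high' error, which matches the single-row case mentioned for tests/anoph/test_frq.py.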

mohamed-laarej · 2025-05-16T16:44:44Z

Hi @jonbrenas,

While fixing the test failures, I ran into a remaining issue in notebooks/karyotype.ipynb:
CI fails with a ValueError: Not enough SNPs when trying to fetch 50,000 SNPs. I suspect this is due to the CI environment using a smaller dataset than local setups.

I'm considering two options to address this:

- Add a try-except block to handle the error gracefully
- Lower the n_snps value to fit available data

Would you prefer one of these approaches, or suggest a better alternative?

jonbrenas · 2025-05-17T08:40:05Z

Hi @mohamed-laarej, I think lowering n_snps is the way to go. For the notebooks, the actual data is used so the fact that it stopped working is a little strange.

review-notebook-app · 2025-05-17T16:29:37Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

codecov · 2025-05-20T17:07:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.11%. Comparing base (0197957) to head (3475ee6).
Report is 50 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #771      +/-   ##
==========================================
- Coverage   96.13%   96.11%   -0.02%     
==========================================
  Files          47       47              
  Lines        4683     4687       +4     
==========================================
+ Hits         4502     4505       +3     
- Misses        181      182       +1

jonbrenas · 2025-05-30T20:57:03Z

Thanks @mohamed-laarej. The issue with the notebooks was an errant ','. Feel free to change the status of this PR if you think it is ready to be reviewed.

mohamed-laarej · 2025-05-30T21:06:40Z

Hi @jonbrenas,
Thanks a lot for spotting and fixing that! I saw that removing the comma solved the issue, but I’m not fully sure why that change made the difference. Was the issue caused by the trailing comma creating a tuple, so loc_samples became a one-element tuple instead of an array?
Could you please explain what the comma was doing and how it caused the error? I’d like to understand this better for future reference.

jonbrenas · 2025-06-24T09:11:19Z

Thanks @mohamed-laarej. Sorry, it has been a while, I hadn't seen your message. If I remember correctly, the extra ',' created a tuple that, when passed to non_zero, always returned an empty array, and thus no sample was ever found.
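
The general mechanism can be shown in a few lines. This is only a sketch: the notebook's actual code is not quoted in this thread, so the array values are invented and loc_samples is used purely as an illustrative name.

```python
import numpy as np

mask = np.array([False, True, True])

# A trailing comma turns the right-hand side into a one-element tuple.
loc_samples_with_comma = mask,
loc_samples = mask

print(type(loc_samples_with_comma).__name__, len(loc_samples_with_comma))  # tuple 1
print(type(loc_samples).__name__, loc_samples.shape)  # ndarray (3,)

# NumPy treats the tuple as a nested sequence, which adds a leading axis.
print(np.asarray(loc_samples_with_comma).shape)  # (1, 3)
```

Any downstream call that expects a one-dimensional array then sees a different shape or type, which is the kind of silent mismatch described above.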

jonbrenas · 2025-06-24T09:13:28Z

tests/anoph/test_snp_frq.py

-    sample_sets = random.choice(all_sample_sets)
-    site_mask = random.choice(api.site_mask_ids + (None,))
+    sample_sets = rng.choice(all_sample_sets)
+    if isinstance(fixture, Af1Simulator):


Can you remind me why it needs to be more complex than what it used to be?

mohamed-laarej · 2025-06-25

Thanks @jonbrenas,
I believe that part was added due to a mypy issue or possibly a test failure related to site_mask values when using ["gamb_colu_arab", "gamb_colu", "arab"] across all fixture types. I don't recall the exact error message, but I think mypy (or the tests) was unhappy when some simulators were paired with incompatible or undefined site masks. That’s why I added the isinstance check to restrict values to valid options per simulator.

Happy to revisit this and simplify it if needed.

jonbrenas · 2025-06-26

Thanks @mohamed-laarej. At some point, we might have different site_masks, so it would be better if we didn't have that hard-coded.
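
One possible way to avoid the hard-coded list, sketched under the assumption that each fixture's API object exposes a site_mask_ids tuple (as the removed line api.site_mask_ids + (None,) suggests). The FakeApi class and the pick_site_mask helper are hypothetical stand-ins, not code from this PR:

```python
from typing import Optional, Sequence

import numpy as np


class FakeApi:
    """Hypothetical stand-in for a fixture-specific API object."""

    site_mask_ids = ("gamb_colu_arab", "gamb_colu", "arab")


def pick_site_mask(api, rng: np.random.Generator) -> Optional[str]:
    # Build the options from the API itself, so each simulator only offers its own masks.
    options: Sequence[Optional[str]] = tuple(api.site_mask_ids) + (None,)
    # Draw an index rather than calling rng.choice on a sequence mixing str and None.
    index = int(rng.integers(0, len(options)))
    return options[index]


rng = np.random.default_rng(seed=0)
site_mask = pick_site_mask(FakeApi(), rng)
```

Indexing into the tuple returns a plain Python str or None rather than a NumPy scalar, which may also sidestep the typing complaints mentioned earlier, though that has not been checked against the actual test suite.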

mohamed-laarej · 2025-08-04T17:13:25Z

Hi @jonbrenas,
Apologies for the confusion on the PR. It seems I accidentally pushed some commits related to PR #755 to this branch (GH756-mohamed-laarej-shadow) instead of the correct one (GH738-mohamed-laarej-fix-plot-haplotype-network-color).

I have now force-pushed to remove those commits from this PR. You should see that the commits ee3f144 and 260a0d3 are no longer present here.

My apologies again for the mix-up. I'm now proceeding with pushing the correct changes to the right branch for PR #755.

Thanks for your understanding!

leehart · 2025-08-08T08:43:20Z

Looks like this PR has unresolved conflicts and test failures.

jonbrenas · 2025-08-08T09:04:08Z

It looks like it needs more clean-up after the erroneous push. It doesn't look too bad. @mohamed-laarej, do you think you can do it?

mohamed-laarej · 2025-08-09T09:54:58Z

Yes @jonbrenas, I can take care of the clean-up.

mohamed-laarej and others added 7 commits April 15, 2025 03:29

Modernize NumPy random functions, fix mypy errors for issue#756

a1cdee3

Merge branch 'malariagen:master' into fix-numpy-random-tests-756-clean

118d25e

Merge branch 'master' into fix-numpy-random-tests-756-clean

8e9d1a5

Some slight updates

717136d

Updates tests to consistently use the seeded NumPy random number generator (rng) instead of legacy np.random or Python's random module and unpins the NumPy version in pyproject.toml

e7ef120

Merge branch 'fix-numpy-random-tests-756-clean' of github.com:mohamed-laarej/malariagen-data-python into GH756-mohamed-laarej-shadow

fa9fe3a

Merge branch 'master' into GH756-mohamed-laarej-shadow

4d41538

mohamed-laarej added 2 commits May 16, 2025 16:08

Fix test_frq.py to handle single-row dataframes in CI environment

504a73d

Lowering n_snps from 50_000 to 10_000 in notebooks/karyotype.ipynb to fix CI notebook execution

78e8a31

Merge branch 'master' into GH756-mohamed-laarej-shadow

3475ee6

jonbrenas added 5 commits May 30, 2025 15:09

Merge branch 'master' into GH756-mohamed-laarej-shadow

188e0a6

Solving linting issue

194f2c1

Update sample_metadata.py

6a23218

Trying to solve the notebook issue

Update sample_metadata.py

a84edcf

Solving the linting issue

Update karyotype.ipynb

0ddf4ca

Rolling back changes

mohamed-laarej marked this pull request as ready for review June 1, 2025 13:03

jonbrenas commented Jun 24, 2025

mohamed-laarej added 2 commits August 4, 2025 16:45

refactor: use private column name '_partition' in plot_haplotype_network

ee3f144

Merge branch 'GH756-mohamed-laarej-shadow' of https://github.com/malariagen/malariagen-data-python into GH756-mohamed-laarej-shadow

260a0d3

mohamed-laarej force-pushed the GH756-mohamed-laarej-shadow branch from 260a0d3 to 3feb26e Compare August 4, 2025 16:41

Resolve merge conflicts

0464808
