Skip to content

Conversation

jonbrenas
Copy link
Collaborator

Addresses #756.

Shadows #760.

Thanks @mohamed-laarej. I think there are some troubles with int casting, though.

@mohamed-laarej
Copy link
Collaborator

Hi @jonbrenas, thanks for creating this PR and incorporating my recent random generation updates!

Regarding your comment about "troubles with int casting," I believe the test failures are due to rng.integers(...) returning NumPy types np.int64 which aren't always compatible where Python int is expected. Explicitly casting these to int() (e.g., int(rng.integers(...))) should resolve these TypeErrors.

Happy to help apply these int() casting fixes to this branch if that would be useful!

@jonbrenas
Copy link
Collaborator Author

Thanks @mohamed-laarej. It would be great if you could address these errors but the idea, at least for now, is that you would make the change in your own PR and I would shadow them here (you can ping me when you think your changes are ready and I'll do the shadowing then). It is a very cumbersome process that should be made simpler once you are added to the team.

@mohamed-laarej
Copy link
Collaborator

Hi @jonbrenas ,

Thanks for the clarification on the workflow and for your feedback on the int casting issues! I also wanted to share my thanks to @ahernank for the recent team invitation I appreciate it.

Understood that for these current changes, the plan is for me to address the int() casting issues (where rng.integers results need to be Python ints to resolve the TypeErrors) in my original PR (#760). I will proceed with those fixes there.

I'll be sure to ping you on PR #760 once it's updated. Please let me know if you'd still prefer to shadow the changes, or if I should proceed more directly now that I have team access.

Thanks!

@jonbrenas
Copy link
Collaborator Author

Thanks @mohamed-laarej. As I said in the email I just sent you, @ahernank added you to this repo so you should be able to push to this PR from now on if you so choose.

I think it would be a good idea, for consistency's sake more than correctness of code, if all our random calls used the new numpy generator methods (instead of only integers). For instance, in test_g123.py, the test test_g123_gwss_with_default_sites uses random.choice(...) which has been substituted with rng.choice(...) in other tests (e.g., test_g123_calibration). Could you do a scan an update all the random functions in this PR?

@mohamed-laarej
Copy link
Collaborator

Hi @jonbrenas,

Thanks for the update and for confirming I can push directly to this PR branch! That will definitely make things easier.

I understand the request for consistency with the random calls. I will do a thorough scan of the tests and update all remaining instances of Python's random module functions (like the random.choice in test_g123.py you pointed out).

I'll also ensure the int() casting issues for rng.integers are fully addressed.

I'll push the changes to this branch once I've completed the updates and tested them locally. I'll let you know when it's ready for another look.

- Replaced all Python random.choice() with rng.choice() for consistency
- Replaced random.sample() with rng.choice(..., replace=False)
- Added .tolist() to convert NumPy arrays to Python lists where needed
- Added str() casting for np.str_ values to ensure Python string compatibility
- Fixed 'low >= high' errors in rng.integers() calls by ensuring high > low
- Specifically fixed tests/anoph/test_frq.py by changing rng.integers(1, len(cohorts_areas)) to rng.integers(1, len(cohorts_areas)+1) to avoid invalid ranges
- Applied int() casting to NumPy integer types where Python int was expected
- Fixed site_mask selection to ensure only valid masks are used for each test context

Addresses feedback from PR #760 and resolves test failures.
@mohamed-laarej
Copy link
Collaborator

mohamed-laarej commented May 16, 2025

Hi @jonbrenas,

While fixing the test failures, I ran into a remaining issue in notebooks/karyotype.ipynb:
CI fails with a ValueError: Not enough SNPs when trying to fetch 50,000 SNPs. I suspect this is due to the CI environment using a smaller dataset than local setups.

I'm considering two options to address this:

Add a try-except block to handle the error gracefully

Lower the n_snps value to fit available data

Would you prefer one of these approaches, or suggest a better alternative?

@jonbrenas
Copy link
Collaborator Author

Hi @mohamed-laarej, I think lowering n_snps is the way to go. For the notebooks, the actual data is used so the fact that it stopped working is a little strange.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link

codecov bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.11%. Comparing base (0197957) to head (3475ee6).
Report is 50 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #771      +/-   ##
==========================================
- Coverage   96.13%   96.11%   -0.02%     
==========================================
  Files          47       47              
  Lines        4683     4687       +4     
==========================================
+ Hits         4502     4505       +3     
- Misses        181      182       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jonbrenas
Copy link
Collaborator Author

Thanks @mohamed-laarej. The issue with the notebooks was an errant ','. Feel free to change the status of this PR if you think it is ready to be reviewed.

@mohamed-laarej
Copy link
Collaborator

Hi @jonbrenas,
Thanks a lot for spotting and fixing that! I saw that removing the comma solved the issue, but I’m not fully sure why that change made the difference. was the issue caused by the trailing comma creating a tuple, so loc_samples became a one-element tuple instead of an array?
Could you please explain what the comma was doing and how it caused the error? I’d like to understand this better for future reference.

@mohamed-laarej mohamed-laarej marked this pull request as ready for review June 1, 2025 13:03
@jonbrenas
Copy link
Collaborator Author

Thanks @mohamed-laarej. Sorry, it has been a while, I hadn't seen your message. If I remember correctly, the extra , created a tuple that, when passed to non_zero always returned an empty array and thus no sample was ever found.

sample_sets = random.choice(all_sample_sets)
site_mask = random.choice(api.site_mask_ids + (None,))
sample_sets = rng.choice(all_sample_sets)
if isinstance(fixture, Af1Simulator):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me why it needs to be more complex than what it used to be?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jonbrenas ,
I believe that part was added due to a mypy issue or possibly a test failure related to site_mask values when using ["gamb_colu_arab", "gamb_colu", "arab"] across all fixture types. I don't recall the exact error message, but I think mypy (or the tests) was unhappy when some simulators were paired with incompatible or undefined site masks. That’s why I added the isinstance check to restrict values to valid options per simulator.

Happy to revisit this and simplify it if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mohamed-laarej. At some point, we might have different site_masks so it would be better if didn't have that hard-coded.

@mohamed-laarej mohamed-laarej force-pushed the GH756-mohamed-laarej-shadow branch from 260a0d3 to 3feb26e Compare August 4, 2025 16:41
@mohamed-laarej
Copy link
Collaborator

Hi @jonbrenas ,
Apologies for the confusion on the PR. It seems I accidentally pushed some commits related to PR #755 to this branch (GH756-mohamed-laarej-shadow) instead of the correct one (GH738-mohamed-laarej-fix-plot-haplotype-network-color).

I have now force-pushed to remove those commits from this PR. You should see that the commits ee3f144 and 260a0d3 are no longer present here.

My apologies again for the mix-up. I'm now proceeding with pushing the correct changes to the right branch for PR #755.

Thanks for your understanding!

@leehart
Copy link
Collaborator

leehart commented Aug 8, 2025

Looks like this PR has unresolved conflicts and test failures.

@jonbrenas
Copy link
Collaborator Author

It looks like it needs more clean-up after the erroneous push. It doesn't look too bad. @mohamed-laarej, do you think you can do it?

@mohamed-laarej
Copy link
Collaborator

Yes @jonbrenas , I can take care of the clean-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants