Note difference in rounding behavior between Sklearn and Onnx models by matren395 · Pull Request #767 · broadinstitute/gnomad_methods

matren395 · 2025-02-21T16:00:37Z

Note that gnomAD's assign_population_pcs() DOES round the prob_gen_anc outputs when using an sklearn model but does NOT round them when using an ONNX model.

For example, with a cutoff of 0.75 for prob_nfe, a sample with an unrounded probability of 0.7499999999 could be assigned 'Remaining' when using an ONNX model but 'Non-Finnish European' when using an sklearn model, because the sklearn path rounds that value to 0.75.
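A minimal Python sketch of the cutoff effect described above. It does not reproduce assign_population_pcs(); the helper `assign_label`, its `round_decimals` parameter, the 2-decimal precision, and the `>=` comparison are all assumptions chosen for illustration.

```python
from typing import Optional


def assign_label(
    prob: float,
    label: str,
    cutoff: float = 0.75,
    missing_label: str = "Remaining",
    round_decimals: Optional[int] = None,
) -> str:
    """Return `label` if `prob` meets `cutoff`, otherwise `missing_label`.

    `round_decimals` is hypothetical: an int mimics a code path that rounds
    probabilities before the cutoff comparison; None mimics one that does not.
    """
    if round_decimals is not None:
        prob = round(prob, round_decimals)
    # Assumed rule: samples at or above the cutoff keep the label.
    return label if prob >= cutoff else missing_label


prob_nfe = 0.7499999999
print(assign_label(prob_nfe, "Non-Finnish European"))                    # Remaining
print(assign_label(prob_nfe, "Non-Finnish European", round_decimals=2))  # Non-Finnish European
```

The same underlying probability lands on opposite sides of the cutoff depending only on whether it is rounded first, which is why the difference only shows up for samples near a cutoff.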

mike-w-wilson

Thanks for putting this in! I suggest adding a little more detail to the message, which I think clarifies where to expect the difference. Please rework it if you don't like it, but if you do, LGTM.

mike-w-wilson · 2025-02-21T16:12:25Z

gnomad/sample_qc/ancestry.py

        raise TypeError("The supplied model is not an sklearn model!")

+    logger.warning(
+        "sklearn models have different rounding behavior than ONNX models. This may lead to subtly different results around cutoffs."


Suggested change

-        "sklearn models have different rounding behavior than ONNX models. This may lead to subtly different results around cutoffs."
+        "sklearn models have different rounding behavior than ONNX models. This may "
+        "lead to subtly different assignment results for samples around probability "
+        "cutoffs."

Ah, thanks! I originally had something wordier but was worried it was too much.

GitHub's web interface is giving me a hard time committing your suggestion as-is, so I'm applying it in a separate commit and will merge after that.

matren395 · 2025-02-21T22:28:26Z

1) Split this into two logger.warning() statements: one noting that pickled sklearn models are generally considered worse practice now, and 2) one noting that switching between the two model formats can lead to the rounding differences described above.

Article Julia G. sent me about this: https://medium.com/featurepreneur/pickle-is-sour-lets-use-onnx-90c0805338ac
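A rough sketch of the two separate warnings proposed above, using only the standard `logging` module. The helper `warn_sklearn_model` and the logger name are hypothetical, and the first message is placeholder wording, not gnomAD's; the second reuses the message text from this PR.

```python
import logging

logger = logging.getLogger("ancestry_sketch")  # hypothetical logger name


def warn_sklearn_model() -> None:
    """Emit the two proposed warnings as separate calls (illustrative only)."""
    # Warning 1: pickle/sklearn vs ONNX. Placeholder wording, not gnomAD's.
    logger.warning(
        "Loading an sklearn model from a pickle file is generally discouraged; "
        "consider converting the model to ONNX."
    )
    # Warning 2: the rounding note, using the message text agreed on in this PR.
    logger.warning(
        "sklearn models have different rounding behavior than ONNX models. This may "
        "lead to subtly different assignment results for samples around probability "
        "cutoffs."
    )


logging.basicConfig(level=logging.WARNING)
warn_sklearn_model()
```

Keeping the messages in separate calls means each shows up as its own log record, so the general pickle advice and the rounding caveat can be read independently.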

matren395 · 2025-07-15T14:37:20Z

Sorry, forgot about this for a bit! I rebased it; the commits are a bit messy, but it's up to date!


matren395 added the Sample QC label Feb 21, 2025

matren395 requested a review from mike-w-wilson February 21, 2025 16:00

matren395 self-assigned this Feb 21, 2025

mike-w-wilson approved these changes Feb 21, 2025


matren395 requested a review from a team as a code owner July 15, 2025 14:25

matren395 and others added 6 commits September 17, 2025 11:26

ancestry note

cf979aa

sklearn!

Update ancestry.py

953a290

more descriptive

add other logger

1ab78fc

Update to genetic ancestry

44c15ba

ancestry note

19814b7

sklearn! # Conflicts: # gnomad/sample_qc/ancestry.py

add other logger

f758340

# Conflicts: # gnomad/sample_qc/ancestry.py

mike-w-wilson force-pushed the dm/sklearn_warning branch from 7b792b8 to f758340 September 17, 2025 15:30

mike-w-wilson merged commit 897d750 into main Sep 17, 2025
6 checks passed

mike-w-wilson deleted the dm/sklearn_warning branch September 17, 2025 15:34

