Note difference in rounding behavior between Sklearn and Onnx models #767
mike-w-wilson merged 6 commits into main
mike-w-wilson left a comment:
Thanks for putting this in! I suggest a little more detail in the message, which I think clarifies where to expect the difference. Please rework it if you don't like it, but if you do, LGTM.
gnomad/sample_qc/ancestry.py
    raise TypeError("The supplied model is not an sklearn model!")

    logger.warning(
        "sklearn models have different rounding behavior than ONNX models. This may lead to subtly different results around cutoffs."
-        "sklearn models have different rounding behavior than ONNX models. This may lead to subtly different results around cutoffs."
+        "sklearn models have different rounding behavior than ONNX models. This may "
+        "lead to subtly different assignment results for samples around probability "
+        "cutoffs."
ah, thanks! I originally had something wordier, but was worried it was too much.
GitHub's browser is giving me a hard time committing your suggestion as-is, so I had to do it in a separate commit; then I'll merge it.
Split this into two logger.warning() statements: 1 - that .pickle/sklearn is generally just kinda worse practice now, and 2 - that switching between them can lead to those rounding errors. Article Julia G. sent me about this: https://medium.com/featurepreneur/pickle-is-sour-lets-use-onnx-90c0805338ac
Sorry, forgot about this for a bit! Rebased it and the commits are a bit messy, but up to date!
sklearn!
more descriptive
sklearn! # Conflicts: # gnomad/sample_qc/ancestry.py
Force-pushed from 7b792b8 to f758340.
Note that gnomAD's assign_population_pcs() code DOES round outputs for prob_gen_anc when using an sklearn model but does NOT when using an onnx model.
For example, given a cutoff of 0.75 for prob_nfe, a sample with a real probability of 0.7499999999 could be assigned 'Remaining' when using an onnx model but 'Non-Finnish European' when using an sklearn model.
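
The example above can be sketched in a few lines of plain Python. This is only an illustration, not gnomAD's actual code: the helper `assign_label`, the `>=` comparison against the cutoff, and rounding to 5 decimal places are all assumptions made for the sketch.

```python
def assign_label(prob_nfe, cutoff=0.75, round_digits=None):
    """Hypothetical helper: label one sample from its NFE probability.

    round_digits=None mimics the unrounded (onnx-style) path;
    an integer mimics the rounded (sklearn-style) path.
    The number of digits and the >= comparison are assumptions.
    """
    if round_digits is not None:
        prob_nfe = round(prob_nfe, round_digits)
    return "Non-Finnish European" if prob_nfe >= cutoff else "Remaining"


raw_prob = 0.7499999999

# Rounded path: 0.7499999999 rounds up to 0.75 and meets the cutoff.
rounded_label = assign_label(raw_prob, round_digits=5)

# Unrounded path: 0.7499999999 stays just below the cutoff.
unrounded_label = assign_label(raw_prob)

print(rounded_label)    # Non-Finnish European
print(unrounded_label)  # Remaining
```

Only samples whose probability sits within rounding distance of the cutoff can flip between labels this way; samples clearly above or below it are assigned the same label on either path.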