Variability in outputs huggingface vs local #92

@sbeausol

Hello-
Thanks for your efforts, it's an interesting paper and an exciting advancement! I ran some test data on Hugging Face and locally, both with the default parameters, and I'm seeing discrepancies in the outputs, particularly after the InstaNovo+ step. In the final output, the diffusion_predictions are often empty and aren't consistent with the transformer_predictions.
Hugging Face output:

| scan_number | precursor_mz | precursor_charge | retention_time | spectrum_id | experiment_name | transformer_prediction | transformer_log_probability | refined_prediction | refined_log_probability | refined_delta_mass_ppm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 468.871978759766 | 3 | 212.1862777 | test:0 | test | GHNSYTC[UNIMOD:4]EATHK | -0.4897 | GHNSYTC[UNIMOD:4]EATHK | -0.0007 | 335465.81 |
| 1 | 778.737365722656 | 3 | 212.454416 | test:1 | test | GGEEEEEEEEEEEEEEEEK | -252.0484 | GGEEEEEEEEEEEEEEEEK | -0.1081 | 331829.78 |

Here is the local results file before refinement:

| scan_number | precursor_mz | precursor_charge | experiment_name | spectrum_id | predictions | predictions_tokenised | log_probabilities | token_log_probabilities | delta_mass_ppm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 468.871978759766 | 3 | test | test:0 | GHNSYTC[UNIMOD:4]EATHK | G, H, N, S, Y, T, C[UNIMOD:4], E, A, T, H, K | -0.49145248532295227 | [-0.07410459220409393, -0.013641584664583206, -0.2776679992675781, -0.002582312561571598, -0.11276249587535858, -0.0019204046111553907, -3.8980677345534787e-05, -0.0028956886380910873, -6.09140915912576e-05, -0.00011085849109804258, -0.00025555206229910254, -0.00501991854980588] | 3.7517556583161897 |
| 1 | 778.737365722656 | 3 | test | test:1 | GGEEEEEEEEEEEEEEEEK | G, G, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, K | -252.05099487304688 | [-3.0641820430755615, -2.57352352142334, -1.11761474609375, -1.0334386825561523, -0.8424715995788574, -0.6960052847862244, -0.6549468636512756, -0.7385885119438171, -0.818410336971283, -0.8820850253105164, -0.8136520385742188, -0.6193901896476746, -0.833812415599823, -1.0504907369613647, -1.124313235282898, -1.5463999509811401, -0.7983300685882568, -0.8665404319763184, -0.2932451367378235] | 3149.0891502897002 |
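
As a sanity check on the delta_mass_ppm columns, here is a minimal sketch that recomputes the precursor mass error for the two peptides above from standard monoisotopic masses. The formula, the sign convention, and the choice to take the smallest absolute error over isotope offsets 0 and 1 are assumptions made for illustration; they are not confirmed to be how InstaNovo computes the column, and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
MODS = {"UNIMOD:4": 57.021464}  # carbamidomethyl, the only mod in the post
WATER, PROTON, ISOTOPE = 18.010565, 1.007276, 1.00335

# One residue letter, optionally followed by a bracketed modification.
TOKEN = re.compile(r"([A-Z])(?:\[([^\]]+)\])?")


def theoretical_mz(peptide, charge):
    """Theoretical precursor m/z of a peptide at the given charge."""
    mass = WATER
    for residue, mod in TOKEN.findall(peptide):
        mass += RESIDUE[residue] + (MODS[mod] if mod else 0.0)
    return (mass + charge * PROTON) / charge


def delta_mass_ppm(peptide, precursor_mz, charge, isotopes=(0, 1)):
    """Smallest absolute ppm error over the assumed isotope offsets."""
    theo = theoretical_mz(peptide, charge)
    errors = [
        (theo - (precursor_mz - k * ISOTOPE / charge)) / precursor_mz * 1e6
        for k in isotopes
    ]
    return min(abs(e) for e in errors)


# Row 0 and row 1 of the tables; row 1 is GG + 16 E + K (19 tokens).
ppm_0 = delta_mass_ppm("GHNSYTC[UNIMOD:4]EATHK", 468.871978759766, 3)
ppm_1 = delta_mass_ppm("GG" + "E" * 16 + "K", 778.737365722656, 3)
print(round(ppm_0, 2), round(ppm_1, 2))
```

Under these assumptions both values should land within a fraction of a ppm of the local delta_mass_ppm column (3.75 and 3149.09). Neither is anywhere near the refined_delta_mass_ppm values in the Hugging Face output (335465.81 and 331829.78), even though the refined sequences there are identical to the transformer ones.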

Here is the final results table:

| scan_number | precursor_mz | precursor_charge | experiment_name | spectrum_id | delta_mass_ppm | diffusion_predictions_tokenised | diffusion_predictions | diffusion_log_probabilities | transformer_predictions | transformer_predictions_tokenised | transformer_log_probabilities | transformer_token_log_probabilities |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 468.871978759766 | 3 | test | test:0 | 3.7517556583161897 | [] | | -0.003761152969673276 | GHNSYTC[UNIMOD:4]EATHK | G, H, N, S, Y, T, C[UNIMOD:4], E, A, T, H, K | -0.49145248532295227 | [-0.07410459220409393, -0.013641584664583206, -0.2776679992675781, -0.002582312561571598, -0.11276249587535858, -0.0019204046111553907, -3.8980677345534787e-05, -0.0028956886380910873, -6.09140915912576e-05, -0.00011085849109804258, -0.00025555206229910254, -0.00501991854980588] |
| 1 | 778.737365722656 | 3 | test | test:1 | 3149.0891502897002 | ['F', 'S'] | FS | -0.03220748156309128 | GGEEEEEEEEEEEEEEEEK | G, G, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, E, K | -252.05099487304688 | [-3.0641820430755615, -2.57352352142334, -1.11761474609375, -1.0334386825561523, -0.8424715995788574, -0.6960052847862244, -0.6549468636512756, -0.7385885119438171, -0.818410336971283, -0.8820850253105164, -0.8136520385742188, -0.6193901896476746, -0.833812415599823, -1.0504907369613647, -1.124313235282898, -1.5463999509811401, -0.7983300685882568, -0.8665404319763184, -0.2932451367378235] |
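
To put a number on "often empty", a small sketch like the one below could tally the rows where the diffusion prediction is blank or disagrees with the transformer prediction. It assumes the final results are available as comma-separated text with the column names shown above; the inline sample just reuses the two rows from this post, reduced to the relevant columns, and the `summarise` helper is hypothetical.

```python
import csv
import io

# Two rows copied from the final table, reduced to the relevant columns.
# A real run would open the results file written by the local pipeline.
SAMPLE = """spectrum_id,diffusion_predictions,diffusion_log_probabilities,transformer_predictions,transformer_log_probabilities
test:0,,-0.003761152969673276,GHNSYTC[UNIMOD:4]EATHK,-0.49145248532295227
test:1,FS,-0.03220748156309128,GGEEEEEEEEEEEEEEEEK,-252.05099487304688
"""


def summarise(rows):
    """Group spectrum_ids by how the refined prediction compares."""
    report = {"empty": [], "mismatch": [], "match": []}
    for row in rows:
        refined = row["diffusion_predictions"].strip()
        initial = row["transformer_predictions"].strip()
        if not refined:
            key = "empty"        # refinement produced no sequence
        elif refined != initial:
            key = "mismatch"     # refinement changed the sequence
        else:
            key = "match"        # refinement kept the sequence
        report[key].append(row["spectrum_id"])
    return report


report = summarise(csv.DictReader(io.StringIO(SAMPLE)))
print(report)
```

For the two rows shown, test:0 falls under "empty" and test:1 under "mismatch", whereas the Hugging Face output for the same spectra would count as two matches.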
