
Commit 1e53b84

Merge branch 'main' into parse-im-idxml
2 parents f40e952 + 0ba376d commit 1e53b84

28 files changed, +867 -59 lines changed

.github/workflows/publish.yml

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ jobs:
 - name: Set up Python
 uses: actions/setup-python@v5
 with:
-python-version: "3.8"
+python-version: "3.9"

 - name: Install dependencies
 run: |
@@ -30,7 +30,7 @@ jobs:
 run: python -m build --sdist --wheel .

 - name: Install wheel
-run: pip install dist/psm_utils-*.whl
+run: pip install pyopenms dist/psm_utils-*.whl

 - name: Test wheel
 run: |

.github/workflows/test.yml

Lines changed: 2 additions & 2 deletions
@@ -27,7 +27,7 @@ jobs:
 run: ruff check --output-format=github .

 - name: Install package and its dependencies
-run: pip install --editable .[dev]
+run: pip install --editable .[dev,idxml]

 - name: Test with pytest and codecov
 run: |
@@ -58,7 +58,7 @@ jobs:
 - name: Install package and its dependencies
 run: |
 python -m pip install --upgrade pip
-pip install .[dev]
+pip install .[dev,idxml]

 - name: Test imports
 run: python -c "import psm_utils"

CHANGELOG.md

Lines changed: 21 additions & 2 deletions
@@ -5,11 +5,31 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [1.1.1] - 2024-09-06
+## [1.2.0] - 2024-11-19
+
+### Added
+
+- `io.alphadia`: Read support for AlphaDIA `precursors.tsv` (#103 by @rodvrees)
+- `io.fragpipe`: Read support for FragPipe `psm.tsv` (#103 by @rodvrees)
+- `io.diann`: Read support for DIA-NN TSV (#103 by @rodvrees)
+
+### Changed
+
+- 💥 `psm_list`: When returning a PSM property across the full PSMList (e.g. `psm_list["peptidoform"]`), `np.fromiter` is now used instead of `np.array`. This fixes an issue where, if all peptidoforms have the same length, a 3D array of parsed sequences (amino acids and modifications) was returned instead of an array of `Peptidoform` objects. However, this does mean that all resulting arrays will have the `object` dtype instead of the previously coerced dtypes. This might lead to issues downstream. (#102)
+- `io.idxml`: Make pyOpenMS an optional dependency, working around https://github.com/OpenMS/OpenMS/issues/7600 for now. For `idxml` support, install psm_utils with the `idxml` extra dependencies. (#107 by @paretje)

 ### Fixed

+- 🐛 `io.pepxml`: Fix modification location and mass parsing. Position had an off-by-one error and the reported mass was the sum of the residue and modification instead of the modification alone. (fixes #100, #104)
+
+## [1.1.1] - 2024-10-01
+
+### Fixed
+
+- `io`: Fix Sage filename pattern for automatic file type inference
+- `io.flashlfq`: Fix writing PSMs without protein accession
 - `io.flashlfq`: Fix column names `Peptide Monoisotopic Mass` and `Protein Accession`.
+- `io.idxml`: Fix parsing if spectra file name not present [#92](https://github.com/compomics/psm_utils/issues/92)

 ## [1.1.0] - 2024-09-05

@@ -18,7 +38,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `Peptidoform`: Add `modified_sequence` property to return the modified sequence in ProForma format, but without charge state.
 - `io`: Add support for reading and writing FlashLFQ generic TSV files.

-
 ## [1.0.1] - 2024-08-28

 ### Fixed
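
The `psm_list` entry in the 1.2.0 changelog describes a difference between `np.array` and `np.fromiter` that a small example may make easier to see. The sketch below is not psm_utils code: `Residues` is a hypothetical stand-in for a sequence-like object such as `Peptidoform`, and the example assumes NumPy 1.23 or later, where `np.fromiter` accepts `dtype=object`.

```python
import numpy as np


class Residues:
    """Hypothetical sequence-like object (a stand-in, not a psm_utils class)."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __len__(self):
        return len(self.sequence)

    def __getitem__(self, index):
        return self.sequence[index]


# Equal lengths are what trigger the unwanted unpacking.
items = [Residues("ACDE"), Residues("FGHI")]

# np.array treats each object as a nested sequence and unpacks it
# into an extra dimension.
unpacked = np.array(items)

# np.fromiter with dtype=object stores the objects themselves,
# giving a 1D array with one element per object.
kept = np.fromiter(items, dtype=object, count=len(items))

print(unpacked.shape, unpacked.dtype)
print(kept.shape, kept.dtype)
```

Because the second array always has the `object` dtype, downstream code that relied on a coerced numeric dtype would probably need an explicit cast such as `.astype(float)`.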

README.rst

Lines changed: 10 additions & 7 deletions
@@ -86,26 +86,29 @@ Goals and non-goals
 Supported file formats
 **********************

-===================================================================================================================== ======================== =============== ===============
-File format psm_utils tag Read support Write support
-===================================================================================================================== ======================== =============== ===============
+===================================================================================================================== ======================== =============== =============== ==========
+File format psm_utils tag Read support Write support Comments
+===================================================================================================================== ======================== =============== =============== ==========
+`AlphaDIA precursors TSV <https://alphadia.readthedocs.io/en/latest/quickstart.html#output-files>`_ ``alphadia`` ✅ ❌
+`DIA-NN TSV <https://github.com/vdemichev/DiaNN#output>`_ ``diann`` ✅ ❌
 `FlashLFQ generic TSV <https://github.com/smith-chem-wisc/FlashLFQ/wiki/Identification-Input-Formats>`_ ``flashlfq`` ✅ ✅
+`FragPipe PSM TSV <https://fragpipe.nesvilab.org/docs/tutorial_fragpipe_outputs.html#psmtsv/>`_ ``fragpipe`` ✅ ❌
 `ionbot CSV <https://ionbot.cloud/>`_ ``ionbot`` ✅ ❌
-`OpenMS idXML <https://www.openms.de/>`_ ``idxml`` ✅ ✅
+`OpenMS idXML <https://www.openms.de/>`_ ``idxml`` ✅ ✅ Requires the optional ``openms`` dependency (``pip install psm-utils[idxml]``)
 `MaxQuant msms.txt <https://www.maxquant.org/>`_ ``msms`` ✅ ❌
 `MS Amanda CSV <https://ms.imp.ac.at/?goto=msamanda>`_ ``msamanda`` ✅ ❌
 `mzIdentML <https://psidev.info/mzidentml>`_ ``mzid`` ✅ ✅
 `Parquet <https://psm-utils.readthedocs.io/en/stable/api/psm_utils.io#module-psm_utils.io.parquet>`_ ``parquet`` ✅ ✅
 `Peptide Record <https://psm-utils.readthedocs.io/en/stable/api/psm_utils.io/#module-psm_utils.io.peptide_record>`_ ``peprec`` ✅ ✅
 `pepXML <http://tools.proteomecenter.org/wiki/index.php?title=Formats:pepXML>`_ ``pepxml`` ✅ ❌
 `Percolator tab <https://github.com/percolator/percolator/wiki/Interface>`_ ``percolator`` ✅ ✅
-Proteome Discoverer MSF ``proteome_discoverer`` ✅ ❌
+`Proteome Discoverer MSF <#>`_ ``proteome_discoverer`` ✅ ❌
 `Sage Parquet <https://github.com/lazear/sage/blob/v0.14.7/DOCS.md#interpreting-sage-output>`_ ``sage_parquet`` ✅ ❌
 `Sage TSV <https://github.com/lazear/sage/blob/v0.14.7/DOCS.md#interpreting-sage-output>`_ ``sage_tsv`` ✅ ❌
-ProteoScape Parquet ``proteoscape`` ✅ ❌
+`ProteoScape Parquet <#>`_ ``proteoscape`` ✅ ❌
 `TSV <https://psm-utils.readthedocs.io/en/stable/api/psm_utils.io/#module-psm_utils.io.tsv>`_ ``tsv`` ✅ ✅
 `X!Tandem XML <https://www.thegpm.org/tandem/>`_ ``xtandem`` ✅ ❌
-===================================================================================================================== ======================== =============== ===============
+===================================================================================================================== ======================== =============== =============== ==========

 Legend: ✅ Supported, ❌ Unsupported
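
The idXML row notes that support depends on an optional dependency installed through the `idxml` extra. One common way to guard such a dependency is sketched below. `require_optional` is a hypothetical helper written for illustration only, not the function psm_utils uses, and the standard-library module `json` stands in for the optional package so that the example runs anywhere.

```python
import importlib.util


def require_optional(module_name, extra):
    """Raise a helpful ImportError if an optional module is not installed.

    Hypothetical helper for illustration; not part of psm_utils.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install psm-utils[{extra}]"
        )


# An installed module passes silently ("json" is only a stand-in here).
require_optional("json", extra="idxml")

# A missing module produces an error that names the extra to install.
try:
    require_optional("surely_missing_module_for_sketch", extra="idxml")
except ImportError as error:
    message = str(error)
    print(message)
```

Checking at call time rather than at import time keeps `import psm_utils` working for users who never touch idXML files, which matches the `Test imports` step in the workflow.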
docs/source/api/psm_utils.io.rst

Lines changed: 26 additions & 2 deletions
@@ -7,14 +7,38 @@ psm_utils.io



-psm_utils.io.flashlfq
+psm_utils.io.alphapept
 ##################

+.. automodule:: psm_utils.io.alphapept
+:members:
+:inherited-members:
+
+
+psm_utils.io.diann
+##################
+
+.. automodule:: psm_utils.io.diann
+:members:
+:inherited-members:
+
+
+psm_utils.io.flashlfq
+#####################
+
 .. automodule:: psm_utils.io.flashlfq
 :members:
 :inherited-members:


+psm_utils.io.fragpipe
+##################
+
+.. automodule:: psm_utils.io.fragpipe
+:members:
+:inherited-members:
+
+
 psm_utils.io.idxml
 ##################

@@ -60,7 +84,7 @@ psm_utils.io.mzid


 psm_utils.io.parquet
-#################
+####################

 .. automodule:: psm_utils.io.parquet
 :members:
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<msms_pipeline_analysis date="2022-05-17T22:37:25"
+xmlns="http://regis-web.systemsbiology.net/pepXML"
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+xsi:schemaLocation="http://regis-web.systemsbiology.net/pepXML /tools/bin/TPP/tpp/schema/pepXML_v122.xsd"
+summary_xml="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Pipe_Mferi_M/G1_iprophet.pep.xml">
+<analysis_summary analysis="ptmprophet" time="2022-05-17T23:41:43">
+<ptmprophet_summary version="TPP v5.1.0 Syzygy, Build 202012091755-8315 (Linux-x86_64)"
+options="M:15.994915,n:42.010565 MZTOL=0.4 G1_iprophet.pep.xml G1_PTMiprophet.pep.xml">
+<inputfile name="G1_iprophet.pep.xml" />
+<inputfile name="20220511_M1_Mferi_0535_VC_i01_comet.pep.xml"
+directory="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data" />
+<inputfile name="20220511_M1_Mferi_0535_VC_i02_comet.pep.xml"
+directory="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data" />
+</ptmprophet_summary>
+</analysis_summary>
+<msms_run_summary
+base_name="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data/20220511_M1_Mferi_0535_VC_i01"
+msManufacturer="UNKNOWN" msModel="UNKNOWN" raw_data_type="raw" raw_data=".mzXML">
+<sample_enzyme name="trypsin">
+<specificity cut="KR" no_cut="P" sense="C" />
+</sample_enzyme>
+<search_summary
+base_name="/mnt/nfs/DataPool/Projects/Lumos/2022/22_2_Apr_Jun/20220510_VC/Data/20220511_M1_Mferi_0535_VC_i01"
+search_engine="Comet" search_engine_version="2019.01 rev. 5"
+precursor_mass_type="monoisotopic" fragment_mass_type="monoisotopic" search_id="1">
+<search_database
+local_path="/mnt/nfs/DataPool/FastaDataBases/Mferi_G5847_GCF_000327395.faa_REV.fasta"
+type="AA" />
+<enzymatic_search_constraint enzyme="Trypsin" max_num_internal_cleavages="3"
+min_number_termini="2" />
+<aminoacid_modification aminoacid="M" massdiff="15.994900" mass="147.035385"
+variable="Y" symbol="*" />
+<terminal_modification terminus="N" massdiff="42.010565" mass="43.018390" variable="Y"
+protein_terminus="Y" symbol="#" />
+<aminoacid_modification aminoacid="C" massdiff="57.021464" mass="160.030649"
+variable="N" />
+</search_summary>
+<spectrum_query spectrum="20220511_M1_Mferi_0535_VC_i01.05387.05387.2" start_scan="5387"
+end_scan="5387" precursor_neutral_mass="1006.560276" assumed_charge="2" index="143"
+retention_time_sec="1413.1" experiment_label="Mferi_M1">
+<search_result>
+<search_hit hit_rank="1" peptide="SNLFLMLK" peptide_prev_aa="M" peptide_next_aa="Q"
+protein="WP_008364460.1" num_tot_proteins="1" num_matched_ions="11"
+tot_num_ions="14" calc_neutral_pep_mass="1006.552139" massdiff="0.008137"
+num_tol_term="2" num_missed_cleavages="0" num_matched_peptides="55"
+protein_descr="ABC transporter permease [Mycoplasma feriruminatoris]">
+<modification_info modified_peptide="n[43]SNLFLMLK" mod_nterm_mass="43.01839"></modification_info>
+<search_score name="xcorr" value="1.214" />
+<search_score name="deltacn" value="1.000" />
+<search_score name="deltacnstar" value="0.000" />
+<search_score name="spscore" value="453.5" />
+<search_score name="sprank" value="1" />
+<search_score name="expect" value="1.17E+00" />
+<analysis_result analysis="peptideprophet">
+<peptideprophet_result probability="0.5131"
+all_ntt_prob="(0.0000,0.0000,0.5131)">
+<search_score_summary>
+<parameter name="fval" value="-0.3767" />
+<parameter name="ntt" value="2" />
+<parameter name="nmc" value="0" />
+<parameter name="massd" value="8.084" />
+<parameter name="isomassd" value="0" />
+</search_score_summary>
+</peptideprophet_result>
+</analysis_result>
+<analysis_result analysis="interprophet">
+<interprophet_result probability="0.00133535"
+all_ntt_prob="(0,0,0.00133535)">
+<search_score_summary>
+<parameter name="nss" value="0.1289" />
+<parameter name="nrs" value="-0.7221" />
+<parameter name="nse" value="-0.5741" />
+<parameter name="nsi" value="0" />
+<parameter name="nsm" value="0.9838" />
+<parameter name="nsp" value="20" />
+</search_score_summary>
+</interprophet_result>
+</analysis_result>
+<analysis_result analysis="ptmprophet">
+<ptmprophet_result prior="1" ptm="PTMProphet_n42.0106"
+ptm_peptide="n(1.000)SNLFLMLK">
+<mod_terminal_probability terminus="n" probability="1.000" />
+</ptmprophet_result>
+</analysis_result>
+</search_hit>
+</search_result>
+</spectrum_query>
+</msms_run_summary>
+</msms_pipeline_analysis>
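
This fixture illustrates the mass issue named in the `io.pepxml` changelog entry: in the `search_summary`, each modification element carries both `mass` (residue or terminus plus modification) and `massdiff` (the modification alone). The sketch below uses only the standard library and a hypothetical helper, `unmodified_masses`, to subtract the two for the three elements copied from the fixture. It is an illustration of the arithmetic, not the parser used by psm_utils.

```python
import xml.etree.ElementTree as ET

# Modification elements copied from the fixture's search_summary
# (namespace omitted to keep the sketch short).
SNIPPET = """
<search_summary>
  <aminoacid_modification aminoacid="M" massdiff="15.994900" mass="147.035385" variable="Y" symbol="*" />
  <terminal_modification terminus="N" massdiff="42.010565" mass="43.018390" variable="Y" protein_terminus="Y" symbol="#" />
  <aminoacid_modification aminoacid="C" massdiff="57.021464" mass="160.030649" variable="N" />
</search_summary>
"""


def unmodified_masses(xml_text):
    """Return {site: mass - massdiff} for each modification element.

    Hypothetical helper for illustration; not part of psm_utils.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for element in root:
        # Residue modifications name an amino acid; terminal ones a terminus.
        site = element.get("aminoacid") or element.get("terminus").lower() + "-term"
        combined = float(element.get("mass"))
        delta = float(element.get("massdiff"))
        result[site] = round(combined - delta, 6)
    return result


masses = unmodified_masses(SNIPPET)
for site, value in masses.items():
    print(site, value)
```

The differences come out close to the monoisotopic masses of the methionine and cysteine residues and of a hydrogen atom for the N-terminus, which is why reporting `mass` directly would overstate the modification.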
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+base_width_mobility base_width_rt rt_observed mobility_observed mono_ms1_intensity top_ms1_intensity sum_ms1_intensity weighted_ms1_intensity weighted_mass_deviation weighted_mass_error mz_observed mono_ms1_height top_ms1_height sum_ms1_height weighted_ms1_height isotope_intensity_correlation isotope_height_correlation n_observations intensity_correlation height_correlation intensity_fraction height_fraction intensity_fraction_weighted height_fraction_weighted mean_observation_score sum_b_ion_intensity sum_y_ion_intensity diff_b_y_ion_intensity f_masked fragment_scan_correlation template_scan_correlation fragment_frame_correlation top3_frame_correlation template_frame_correlation top3_b_ion_correlation n_b_ions top3_y_ion_correlation n_y_ions cycle_fwhm mobility_fwhm delta_frame_peak top_3_ms2_mass_error mean_ms2_mass_error n_overlapping mean_overlapping_intensity mean_overlapping_mass_error precursor_idx rank frame_center scan_center score elution_group_idx frame_start scan_stop frame_stop scan_start proteins rt_calibrated flat_frag_start_idx charge mods decoy sequence mz_library channel genes i_0 flat_frag_stop_idx i_2 i_1 i_3 mobility_library rt_library mod_sites delta_rt n_K n_R n_P _decoy proba qval _candidate_idx valid candidate_idx run mod_seq_hash mod_seq_charge_hash pg_master pg pg_qval intensity
+0.000000 40.673340 2800.518555 0.000001 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 894.337830 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.000000 0.968887 0.845673 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 15.266385 -15.266385 1.000000 0.000000 0.000000 0.929785 0.975279 0.000000 0.000000 0.000000 0.948546 12.000000 14.244627 0.000000 -0.500000 0.132713 -0.218829 0.000000 0.000000 0.000000 10447876 0 72329 0 136.160126 5238821 71876 1 72933 0 P18899 2347.609131 59818105 3 0 SSYGSSSNDDSYGSSNNDDSYGSSNK 894.337830 0 DDR48_YEAST 0.273118 59818117 0.249391 0.348172 0.129319 0.948457 1399.216187 452.909424 1 0 0 0.000000 0.000000 0.000000 10447876 True 10447876 LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01 8562405370847133435 8562405370847133438 P18899 P18899 0.000000 190103852.035206
+0.000000 40.745483 1647.208252 0.000001 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 986.440491 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.000000 0.991654 0.992141 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 14.408463 -14.408463 1.000000 0.000000 0.000000 0.738752 0.974915 0.000000 0.000000 0.000000 0.880488 12.000000 9.885651 0.000000 0.000000 -0.391579 -0.698411 0.000000 0.000000 0.000000 8793636 0 42431 0 122.278320 4411698 41978 1 43035 0 Q9ULU4 1670.462402 49907897 2 0 SSQGSSSSTQSAPSETASASK 986.440491 0 PKCB1_HUMAN 0.380560 49907909 0.190793 0.352861 0.075786 1.158085 387.834503 -23.254150 1 0 1 0.000000 0.000000 0.000000 8793636 True 8793636 LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01 5824087303549386971 5824087303549386973 Q9ULU4 Q9ULU4 0.000000 195496849.073322
+0.000000 52.349121 2678.317139 0.000001 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 905.432312 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.000000 0.986449 0.931379 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 16.636572 -16.636572 1.000000 0.000000 0.000000 0.978579 0.996334 0.000000 0.000000 0.000000 0.988605 12.000000 13.867673 0.000000 0.000000 -0.432777 0.780247 0.000000 0.000000 0.000000 7132549 0 69158 0 152.012512 3581144 68554 1 69913 0 O60763 2646.791260 39980635 2 0 SSQTSGTNEQSSAIVSAR 905.432312 0 USO1_HUMAN 0.404900 39980647 0.177361 0.352328 0.065410 1.110423 1774.035034 31.525879 0 1 0 0.000000 0.000000 0.000000 7132549 True 7132549 LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01 14912031975374993231 14912031975374993233 O60763 O60763 0.000000 406414129.849395
