Making sense of internal fragment ions #9

@veitveit

Title

Making sense of internal fragment ions

Abstract

Peptide identification from fragment mass spectra uses only part of the information the spectra contain. Internal fragment ions, i.e. fragments produced by two backbone cleavages that retain neither the N- nor the C-terminus of the peptide, have a high potential to provide further evidence about peptide identity. Although several database search engines offer the option to include internal ions, their actual use has been explored only poorly so far. A major challenge is the large number of possible ions, which makes it difficult to distinguish them from background signals or other fragment ions. This hackathon project aims to shed more light on the applicability of internal ions by creating a framework that determines their characteristic patterns in MS data. We will provide statistics and extensive visualizations for internal ions in a given data set, using both raw data files and identifications from a database search. The framework will lay the groundwork for detecting and using characteristic internal ions in a data set, exploring potential “fragment motifs”, and distinguishing actual internal ions from background noise. A clearer understanding of internal fragmentation will direct future efforts towards a more extensive use of internal ions in MS data processing, leading to higher peptide identification rates.
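To give a feeling for the combinatorics mentioned in the abstract, here is a minimal sketch that enumerates the internal fragments of a peptide. The function name `internal_fragments` and the `min_length` parameter are illustrative assumptions, not part of any existing tool, and the working definition used here (a contiguous stretch containing neither the first nor the last residue) is only one possible choice that the nomenclature discussion may refine.

```python
from typing import Iterator, Tuple


def internal_fragments(peptide: str, min_length: int = 2) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, sequence) for every internal fragment of `peptide`.

    Assumed definition: a contiguous stretch that contains neither the first
    nor the last residue. `start` is 0-based and inclusive, `end` is exclusive.
    """
    n = len(peptide)
    for start in range(1, n - 1):                 # never start at the N-terminal residue
        for end in range(start + min_length, n):  # never include the C-terminal residue
            yield start, end, peptide[start:end]


if __name__ == "__main__":
    for length in (7, 10, 20, 30):
        peptide = "A" * length                    # placeholder sequence, only the length matters
        n_internal = sum(1 for _ in internal_fragments(peptide))
        n_terminal = 2 * (length - 1)             # one b-type and one y-type ion per backbone bond
        print(f"length {length}: {n_internal} internal vs {n_terminal} terminal fragments")
```

With `min_length=2`, a peptide of n residues has (n-2)(n-3)/2 such fragments, so the count grows quadratically while the number of terminal b/y-type ions grows linearly. Charge states and additional ion types would multiply these numbers further.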

Project Plan

We suggest the following tasks for creating and testing the framework:

  • Nomenclature and definitions: Given the complexity and the large combinatorics of internal fragment ions, we will discuss and stringently define the nomenclature. Existing knowledge and nomenclatures will be assessed for their usability.

  • Data sets: Selection of about ten data sets from different MS technologies that will be used for testing and exploration. This will include bottom-up and top-down approaches, as well as different fragmentation types and acquisition methods.

  • Implementation: We plan to use the pyteomics tools for reading files and spectra, and libraries such as spectrum_utils for extracting fragment ions. Visualizations and further analysis will be done in Python and/or R, depending on the participants’ background.

  • Assessment and testing: Different statistical measures and motif algorithms will be tested and discussed. Interactive visualizations will be used to conveniently explore different subsets of one or multiple MS runs.

  • Software: We expect to develop a prototype that can process any data set provided in the standard file formats mzML and mzIdentML (mzid), and optionally in widely used formats such as MGF and pepXML.

These tasks will be discussed on the first day prior to their implementation. Depending on the skills and interest of the participants, we may define working groups for addressing them in the following days.
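As a starting point for the implementation task, the core annotation step could look like the following minimal sketch. It is not the planned framework: the function names, the 20 ppm default tolerance, and the restriction to singly charged b-type internal ions (sum of residue masses plus one proton) are assumptions made for illustration, and the residue mass table is hard-coded only to keep the example free of dependencies.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

PROTON = 1.007276  # mass of a proton in Da

# Monoisotopic residue masses in Da (standard amino acids, no modifications)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def internal_ion_mz(fragment: str, charge: int = 1) -> float:
    """m/z of a b-type internal ion: residue masses plus `charge` protons."""
    return (sum(MONO[aa] for aa in fragment) + charge * PROTON) / charge


def annotate_internal(
    peptide: str,
    peak_mzs: Sequence[float],
    tol_ppm: float = 20.0,   # assumed tolerance, adjust to the instrument
    min_length: int = 2,
) -> List[Tuple[str, int, int, float, float]]:
    """Return (fragment, start, end, theoretical m/z, observed m/z) matches."""
    peaks = sorted(peak_mzs)
    hits = []
    n = len(peptide)
    for start in range(1, n - 1):                 # skip the N-terminal residue
        for end in range(start + min_length, n):  # skip the C-terminal residue
            fragment = peptide[start:end]
            mz = internal_ion_mz(fragment)
            tol = mz * tol_ppm * 1e-6
            i = bisect_left(peaks, mz - tol)      # first peak inside the window
            while i < len(peaks) and peaks[i] <= mz + tol:
                hits.append((fragment, start, end, mz, peaks[i]))
                i += 1
    return hits


if __name__ == "__main__":
    observed = [199.1077, 227.1026, 500.0]        # made-up peak list for illustration
    for hit in annotate_internal("PEPTIDE", observed):
        print(hit)
```

A match by m/z alone is ambiguous: fragments with the same composition (for example I/L variants or permuted residues) share one mass, and a peak may equally be a terminal ion or noise. This is the kind of ambiguity the statistics and motif analysis are meant to address.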

Technical details

  • The programming language(s): Python and/or R. For faster implementations, we might collaborate with the hackathon project focusing on Rust implementations.

  • Existing software that will be featured: the Python libraries pyteomics and spectrum_utils

  • (Public) datasets that will be used and their availability
    Given the available ground truth and the coverage of different fragmentation types, we might use the ProteomeTools data sets (http://www.proteometools.org/).

Contact information

Arthur Grimaud
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
agrimaud@bmb.sdu.dk

Veit Schwämmle
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
veits@bmb.sdu.dk
