Replies: 1 comment
Hi Wouter,
We already have quite a lot of info in .protein_description.tsv, though indeed not the whole headers :) These are not stored by DIA-NN internally and hence are not exported. One reason is to reduce RAM usage when a huge FASTA with individual peptides is provided to DIA-NN, in which case storing them would cost an extra 32 bytes/header. Not critical, but it may be noticeable with ~100 million peptides (about 3.2 GB at that scale). So I would suggest simply loading .protein_description.tsv into R and matching the protein sequence ids there to the headers using some FASTA-reading R package. Or, if you can control the contents of the FASTA header: .protein_description.tsv contains the descriptions DIA-NN extracts (assuming a UniProt-like format), so you can just put the extra info into those.
Best,
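The matching step suggested above can be sketched as follows. This is a minimal illustration in Python rather than R, and not DIA-NN's own method. It rests on several assumptions: the id column name `Protein.Id` and the output column name `Fasta.Header` are hypothetical placeholders, ids within a cell are assumed to be separated by `;`, and accessions are assumed to be the second `|`-delimited field of a UniProt-like header (falling back to the first whitespace-delimited token otherwise). Check the actual column names in your own .protein_description.tsv before using anything like this.

```python
import csv
import io


def read_fasta_headers(handle):
    """Map accession -> full FASTA header line (without the leading '>')."""
    headers = {}
    for line in handle:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        first_token = header.split(None, 1)[0] if header else ""
        parts = first_token.split("|")
        # Assumed UniProt-like layout: db|ACCESSION|ENTRY_NAME.
        # Anything else falls back to the first token as the id.
        accession = parts[1] if len(parts) >= 3 else first_token
        headers[accession] = header
    return headers


def attach_headers(tsv_handle, headers, id_column="Protein.Id", sep=";"):
    """Add a 'Fasta.Header' field to each TSV row.

    'Protein.Id' and 'Fasta.Header' are hypothetical column names.
    Ids that are missing from the FASTA map to an empty string.
    """
    rows = []
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        ids = row[id_column].split(sep)
        row["Fasta.Header"] = sep.join(headers.get(i, "") for i in ids)
        rows.append(row)
    return rows


# Usage with in-memory stand-ins for the FASTA and the TSV.
fasta = io.StringIO(
    ">sp|P12345|TEST_HUMAN Test protein OS=Homo sapiens custom=abc\n"
    "MKTAYIAK\n"
    ">my_custom_id extra=1\n"
    "PEPTIDEK\n"
)
table = io.StringIO(
    "Protein.Id\tDescription\n"
    "P12345;my_custom_id\tTest protein\n"
)
annotated = attach_headers(table, read_fasta_headers(fasta))
print(annotated[0]["Fasta.Header"])
```

Once the full header sits in its own column, custom fields such as `custom=abc` can be pulled out with a regular expression. If your headers can themselves contain `;`, pass a different `sep` for joining, or keep one row per id instead.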
Hi Vadim,
I've greatly enjoyed DIA-NN, especially with the rapid pace of improvements lately. Thank you for making it available to the community.
It would be very useful to my workflow if DIA-NN's main output could also include a column containing the full FASTA headers, both for the protein group accessions (Protein.Group) and, perhaps more importantly, for the peptide-level accessions (Protein.Ids). Would you consider adding such a feature? I recognize that, especially for Protein.Ids, this would produce very long strings of text pasted;together;like;this, but being able to extract information that I encode in my custom FASTAs (e.g. with a regex) without first parsing the original FASTA and linking it to the table myself would be very convenient.
Kind regards,
Wouter