PREFIX.pergeno.aa_mutations.csv
Xiaolong Cao edited this page Feb 24, 2021
·
5 revisions
This file contains amino acid change annotations for each protein.
The file is in CSV format, with a tab (`\t`) as the separator. Transposed, the table looks like:
| protein_id | ENSP00000437362 | ENSP00000446015 | ENSP00000450505 | ENSP00000451203 | ENSP00000452431 |
|---|---|---|---|---|---|
| protein_id_fasta | ENSP00000437362.1 | ENSP00000446015.1 | ENSP00000450505.1 | ENSP00000451203.1 | ENSP00000452431.1 |
| seqname | 14 | 14 | 14 | 14 | 14 |
| strand | + | + | + | + | + |
| frameChange | | | | | |
| stopGain | | | | | |
| AA_stopGain | | | | | |
| stopLoss | | | | | |
| stopLoss_pos | | | | | |
| n_variant_AA | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| n_deletion_AA | | | | | |
| n_insertion_AA | | | | | |
| variant_AA | F70S(14-21888371-T-C) | P50Q(14-21924450-C-A);Q76E(14-21924527-C-G) | E77K(14-21979008-G-A);S78G(14-21979011-A-G) | S103L(14-22086905-C-T) | T25P(14-22124030-A-C) |
| insertion_AA | | | | | |
| deletion_AA | | | | | |
| len_ref_AA | 113 | 116 | 113 | 121 | 109 |
| len_alt_AA | 113.0 | 116.0 | 113.0 | 121.0 | 109.0 |
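Because the separator is a tab rather than a comma, the file has to be read with an explicit delimiter. The following is a minimal sketch using only the Python standard library. It assumes the file on disk is not transposed, so the first line holds the column names and each later line describes one protein. The function name `read_aa_mutations` and the in-memory sample are illustrative only and are not part of perGeno.

```python
import csv
import io


def read_aa_mutations(handle):
    """Read a tab-separated aa_mutations table into a list of dicts.

    Assumes the first line is the header (protein_id, seqname, ...).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Small in-memory sample built from two columns of the table above.
sample = (
    "protein_id\tprotein_id_fasta\tseqname\tstrand\tn_variant_AA\t"
    "variant_AA\tlen_ref_AA\tlen_alt_AA\n"
    "ENSP00000437362\tENSP00000437362.1\t14\t+\t1.0\t"
    "F70S(14-21888371-T-C)\t113\t113.0\n"
    "ENSP00000446015\tENSP00000446015.1\t14\t+\t2.0\t"
    "P50Q(14-21924450-C-A);Q76E(14-21924527-C-G)\t116\t116.0\n"
)

rows = read_aa_mutations(io.StringIO(sample))
for row in rows:
    print(row["protein_id"], row["variant_AA"])
```

For a real file, the same function can be called on `open(path, newline="")`, where `path` points to the `PREFIX.pergeno.aa_mutations.csv` output.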
Columns:
- protein_id: protein_id used in perGeno analysis
- protein_id_fasta: protein id that is stored in fasta file
- seqname: chromosome name
- strand: `+` or `-`. Strand of the protein on the chromosome.
- frameChange: `True` or `False`. Whether there is a frame-changing variation.
  - Frame change is defined by the last reading frame. For example, if there is a single-nucleotide insertion and a single-nucleotide deletion, the reading frame is considered unchanged because the last reading frame is unchanged.
- stopGain: `True` or `False`. Whether there is a stop-gain variation.
- AA_stopGain: A string describing the amino acid (AA) that is changed to a stop codon. It looks like `E90*(chr18-63712604-G-T)`, which means that the 90th amino acid, E, in the reference protein sequence is mutated to a stop codon, and the variation is `chr18-63712604-G-T`. If it looks like `-103*(chr8-18067100-T-TGCACCTGTGCTGTATATCTAAGACATACA)`, the variations change the protein sequence in a complex way, so the reference protein sequence is shorter or we cannot assign an AA at this site. Here 103 is just the codon number counted from the start codon in the transcript sequence. In this example, the insertion introduced a stop codon.
- stopLoss: `True` or `False`. Whether there is a stop-loss mutation.
- stopLoss_pos: A string describing the position of the stop-loss in the protein sequence.
  - `179(chr17-7254884-T-G)`: the AA before the stop is the 179th in the reference protein sequence, and variant `chr17-7254884-T-G` caused this stop-loss.
  - `187()`: a stop-loss caused by variants other than substitution.
- nonStandardStopCodon: The value is `1` if translation stops at a position that is not a stop codon.
- n_variant_AA: count of AA substitutions.
- n_deletion_AA: count of AA deletions.
- n_insertion_AA: count of AA insertions.
- variant_AA: A string describing the substituted AAs. For example, `G44E(chr22-25763322-G-A);W547C(chr22-25770933-G-C);W661R(chr22-25777694-T-C);H1119Q(chr22-25843883-C-A)`. `G44E` means the 44th AA, G, is changed to E, and this is caused by variant `chr22-25763322-G-A`.
- insertion_AA: A string describing the inserted AAs. For example, `-287P(chr1-47438996-T-TCCGCAC);-287H(chr1-47438996-T-TCCGCAC)` means two AAs, `P` and `H`, were inserted after the 287th AA, caused by variant `chr1-47438996-T-TCCGCAC`.
- deletion_AA: A string describing the deleted AAs. For example, `G1122-(chr21-45504511-CGGCCCCCCA-C);P1123-(chr21-45504511-CGGCCCCCCA-C);P1124-(chr21-45504511-CGGCCCCCCA-C)` means three AAs, `GPP`, were deleted due to variant `chr21-45504511-CGGCCCCCCA-C`.
- len_ref_AA: length of the provided (reference) protein
- len_alt_AA: length of the changed protein
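The annotation strings in `variant_AA`, `insertion_AA`, `deletion_AA` and `AA_stopGain` share one pattern: a reference AA (or `-`), a position, an altered AA (or `-` or `*`), and the causing variant in parentheses, with entries joined by `;`. The sketch below splits such a cell into structured records. The regular expression, the `AAChange` tuple and the `parse_aa_changes` function are assumptions made for illustration, inferred from the examples above rather than taken from the perGeno source.

```python
import re
from typing import NamedTuple

# One entry, e.g. "G44E(chr22-25763322-G-A)" or "-287P(chr1-47438996-T-TCCGCAC)".
# The variant inside the parentheses may be empty, as in "-443A()".
ENTRY = re.compile(
    r"^(?P<ref>[A-Z*-])(?P<pos>\d+)(?P<alt>[A-Z*-])\((?P<variant>[^)]*)\)$"
)


class AAChange(NamedTuple):
    ref: str      # reference AA, or "-" when no AA can be assigned
    pos: int      # position in the reference protein (or codon number)
    alt: str      # altered AA, "-" for a deletion, "*" for a stop codon
    variant: str  # causing variant, e.g. "chr22-25763322-G-A"; may be ""


def parse_aa_changes(cell):
    """Split a ';'-joined annotation cell into a list of AAChange records."""
    changes = []
    for entry in cell.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # empty cell or trailing separator
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognized annotation: {entry!r}")
        changes.append(
            AAChange(match["ref"], int(match["pos"]), match["alt"], match["variant"])
        )
    return changes


for change in parse_aa_changes(
    "G44E(chr22-25763322-G-A);W547C(chr22-25770933-G-C)"
):
    print(change)
```

Entries in `stopLoss_pos`, such as `179(chr17-7254884-T-G)`, have no AA letters and would be rejected by this pattern; they would need a separate, simpler expression.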
Note:
- Some cells may be empty, which usually means `False` or 0.
- Currently, frame-shift variations are annotated as a series of deletion_AA and insertion_AA entries.
- For `insertion_AA` and `deletion_AA` annotations, if the change is caused by INDELs, the annotation string might look like `-443A()`, where `-` means no AA.
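Since empty cells usually stand for False or 0, it can help to normalize flag and count columns when loading the table. The helpers below are a sketch under two assumptions: flags are written as the literal text `True` or `False` (or `1` for `nonStandardStopCodon`), and counts may be written as floats such as `2.0`, as in the example table. The names `as_flag` and `as_count` are hypothetical.

```python
def as_flag(cell):
    """Interpret a flag cell; empty or anything unrecognized counts as False."""
    return cell.strip().lower() in {"true", "1", "1.0"}


def as_count(cell):
    """Interpret a count cell such as '2.0'; empty counts as 0."""
    cell = cell.strip()
    return int(float(cell)) if cell else 0


# Example row with an empty frameChange and n_deletion_AA cell.
row = {"frameChange": "", "stopGain": "True", "n_variant_AA": "2.0", "n_deletion_AA": ""}
print(as_flag(row["frameChange"]), as_flag(row["stopGain"]))
print(as_count(row["n_variant_AA"]), as_count(row["n_deletion_AA"]))
```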