ratschlab/immunopepper_analysis

This repository contains the code for the paper:

"ImmunoPepper: Extracting personalized peptides from complex splicing graphs"

Laurie Prélot 1,2, Jiayu Chen 1, Matthias Hüser 1,3, André Kahles 1,2,∗ and Gunnar Rätsch 1,2,3,4,5,∗

1 Department of Computer Science, ETH Zürich, Zürich, Switzerland

2 University Hospital Zürich, Biomedical Informatics Research, Zürich, Switzerland

3 SIB Swiss Institute of Bioinformatics, Zürich, Switzerland

4 Department of Biology, ETH Zürich, Zürich, Switzerland

5 ETH AI Center, Zürich, Switzerland

∗ Corresponding authors

The analysis steps are listed below in sequential order.

Cancer peptides translation

projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/send_all_cross_sample_TCGA_frames.sh

Examples in:

projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_germline.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_ref.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_somatic.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_somatic_and_germline.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_tgx_germline.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_tgx_ref.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_tgx_somatic.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/run_file_tgx_somatic_and_germline.sh
projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA/tmp_run/*

GTEX peptides translation

projects2020_immunopepper_analysis/immunopepper/translate_GTEX/GTEX2017/send_all_no_count_GTEX2017.sh

Example in:

projects2020_immunopepper_analysis/immunopepper/translate_GTEX/GTEX2017/run_all_no_count_GTEX2017.sh

Annotation removed (requant pipeline) / Annotation and GTEX translated in all reading frames removed (allframes pipeline), with:

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_1_remove_GTEX/send_all_samples_BRCA-GTEX2017_dev.sh

Example in:

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_1_remove_GTEX/run_all_samples_BRCA-GTEX2017.sh
projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_1_remove_GTEX/tmp_launch

GTEX removed by looking at cancer junction expression requantified in GTEX (requant pipeline)

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_2_combine_remove_GTEX/filter-requant.py
projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_2_combine_remove_GTEX/run_filter-requant.sh

Developer version in:

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_2_combine_remove_GTEX/20230824_dev_combine_GTEX.ipynb

Helpers in:

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/helpers/helpers_analyze_results.py
projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/helpers/helpers_plotting.py

GTEX junctions present in the STAR/BAM files removed (allframes pipeline)

Cancer cohort recurrence filter applied (requant pipeline, allframes pipeline)

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_3_recurrence_cancer_remove_junctions_BAM/20230824_dev-recurrence-star-save.ipynb (notebook)

Helpers in:

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/helpers/helpers_analyze_results.py
projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/helpers/helpers_plotting.py
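
As an illustration of what a cohort recurrence filter does, the minimal sketch below keeps the k-mers observed in at least a given number of cancer samples. It is not the notebook's code: the function name `recurrent_kmers`, the input layout (sample name mapped to a collection of k-mers) and the `min_samples` threshold are hypothetical choices made for this example.

```python
from collections import Counter


def recurrent_kmers(kmers_per_sample, min_samples=2):
    """Return the k-mers seen in at least `min_samples` samples.

    kmers_per_sample: dict mapping a sample name to an iterable of k-mers
    (hypothetical layout, for illustration only).
    """
    counts = Counter()
    for kmers in kmers_per_sample.values():
        # set() makes sure a k-mer is counted once per sample
        counts.update(set(kmers))
    return {kmer for kmer, n in counts.items() if n >= min_samples}


cohort = {
    "sample_1": ["AAAAAAAAA", "CCCCCCCCC"],
    "sample_2": ["CCCCCCCCC", "DDDDDDDDD"],
    "sample_3": ["CCCCCCCCC", "AAAAAAAAA"],
}
recurrent = recurrent_kmers(cohort, min_samples=3)
```

The threshold used in the paper is set in the notebook listed above; the value here is only a placeholder.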

Plotting the two pipelines after filtering, plus comparison with the Cancer Cell paper

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/step_4_plotting_intermediate_step/20230824_plot_pipelines_reproducibility.ipynb

Helpers in:

projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/helpers/helpers_analyze_results.py
projects2020_immunopepper_analysis/immunopepper/filter_cancerspecific/helpers/helpers_plotting.py

These plots can be found in the Main and the Supplementary sections of the paper.

MHC Binding

projects2020_immunopepper_analysis/mhcBinding/send_process_mhc_filter.sh

Examples in:

projects2020_immunopepper_analysis/mhcBinding/launch_files

Cleaning in:

projects2020_immunopepper_analysis/mhcBinding/clean_intermediate.sh

Fasta files generation

projects2020_immunopepper_analysis/pepFasta/20231106_meta-matching_format-peptides-modular.py
projects2020_immunopepper_analysis/pepFasta/helpers_format_peptides.py
projects2020_immunopepper_analysis/pepFasta/helpers_metadata_matching.py
projects2020_immunopepper_analysis/pepFasta/run_meta_matching.sh
projects2020_immunopepper_analysis/pepFasta/send_meta-matching.sh

Trypsin digestion

projects2020_immunopepper_analysis/pepDigest/send_trypsine.sh

PepQuery search

projects2020_immunopepper_analysis/pepQuery/pep_search/multi_pepQ.sh

Example in:

projects2020_immunopepper_analysis/pepQuery/pep_search/run_tmp/*

Subset Neighbor search

projects2020_immunopepper_analysis/pepNeighborsSearch

First, the trypsin-digested peptides are indexed and the neighbors are searched:

projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-index/.
projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-index/./runall-createIndex-ipp.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-index/./script_index.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-index/./launch_multi_createIndex.sh

Then the database search is performed with Crux

projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-search/.
projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-search/./runall-search-ipp.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-search/./launch_multi_search.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240108-tide-search/./script_search.sh

Finally, the FDR estimation is performed

a) The search results are pooled across fractions:

projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/a_extract_concat/.
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/a_extract_concat/./runall-extract-ipp.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/a_extract_concat/./script_extract.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/a_extract_concat/./launch_multi_extract.sh

b) FDR control is performed either PSM-wise with the Crux search engine or peptide-wise with the crema software tool:

projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/b_conf_FDR/.
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/b_conf_FDR/./script_confidence_crema.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/b_conf_FDR/./launch_multi_FDR.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/b_conf_FDR/./runall-FDR-ipp.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/b_conf_FDR/./script_confidence.sh
projects2020_immunopepper_analysis/pepNeighborsSearch/20240109-FDR-correct/b_conf_FDR/./script_crema.py
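
For orientation, the sketch below shows a generic target-decoy estimate of q-values over a ranked list of PSMs. It only illustrates the idea behind the FDR step. Crux and crema implement their own confidence estimation, and none of their APIs are shown here. The function name `target_decoy_qvalues`, the `(score, is_decoy)` input format and the conservative `+1` correction are assumptions made for this example.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values by target-decoy competition (illustrative only).

    psms: list of (score, is_decoy) tuples, higher score = better match.
    Returns a list of (score, is_decoy, q_value), sorted by decreasing score.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # FDR estimate at each score threshold: (decoys + 1) / targets,
    # a common conservative variant (assumed here, not taken from the repo).
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # q-value: smallest FDR among this threshold and all more permissive ones
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


example = target_decoy_qvalues(
    [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
)
```

A peptide-wise variant would first keep only the best-scoring PSM per peptide and then apply the same ranking.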

Plotting: Parse the results and save them to disk in a dataframe

projects2020_immunopepper_analysis/plotting/plot_proteomics_results/20240109_parse_proteomics_results-peptides-kmers-rates-1.ipynb

Plotting: Compare the results with the Cancer Cell paper and compare the results between the two proteomics methods

projects2020_immunopepper_analysis/plotting/plot_proteomics_results/20240109_from_parsed_compare_with_CC.ipynb

These plots can be found in the Main and the Supplementary sections of the paper.

Plotting: Plot the number of peptides, kmers, validation rates and recurrences

projects2020_immunopepper_analysis/plotting/plot_proteomics_results/20240109_from_parsed_plot_raw_numbers.ipynb

These plots can be found in the Main and the Supplementary sections of the paper.

Plotting: Helpers

projects2020_immunopepper_analysis/plotting/plot_proteomics_results/helpers_initialize.py
projects2020_immunopepper_analysis/plotting/plot_proteomics_results/helpers_parse_results.py
projects2020_immunopepper_analysis/plotting/plot_proteomics_results/helpers_plotting_bars.py
projects2020_immunopepper_analysis/plotting/plot_proteomics_results/helpers_validated_kmers.py

Review Task 1: Separate the k-mers according to their mutation class

Found in projects2020_immunopepper_analysis/separatePeptideOrigin/

The first step pools all the generated k-mers by category (database creation to simplify post-processing):

projects2020_immunopepper_analysis/separatePeptideOrigin/step1_generated_kmers_extract/send_generated_kmers_extract.sh

The second step creates a file that maps each k-mer to its class: junction_only (reference), germline, germline_and_somatic, or somatic:

projects2020_immunopepper_analysis/separatePeptideOrigin/step2_assign_mutation_type_kmers/20241216_RUN_Isolate_mutation_type.ipynb

It generates a map saved at:

path_save = os.path.join(base_dir, f'filter_{sample}/result/FILTERED/part-kmers_CLASS_MAP.tsv.gz')

Then several plots are generated.

A. A new plot (swarmplot) is created and applied to previous outputs. This looks good for some of the filtering results.

projects2020_immunopepper_analysis/plotting/plot_proteomics_results/20240109_from_parsed_plot_raw_numbers_swarmplot.ipynb

B. The filtered k-mers are plotted per mutation class on a swarmplot.

projects2020_immunopepper_analysis/plotting/plot_proteomics_results/20240109_from_parsed_plot_raw_numbers_swarmplot-multiclass_version.ipynb

This plot can be found in the Supplementary section of the paper.

C. The MS-validated k-mers are plotted per mutation class on a swarmplot.

projects2020_immunopepper_analysis/plotting/plot_proteomics_results/20240109_parse_proteomics_results-kmers-rates-1_ReviewPaper.ipynb

This plot can be found in the Supplementary section of the paper.
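
The class map produced in the second step of Review Task 1 can be pictured with the minimal sketch below. It is not the notebook's code: the function names, the two-column TSV layout and, in particular, the precedence used when a k-mer appears in several classes are assumptions made for illustration. Only the four class labels and the `.tsv.gz` output format come from the description above.

```python
import csv
import gzip

# Assumed precedence: a k-mer receives the first class whose set contains it.
PRECEDENCE = ["junction_only", "germline", "somatic", "germline_and_somatic"]


def assign_classes(kmers_by_class):
    """Map each k-mer to a single class label.

    kmers_by_class: dict mapping a class label to a set of k-mers
    (hypothetical input layout).
    """
    class_map = {}
    for label in PRECEDENCE:
        for kmer in kmers_by_class.get(label, ()):
            # setdefault keeps the label assigned by an earlier class
            class_map.setdefault(kmer, label)
    return class_map


def write_class_map(class_map, path):
    """Write the k-mer to class map as a gzipped, tab-separated file."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["kmer", "class"])
        for kmer in sorted(class_map):
            writer.writerow([kmer, class_map[kmer]])


class_map = assign_classes(
    {
        "junction_only": {"AAAAAAAAA"},
        "germline": {"AAAAAAAAA", "CCCCCCCCC"},
        "somatic": {"DDDDDDDDD"},
    }
)
```

Calling `write_class_map(class_map, "part-kmers_CLASS_MAP.tsv.gz")` would then write one row per k-mer.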

Review Task 2: Compute some statistics about the number of somatic mutations applied to each of the TCGA samples + cross-run comparisons

projects2020_immunopepper_analysis/posthocAnalyses/20250203_Compare_gene.ipynb

Review Task 3: Translate the peptides (junctions) in the wrong frame and assess the proteomics validation rate

  1. Step 1: Translate the peptides in the wrong frame.
     The code can be found in: projects2020_immunopepper_analysis/immunopepper/translate_TCGA-BRCA (GitHub project)
     Bash script to launch ImmunoPepper: immunopepper/translate_TCGA-BRCA/translate_wrong_frame/send_all_cross_sample_TCGA_frames.sh
     Example of command: immunopepper/translate_TCGA-BRCA/translate_wrong_frame/run_file_ref.sh
     This plot can be found in the Supplementary section of the paper.

  2. Then remove the annotated frames from all frames.
     The operation performed is: k-mers all frames \ k-mers annotated frame (novel or not) \ k-mers from annotation \ Uniprot (performed at the ImmunoPepper stage)
     Code in: projects2020_immunopepper_analysis/posthocAnalyses/translate_wrong_frame/
     The filtering is performed in a notebook: posthocAnalyses/translate_wrong_frame/20250217_wrongFrame_select_kmers.ipynb
     This plot can be found in the Supplementary section of the paper.

  3. Extract the 2- or 3-exon context peptides which contain the filtered junction k-mers and create a fasta file.
     Code is in: projects2020_immunopepper_analysis/posthocAnalyses/translate_wrong_frame/
     The fasta is generated in a notebook (a bit slow): posthocAnalyses/translate_wrong_frame/20240217_wrongFrame_Fasta_matching.ipynb
     Therefore the code is also available as a script: posthocAnalyses/translate_wrong_frame/20240217_wrongFrame_Fasta_matching.py
     The following helper code is used: posthocAnalyses/translate_wrong_frame/helpers_format_peptides.py
     Updated to script in: posthocAnalyses/translate_wrong_frame/helpers_metadata_matching.py
     NOTE: some batches (of up to 10 genes) could not be run. There are 2035 batches of 10 genes in total; batches 220, 495-6, 535, 1000, 1051, 1106 and 1945 failed. This does not matter too much because the peptides are sampled afterwards anyway.

  4. Digest the peptides from the fasta file, filter the tryptic peptides for size and make them unique.
     Code is in: projects2020_immunopepper_analysis/posthocAnalyses/pepDigest_wrong_frame/send_trypsine.sh
     Helper code in: projects2020_immunopepper_analysis/posthocAnalyses/pepDigest_wrong_frame/andy_lin_scripts/
     Non-digested fasta: 1881861 peptides
     Digested fasta with some processing: 1123529 peptides
     After exclusion of peptides that are too short or too long: 851857 peptides
     After deduplication: 476485 peptides

  5. Sample the tryptic peptides and generate small fasta files. The sampling is performed to match the number of candidates that we used as input for proteomics in the analysis of the paper. The motivation behind the sampling is that the validation rate is heavily influenced by the size of the peptide set.
     Code is in: projects2020_immunopepper_analysis/posthocAnalyses/pepSampleFasta/20240217_wrongFrame_Fasta_sampling.ipynb
     The sampling was performed 10 times.

  6. Proteomics with Subset Neighbor Search: Compute the neighbor peptides and index the database.
     Code in: projects2020_immunopepper_analysis/posthocAnalyses/pepNeighborsSearch_wrong_frame/20240108-tide-index

  7. Proteomics with Subset Neighbor Search: Compare the spectra with the peptide database (Crux search engine). Each fraction of the sample is matched.
     Code in: projects2020_immunopepper_analysis/posthocAnalyses/pepNeighborsSearch_wrong_frame/20240108-tide-search

  8. Proteomics with Subset Neighbor Search: Perform the FDR calculation.
     Code in: projects2020_immunopepper_analysis/posthocAnalyses/pepNeighborsSearch_wrong_frame/20240109-FDR-correct

  9. Proteomics with PepQuery.
     Code in: projects2020_immunopepper_analysis/posthocAnalyses/pepQuery_wrong_frame

  10. Extraction of validation rates.
     Code in: projects2020_immunopepper_analysis/posthocAnalyses/pepValidationRate_wrong_frame/20250219_parse_proteomics_rates.ipynb

The plots related to this experiment can be found in the Supplementary section of the paper.
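
Steps 2, 4 and 5 of Review Task 3 (set difference, digestion with size filtering and deduplication, repeated sampling) can be pictured with the minimal sketch below. It is not the repository's code. The function names, the cleavage rule (cut after K or R unless followed by P, without missed cleavages), the length bounds and the random seed are all assumptions made for illustration. Only the sequence of operations and the 10 sampling repeats come from the description above.

```python
import random
import re


def remove_annotated(all_frames, *background_sets):
    """Step 2: successive set differences, all_frames minus each background."""
    kept = set(all_frames)
    for background in background_sets:
        kept -= set(background)
    return kept


def digest_trypsin(peptide):
    """Step 4: cleave after K or R unless the next residue is P (assumed rule)."""
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", peptide) if frag]


def filter_unique(fragments, min_len=7, max_len=50):
    """Step 4: size filter, then deduplication (bounds are hypothetical)."""
    return sorted({f for f in fragments if min_len <= len(f) <= max_len})


def sample_peptides(peptides, n_candidates, n_repeats=10, seed=0):
    """Step 5: draw `n_repeats` random subsets of `n_candidates` peptides."""
    rng = random.Random(seed)
    size = min(n_candidates, len(peptides))
    return [rng.sample(peptides, size) for _ in range(n_repeats)]


# Toy inputs, invented for the example
wrong_frame_kmers = remove_annotated(
    {"KMER_A", "KMER_B", "KMER_C", "KMER_D"},  # k-mers from all frames
    {"KMER_B"},                                # k-mers from the annotated frame
    {"KMER_C"},                                # k-mers from the annotation
    {"KMER_X"},                                # k-mers from Uniprot
)
fragments = digest_trypsin("AAKBBRPCCK")
tryptic = filter_unique(fragments + fragments, min_len=5, max_len=10)
draws = sample_peptides(["PEP1", "PEP2", "PEP3", "PEP4"], n_candidates=2)
```

Sampling without replacement within each draw keeps every small fasta free of duplicate peptides, while separate draws may overlap.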
