Rima-Waleed commented Jun 4, 2025

| Study_ID | Testing Instance Link | Sample Count |
| --- | --- | --- |
| brca_dldccc_2022 | https://private.cbioportal.mskcc.org/study/summary?id=brca_dldccc_2022 | 75 |
| luad_cas_2020 | https://cbioportal.mskcc.org/study/summary?id=luad_cas_2020 | 103 |
| blca_msk_2024 | https://cbioportal.mskcc.org/study/summary?id=blca_msk_2024 | 184 |
| pancan_mimsi_msk_2024 | https://www.cbioportal.org/study/summary?id=pancan_mimsi_msk_2024 | 5,033 |
| pancan_pdx_uthsa_2023 | https://private.cbioportal.mskcc.org/study/summary?id=pancan_pdx_uthsa_2023 | 136 |
| crc_orion_2024 | https://private.cbioportal.mskcc.org/study/summary?id=crc_orion_2024 | 74 |
| hnsc_a5consortium_2025 | https://triage.cbioportal.mskcc.org/study/summary?id=hnsc_a5consortium_2025 | 94 |
| crc_sysucc_2022 | https://private.cbioportal.mskcc.org/study/summary?id=crc_sysucc_2022 | 1,015 |
| ccle_broad_2025 | https://private.cbioportal.mskcc.org/study/summary?id=ccle_broad_2025 | 1,970 |
| ccle_genentech_2014 | https://private.cbioportal.mskcc.org/study/summary?id=ccle_genentech_2014 | 675 |
| sft_sysucc_2023 | https://cbioportal.mskcc.org/study/summary?id=sft_sysucc_2023 | 131 |
| schw_ctf_synodos_2025 | https://www.cbioportal.org/study/summary?id=schw_ctf_synodos_2025 | 52 |
| Total | | 9,642 |

Review Notes:

pancan_mimsi_msk_2024: The cBioPortal instance has 4 fewer samples than the paper. The author was contacted and confirmed that 2 samples were a database error and 2 samples are purposely listed twice in their data because they are both MLH1 and PMS2 deficient by IHC.

Pending data from authors:

blca_msk_2024: clinical data (PFS, OS, demographic data, responders vs. nonresponders) & drug response data
pancan_pdx_uthsa_2023: mutational signatures, RNA-seq, tumor type confirmation for patient 560 (560-LM_PDX & 560-SM_PDX: osteosarcoma vs. Wilms tumor), engraftment data
hnsc_a5consortium_2025: PG/PC categorization, methylation & mutational sig data
sft_sysucc_2023: IHC, MxIF
crc_sysucc_2022: mutational signatures, viral data, HLA genotyping data
schw_ctf_synodos_2025: methylation, CNA, SV, clinical

rmadupuri added the "new public study" and "Under Review" labels Jun 12, 2025

ritikakundra commented Jun 30, 2025

https://private.cbioportal.mskcc.org/study/summary?id=sclc_uc_2024:

  • For samples on WGS and WES, are we distinguishing mutations based on platform? No
  • There is mutational signature data in the paper. I've requested the author.
  • Where did the RNA-Seq data come from? From supplementary table 10
  • Were the Z-scores not computed? Done
  • The homepage does not give the options of SV or RNA Seq, just mutations.

https://private.cbioportal.mskcc.org/study/summary?id=brca_dldccc_2022

  • Typo in the description Fixed
  • Is the cancer type correct for TNBC? Fixed
  • Is there any timeline data here?
  • How are we distinguishing pCR and non-pCR?
    *Pathological complete response (pCR) chart indicating patients with pCR and non-pCR
  • How are we cataloging the Proteogenomic features?

https://private.cbioportal.mskcc.org/study/summary?id=luad_cas_2020

  • Were the Z-scores not computed? Done
  • The homepage does not give the options of RNA Seq just mutations. Fixed
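
For reference, the Z-score profiles asked about above are per-gene standardizations across samples. A minimal sketch, assuming a log2(x + 1) transform before standardization and the population standard deviation (both are assumptions here, not confirmed pipeline settings); the function name is hypothetical:

```python
import math
from statistics import mean, pstdev


def gene_zscores(values):
    """Return z-scores for one gene's expression values across samples.

    Assumes non-negative FPKM-like inputs. The log2(x + 1) transform is an
    assumed preprocessing step, not necessarily what the curation pipeline uses.
    """
    logged = [math.log2(v + 1) for v in values]
    mu = mean(logged)
    sigma = pstdev(logged)
    if sigma == 0:
        # A gene with constant expression has no defined z-score.
        return [None] * len(values)
    return [(x - mu) / sigma for x in logged]
```

Genes with no variance are returned as `None` so they can be written out as missing values rather than as zeros.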

rmadupuri commented Jul 2, 2025

Thanks Rima! The data looks good. Just a few questions and suggestions for improvement below.

sft_sysucc_2023:

  • Study is missing the gene panel information. A pan-cancer 1021 gene panel was used.
  • Is the data based on hg38 or hg19? The paper says reads were aligned to hg38, but the mutation file appears to use hg19.
    *Author was contacted for clarification and confirmed their NGS data was matched to GRCh37.
  • The Age and cell density fields do not display the ≥ symbol correctly. Are the data files UTF-8 encoded?
  • Does Healthy in Patient Status mean Progression Free? Can we replace that? see Table 1
  • Supp Table 12 includes PFS, MFS and RFI. I believe the time column is a snapshot for all three events at a given time (needs double checking). Can we also add MFS and RFI?
    *MFS and RFI added, confirming time anchor from author
  • We might also be able to get IHC and MxIF images from the authors
    *Reached out to author for data
  • Is the CNA data available?
    *Author confirmed CNA was not analyzed for this data
  • How is Mitotic Counts calculated? It may be best to use the values from the table.
    *The author provided a data file with PFS, risk stratification, and mitotic counts.
  • The journal name is inconsistent between the study title and the citation. Update to match other public studies.
  • Update the citation name to Zhang et al.
  • Study is missing allele counts.
    *Not provided by authors
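
The ≥ display problem raised above is typical of a file saved in a non-UTF-8 encoding, or of UTF-8 text that was decoded with the wrong codec. A minimal sketch for checking a data file's contents, assuming Windows-1252 as the wrong codec; the helper names are hypothetical:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def mojibake_form(symbol: str, wrong_encoding: str = "cp1252") -> str:
    """Show how a symbol looks when its UTF-8 bytes are misread."""
    return symbol.encode("utf-8").decode(wrong_encoding)


def lines_with_mojibake(text: str, symbol: str = "≥"):
    """Return 1-based line numbers that contain the misdecoded symbol."""
    bad = mojibake_form(symbol)
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if bad in line
    ]
```

Running `is_valid_utf8` on the raw bytes of the clinical files and `lines_with_mojibake` on their decoded text helps tell the two failure modes apart.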

crc_sysucc_2022:

  • What is the difference between Family History and CRC Family History?
    *Attribute name/description updated. CRC Family History refers to whether the patient had a family history of CRC, while Family History indicates whether the patient had a family history of cancer in general.
  • The panel file is missing the CNA column.
  • Where is the genomic data from? Can we attach a README file? (Can’t access the link in the paper: https://changkang.hapyun.com/)
    *Readme.md added
  • Does the mutation data include germline variants?
    *No, the mutation data does not include germline variants; those were filtered and removed as described in the methods section "Alignment and somatic mutation calling for genomes".
  • Does it include mitochondrial genes?
    *Yes, the mutation data includes the mitochondrial genes
  • Is there any data on HPV, HBV or EBV at the source?
    *Data not provided in supplementary files; reached out to author to request data.
  • The study mentions HLA typing and neoantigens. Can we include that data?
    *Reached out to author to request data.
  • Mutational signatures are reported, should we reach out for that data?
    *Reached out to author to request data.
  • Since all patients are Chinese, can we note that in the study description and add a Race field?
  • Also, let’s mention the ChangKang Project name in the study name/description.
  • The alteration percentages are slightly different from Fig 3 in the paper?
    *The study reports both synonymous and nonsynonymous variants in its mutation data, which will contribute to different alteration frequencies.
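
The frequency difference described above can be illustrated by computing the altered-sample fraction for a gene with and without synonymous calls. A minimal sketch, assuming MAF-style `Variant_Classification` values and treating only `Silent` as synonymous; the record layout and function name are hypothetical:

```python
def altered_fraction(records, gene, n_samples, include_silent=True):
    """Fraction of samples with at least one mutation in `gene`.

    records: iterable of (sample_id, gene_symbol, variant_classification).
    n_samples: number of profiled samples used as the denominator.
    """
    altered = {
        sample
        for sample, symbol, vclass in records
        if symbol == gene and (include_silent or vclass != "Silent")
    }
    return len(altered) / n_samples


# Hypothetical toy records, not taken from the study.
records = [
    ("S1", "APC", "Nonsense_Mutation"),
    ("S2", "APC", "Silent"),
    ("S3", "TP53", "Missense_Mutation"),
]
with_silent = altered_fraction(records, "APC", 4)
without_silent = altered_fraction(records, "APC", 4, include_silent=False)
```

In this toy input the APC fraction drops once the silent call is excluded, which is the same direction of difference noted between the portal and Fig 3.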

ccle_broad_2024:

  • convert the readme.txt to a markdown file for github
  • Should cell lines from matched normal and non-cancerous tissues be included in the cohort? There are about 137 such cases (see Oncotree Primary Disease column).
    Will be discussed
  • Patient IDs in the 2024 CCLE study don't match the 2019 version in the portal. We should make them consistent. Also, DepMap portal uses cell line names for search and no ref to patient IDs.
    *The 2019 version did not include patient IDs in the supplementary data, whereas the 2024 version does; it maps matched normal samples to the cancerous ones. This will be discussed further.
  • The sample type should rather reflect Cell Line instead of the original tissue type. Update Sample Type → Source Sample Type, Cell Type → Sample Type.
  • Allele counts are missing in the portal but available on DepMap portal. Was the data download missing this info?
  • There are still small differences in mutation counts between the cBioPortal and DepMap portals. Could this be due to data version differences? Was this double checked?
    *DepMap released their 25Q2 update last week. It includes some updates to the mutation pipeline and additional WGS data, so the differences in mutation counts between cBioPortal and DepMap may be attributed to this update (notes about the update can be found here). The DepMap portal always displays data from the most current release and doesn't include a feature to toggle between previous releases; earlier data can only be downloaded manually.

rmadupuri commented Jul 9, 2025

hnsc_a5consortium_2025:

  • Paper got published in Nature Communications. Update journal name, year
  • SV Table is showing up blank?
  • The cohort shows 0% SDHB variants? Germline variants from Table1 can be added to MAF. See Germline SDHB genomic coordinate column.
    Screenshot 2025-07-09 at 4 55 22 PM
  • XK gene is 98% mutated? Please double check the MAF file: if the Entrez ID is missing and the Gene Symbol is listed as NA, the importer may incorrectly assign these rows to the XK gene, since NA is an alias for XK (see HGNC). Since the data is from WGS, double check the variant classification for IGRs so these get filtered out properly.
    Screenshot 2025-07-09 at 5 01 44 PM
    Screenshot 2025-07-09 at 5 11 00 PM
  • Age at first diagnosis is a range in the supp table. Is that attribute the same as what we have in the portal?
  • chromothriptic_event, TERT_ATRX_Mutation, telhunter_log2_telcontentratio, telhunter_RNA_telcontent variables could be added to sample level files.
  • How is the Sample Type defined in the portal for cases where values are Primary (Metastasis Reported) or Non-Metastatic Primary or Metastatic Primary
    *Updated 'Sample Type' attribute name to 'Clinical Behavior' to match the paper's description
  • Standardize the xUln in Chromogranin A levels Preop attribute.
  • Missing Oncotree code, CT, CTD
    *Added all tumors as Paraganglioma, since pheochromocytoma is considered a type of PGNG
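
The XK issue above can be caught before import by scanning the MAF for rows whose gene symbol is the literal string NA with no Entrez ID, and for intergenic (IGR) calls. A minimal sketch using the standard MAF column names `Hugo_Symbol`, `Entrez_Gene_Id` and `Variant_Classification`; the function name and the choice to treat `0` as a missing Entrez ID are assumptions:

```python
import csv
import io


def flag_suspect_rows(maf_text):
    """Return (na_symbol_rows, igr_rows) as lists of 1-based data row numbers."""
    # Skip comment lines such as "#version 2.4" that may precede the header.
    lines = [ln for ln in maf_text.splitlines() if not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    na_rows, igr_rows = [], []
    for number, row in enumerate(reader, start=1):
        symbol = (row.get("Hugo_Symbol") or "").strip()
        entrez = (row.get("Entrez_Gene_Id") or "").strip()
        # A literal "NA" symbol without a usable Entrez ID is ambiguous.
        if symbol == "NA" and entrez in ("", "0", "NA"):
            na_rows.append(number)
        if row.get("Variant_Classification") == "IGR":
            igr_rows.append(number)
    return na_rows, igr_rows
```

The `csv` module keeps `NA` as a plain string. If pandas is used instead, passing `keep_default_na=False` to `read_csv` prevents the literal `NA` from being parsed as a missing value.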

crc_orion_2024:

  • Missing anchor time in OS & PFS attribute description - measured from primary resection to the event/censor
  • Can we indicate the matched normal status? I am guessing no matched normal since DFCI OncoPanel?
  • Maybe add bioRxiv for now in place of the journal, since this is a preprint (we can replace it once published)

ccle_genetech_2014:

  • Replace Genetech in study id, name.. to use Genentech.
  • Maybe we can provide a direct link to the DepMap CCLE study in the portal for comparison. There's a CCLE_ID column, but that's not an exact match to the DepMap study IDs. (Let's discuss this further.)
  • The Event Info can be better formatted.
  • Supp Data 4 has copy number calls. Can it be used?
  • Do the case lists reflect the correct seq counts?

rmadupuri commented Aug 12, 2025

sclc_uc_2024
*Study removed as authors only present mutations on a patient level

  • portal instance shows the SV case list but not the actual data. Can we add data from Supp Table 7?
  • The FPKM Z-score data is not loaded to the portal.
  • Set show_profile_in_analysis_tab to true for the zscores file
  • The RNA case list is missing. Only 62 samples were processed for RNA-Seq.
  • Supp Table 5 has mutational signatures.
  • Fig 1b (https://www.nature.com/articles/s41586-024-07177-7/figures/1) has a lot of treatment data plotted on the timeline. Can we get that?
  • The mutation data doesn't appear to be correct. Only the first sample of each patient shows mutations, while the other samples have none. Since Supp Table 6 is at the patient level, how was the data transformed to the sample level?

brca_dldccc_2022

  • Missing expression zscores
  • Missing protein zscores
  • Set show_profile_in_analysis_tab to true for zscores files.
  • Phosphoproteomics data can be added from TableS3C (site level) as generic assay.
    *Data is being added
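
A quick way to confirm the flag requested above is to parse the Z-score meta file and check its `show_profile_in_analysis_tab` value. A minimal sketch, assuming the usual `key: value` meta file layout; the study identifier, stable_id and file name in the example are illustrative placeholders, not taken from this study:

```python
def parse_meta(text):
    """Parse 'key: value' lines of a meta file into a dict."""
    meta = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def shows_in_analysis_tab(text):
    """Return True only if the flag is present and set to true."""
    value = parse_meta(text).get("show_profile_in_analysis_tab", "")
    return value.lower() == "true"


# Illustrative meta file content (placeholder values).
EXAMPLE_META = """\
cancer_study_identifier: example_study_2025
genetic_alteration_type: MRNA_EXPRESSION
datatype: Z-SCORE
stable_id: rna_seq_mrna_median_Zscores
show_profile_in_analysis_tab: true
data_filename: data_mrna_seq_zscores.txt
"""
```

A missing flag is reported the same way as `false`, so the check catches both cases.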

luad_cas_2020

  • Add Race/Ethnicity
  • Timeline can be created based on the dates in Date of surgical resection, Date of recurrence, Date of death, and Last date visited, with respect to sample collection
  • Missing mrna case list. 51 sequenced tumors.
  • Missing FPKM Z-score profile. Set show_profile_in_analysis_tab to true.
  • README says the study is hg38, but the MAF is build 37?
  • Is there protein and phosphoprotein level data?
    *Pending from author
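
The timeline suggested above amounts to converting each clinical date into a day offset from an anchor date. A minimal sketch, assuming ISO-formatted dates and sample collection as day 0; the event labels and function name are hypothetical, and the output keys follow the PATIENT_ID / START_DATE / STOP_DATE / EVENT_TYPE column layout of timeline files:

```python
from datetime import date


def timeline_rows(patient_id, anchor, events):
    """Build timeline rows with START_DATE as days from the anchor date.

    anchor: ISO date string used as day 0 (assumed: sample collection).
    events: dict mapping an event label to an ISO date string or None.
    """
    anchor_day = date.fromisoformat(anchor)
    rows = []
    for label, iso in events.items():
        if not iso:
            continue  # event did not occur or the date was not reported
        offset = (date.fromisoformat(iso) - anchor_day).days
        rows.append(
            {
                "PATIENT_ID": patient_id,
                "START_DATE": offset,
                "STOP_DATE": "",
                "EVENT_TYPE": label,
            }
        )
    return sorted(rows, key=lambda row: row["START_DATE"])
```

Events dated before the anchor come out with negative offsets, which keeps pre-collection events in order without a separate code path.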

blca_msk_2024

  • Include the trial name
  • Is there a clinical element to distinguish responders from non-responders?
    Pending from author
  • Missing panel info
  • Mutation percentages differ from what is presented in the paper.
    Paper displays oncoprint for deleterious mutations vs cohort showing all mutations from impact
Screenshot 2025-08-12 at 17 21 05

pancan_pdx_uthsa_2023:

  • Update Data Type attribute to Sample Class with values Xenograft & Tumor.
  • Missing Oncotree Code
  • MutSig data in Supp Table 4
  • Telomere length in S6. Paper used avg of WGS and WXS.
  • Table S9 has Chromosome instability scores - can be added as generic assay.
