Primary tasks
- download TCGA MAF files
- Map TCGA rows to proteins from our datasets
- DON'T INCLUDE KIBA UNTIL WE GET ALL AFLOW CONFS (to ensure consistent test sets and so we don't have to redo the subsequent tasks)
- Get input data for TCGA (MSAs and aflow conformations + ligands from our dbs) #99
- Run the same script as Platinum analysis #94 to get results for davis, kiba, and pdbbind pretrained models.
- Distribution level analysis for TCGA with mapped prots #111
Downloading and getting TCGA MAF files
Downloading using *TCGAbiolinks*
What project to use?
"TCGA projects are organized by cancer type or subtype."
Updated projects can be found here, but let's focus on TCGA-BRCA for now.
- Using the legacy version of the data portal, we can access the open version of TCGA-BRCA instead of the newer but closed version.
How to download TCGA-BRCA mafs?
Update sys packages
sudo apt update
sudo apt upgrade -y
Install R
README
Add apt repo:
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/"
Install R
sudo apt update
sudo apt install r-base
Install sys packages required by R
sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev -y
Install TCGAbiolinks package
Make sure to run R in sudo mode:
sudo -i R
Then install:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")
Download TCGA-BRCA
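Before running the query, it may be worth confirming from a normal shell that the package installed above actually loads. The following is a minimal sketch, not part of the original steps: the helper name `r_package_available` and its optional second argument are hypothetical, and it assumes `Rscript` was installed alongside `r-base`.

```shell
#!/bin/sh
# r_package_available PKG [RSCRIPT]
# Exit 0 if PKG can be loaded by R, 1 otherwise.
# RSCRIPT defaults to "Rscript"; the second argument is a hypothetical
# convenience for pointing the check at another R installation.
r_package_available() {
    pkg="$1"
    rscript="${2:-Rscript}"
    if ! command -v "$rscript" >/dev/null 2>&1; then
        echo "$rscript not found on PATH" >&2
        return 1
    fi
    # requireNamespace() returns TRUE/FALSE without attaching the package.
    "$rscript" -e "quit(status = if (requireNamespace('$pkg', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1
}

if r_package_available TCGAbiolinks; then
    echo "TCGAbiolinks is installed"
else
    echo "TCGAbiolinks is not available yet"
fi
```

If the check fails right after installation, the BiocManager output from the install step is the first place to look for a missing system library.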
| sort(harmonized.data.type) |
|---|
| Aggregated Somatic Mutation |
| ... |
| Masked Somatic Mutation |
| Masked Somatic Mutation |
| ... |
| Methylation Beta Value |
| Splice Junction Quantification |
library(TCGAbiolinks)
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Simple Nucleotide Variation",
                  data.type = "Masked Somatic Mutation",
                  file.type = "maf.gz",
                  access = "open")
GDCdownload(query)
data <- GDCprepare(query)
Save the prepared data as CSV (do this before quitting R, while `data` is still in memory):
write.csv(data, "TCGA_BRCA_Mutations.csv", row.names = FALSE)
To exit R:
q()
Another way is to use the TCGA portal directly and download the entire cohort for each project.
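A quick sanity check on the exported `TCGA_BRCA_Mutations.csv` can catch an empty or truncated download before the mapping step. The sketch below is an illustration under stated assumptions: the helper names `has_column` and `data_rows` are hypothetical, the choice of columns to check (`Hugo_Symbol`, `Variant_Classification`, `HGVSp_Short`, all standard GDC MAF fields) is a guess at what the protein mapping will need, and the row count assumes no field contains an embedded newline.

```shell
#!/bin/sh
# has_column FILE NAME
# Succeed if NAME is one of the header fields of FILE.
# write.csv quotes header names, so quotes are stripped before matching.
has_column() {
    head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | grep -qxF -- "$2"
}

# data_rows FILE
# Print the number of lines after the header (assumes one record per line).
data_rows() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

csv="TCGA_BRCA_Mutations.csv"
if [ -f "$csv" ]; then
    echo "rows: $(data_rows "$csv")"
    # Columns assumed to be useful for mapping mutations to proteins.
    for col in Hugo_Symbol Variant_Classification HGVSp_Short; do
        if has_column "$csv" "$col"; then
            echo "found: $col"
        else
            echo "missing: $col"
        fi
    done
fi
```

Only the header is split on commas here. MAF fields further along each row can contain commas inside quotes, so any per-row parsing is better done with a real CSV reader.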