
TCGA analysis #95

@jyaacoub


primary tasks

Downloading TCGA MAF files

Downloading using *TCGAbiolinks*

What project to use?

"TCGA projects are organized by cancer type or subtype."
An up-to-date list of projects can be found here, but let's just focus on TCGA-BRCA for now.

  • Using the legacy version of the data portal, we can access the open version of TCGA-BRCA instead of the newer, closed version.

How to download TCGA-BRCA mafs?

Update sys packages

sudo apt update
sudo apt upgrade -y

Install R

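Following the CRAN README for Ubuntu, first add the CRAN signing key, then add the apt repo (jammy is the Ubuntu 22.04 codename; substitute your own release's codename):

wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/"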

Install R

sudo apt update
sudo apt install r-base

Install the system libraries needed to build TCGAbiolinks' R dependencies (e.g. curl, openssl, xml2)

sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev -y

Install the TCGAbiolinks package

Make sure to start R as root so the packages are installed into the system-wide library:

sudo -i R

Then install:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")
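A quick sanity check after installation, still inside R, is to confirm that both packages load and print their versions. This is a minimal sketch using only base R (`requireNamespace()`, `packageVersion()`); it reports missing packages rather than trying to install them.

```r
# Packages installed in the step above.
pkgs <- c("BiocManager", "TCGAbiolinks")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Print the version of every package that loaded.
for (p in pkgs[installed]) {
  message(p, " ", format(packageVersion(p)))
}

# List any package that could not be loaded.
if (!all(installed)) {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}
```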

download TCGA-BRCA

Relevant harmonized data types (abbreviated output of sort(harmonized.data.type)):

Aggregated Somatic Mutation
...
Masked Somatic Mutation
...
Methylation Beta Value
Splice Junction Quantification

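library(TCGAbiolinks)

# Query the open-access masked somatic mutation MAFs for TCGA-BRCA
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Simple Nucleotide Variation",
                  data.type = "Masked Somatic Mutation",
                  file.type = "maf.gz",
                  access = "open")

# Download the matching files (into ./GDCdata by default)
GDCdownload(query)
# Read the downloaded MAFs into an R data frame
data <- GDCprepare(query)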
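A common first look at `data` is how many patients carry a non-silent mutation in each gene. The sketch below assumes `data` has the standard MAF columns `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode`; the function name `count_mutated_patients` and the small `toy_maf` table (with made-up barcodes) are hypothetical, so the snippet runs without a download. It treats the first 12 characters of a barcode (`TCGA-XX-XXXX`) as the patient ID, following the TCGA barcode convention, and dropping `Silent` variants is an illustrative choice, not something the steps above require.

```r
# Hypothetical helper: number of distinct patients with a non-silent
# mutation in each gene, sorted from most to least frequently mutated.
count_mutated_patients <- function(maf) {
  # Keep every row whose Variant_Classification is not "Silent".
  keep <- maf[!(maf$Variant_Classification %in% "Silent"), ]
  # The first 12 characters of a TCGA barcode (e.g. TCGA-AA-0001) identify the patient.
  patient <- substr(keep$Tumor_Sample_Barcode, 1, 12)
  # Count each gene once per patient, even if that patient has several variants in it.
  pairs <- unique(data.frame(gene = keep$Hugo_Symbol, patient = patient,
                             stringsAsFactors = FALSE))
  sort(table(pairs$gene), decreasing = TRUE)
}

# Hypothetical stand-in for the GDCprepare() output (made-up barcodes).
toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "PIK3CA", "TP53", "CDH1"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Missense_Mutation",
                             "Silent"),
  Tumor_Sample_Barcode = c("TCGA-AA-0001-01A-11D-A000-09",
                           "TCGA-AA-0001-01A-11D-A000-09",
                           "TCGA-AA-0002-01A-11D-A000-09",
                           "TCGA-AA-0003-01A-11D-A000-09",
                           "TCGA-AA-0003-01A-11D-A000-09"),
  stringsAsFactors = FALSE
)

counts <- count_mutated_patients(toy_maf)
print(counts)
```

On the real download, `head(count_mutated_patients(data), 10)` would list the ten most frequently mutated genes in the cohort.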

To exit R (only after saving data in the next step, since it is lost when the session ends):

q()

Save the data frame returned by GDCprepare() as a CSV (run this in the same R session, before q()):

write.csv(data, "TCGA_BRCA_Mutations.csv", row.names = FALSE)
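A cohort-wide MAF can make a large CSV. A minimal base-R variant is to wrap the path in `gzfile()` so `write.csv()` compresses on write, then read the file back to check the round trip. In this sketch `toy` is a hypothetical stand-in for `data`, and the output path under `tempdir()` is only for the demo.

```r
# Hypothetical stand-in for the data frame returned by GDCprepare().
toy <- data.frame(Hugo_Symbol = c("TP53", "PIK3CA"),
                  Tumor_Sample_Barcode = c("TCGA-AA-0001-01A", "TCGA-AA-0002-01A"),
                  stringsAsFactors = FALSE)

out <- file.path(tempdir(), "TCGA_BRCA_Mutations.csv.gz")

# write.csv() accepts a connection; gzfile() gzip-compresses what is written to it.
write.csv(toy, gzfile(out), row.names = FALSE)

# read.csv() decompresses gzip files transparently when given the path.
back <- read.csv(out, stringsAsFactors = FALSE)
```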

Alternatively, use the data portal directly and download the MAF files for the entire cohort of each project.
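Files downloaded manually from the portal arrive as individual `.maf.gz` files. A minimal base-R sketch for stacking them into one data frame follows; the helper name `read_maf_dir` and the two demo files are hypothetical. It assumes tab-separated MAFs whose metadata lines begin with `#` and identical columns in every file (otherwise `rbind()` fails), and it reads all columns as character so types do not clash between files.

```r
# Hypothetical helper: read every *.maf.gz under `dir` into one data frame.
read_maf_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.maf\\.gz$", recursive = TRUE,
                      full.names = TRUE)
  if (length(files) == 0) stop("No .maf.gz files found under ", dir)
  tables <- lapply(files, function(f) {
    # readLines() decompresses gzip files transparently.
    lines <- readLines(f)
    # Drop metadata lines (e.g. "#version ...") that precede the column header.
    lines <- lines[!startsWith(lines, "#")]
    # MAFs are unquoted TSV; keep every column as character.
    tbl <- read.delim(text = lines, quote = "", colClasses = "character",
                      na.strings = c("", "NA"))
    # Record which file each row came from.
    tbl$source_file <- basename(f)
    tbl
  })
  do.call(rbind, tables)
}

# Demo with two tiny hypothetical MAF files in a temporary directory.
demo_dir <- file.path(tempdir(), "maf_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_demo_maf <- function(path, rows) {
  con <- gzfile(path, "w")
  writeLines(c("#version 2.4", "Hugo_Symbol\tTumor_Sample_Barcode", rows), con)
  close(con)
}
write_demo_maf(file.path(demo_dir, "a.maf.gz"), "TP53\tTCGA-AA-0001-01A")
write_demo_maf(file.path(demo_dir, "b.maf.gz"),
               c("PIK3CA\tTCGA-AA-0002-01A", "CDH1\tTCGA-AA-0002-01A"))

maf <- read_maf_dir(demo_dir)
```

Adding `source_file` keeps a link back to each downloaded file, which helps when cross-checking against the portal's sample sheet.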
