Commit e14562b

adding dataset documentation metrics and adding to rst file
1 parent 548aab4 commit e14562b

5 files changed, +867 -229 lines changed

coderdata/dataset.yml

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ datasets:
       - experiments
 
   hcmi:
-    description: Human Cancer Models Initiative (HCMI) encompasses numerous cancer types and includes cell line, organoid, and tumor data.
+    description: Human Cancer Models Initiative (HCMI) encompasses numerous cancer types and includes cell line, organoid, and tumor data. The models and the data are from the Human Cancer Models Initiative (HCMI) www.cancer.gov/ccg/research/functional-genomics/hcmi; dbGaP accession number phs001486.
     modalities:
       - sample
       - transcriptomics
Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
+dataset,curve_metric,num_drugs
+beataml,aac,108
+beataml,auc,108
+beataml,dss,108
+beataml,fit_auc,108
+beataml,fit_ec50,108
+beataml,fit_ec50se,108
+beataml,fit_einf,108
+beataml,fit_hs,108
+beataml,fit_ic50,108
+beataml,fit_r2,108
+mpnst,aac,25
+mpnst,auc,25
+mpnst,dss,25
+mpnst,fit_auc,25
+mpnst,fit_ec50,25
+mpnst,fit_ec50se,25
+mpnst,fit_einf,25
+mpnst,fit_hs,25
+mpnst,fit_ic50,25
+mpnst,fit_r2,25
+pancpdo,aac,5
+pancpdo,auc,5
+pancpdo,dss,5
+pancpdo,fit_auc,5
+pancpdo,fit_ec50,5
+pancpdo,fit_ec50se,5
+pancpdo,fit_einf,5
+pancpdo,fit_hs,5
+pancpdo,fit_ic50,5
+pancpdo,fit_r2,5
+sarcpdo,published_auc,33
+colorectal,aac,10
+colorectal,auc,10
+colorectal,dss,10
+colorectal,fit_auc,10
+colorectal,fit_ec50,10
+colorectal,fit_ec50se,10
+colorectal,fit_einf,10
+colorectal,fit_hs,10
+colorectal,fit_ic50,10
+colorectal,fit_r2,10
+bladderpdo,aac,50
+bladderpdo,auc,50
+bladderpdo,dss,50
+bladderpdo,fit_auc,50
+bladderpdo,fit_ec50,50
+bladderpdo,fit_ec50se,50
+bladderpdo,fit_einf,50
+bladderpdo,fit_hs,50
+bladderpdo,fit_ic50,50
+bladderpdo,fit_r2,50
+liver,aac,73
+liver,auc,73
+liver,dss,73
+liver,fit_auc,73
+liver,fit_ec50,73
+liver,fit_ec50se,73
+liver,fit_einf,73
+liver,fit_hs,73
+liver,fit_ic50,73
+liver,fit_r2,73
+novartis,TGI,23
+novartis,abc,23
+novartis,lmm,23
+novartis,mRESCIST,23
+ccle,aac,24
+ccle,auc,24
+ccle,dss,24
+ccle,fit_auc,24
+ccle,fit_ec50,24
+ccle,fit_ec50se,24
+ccle,fit_einf,24
+ccle,fit_hs,24
+ccle,fit_ic50,24
+ccle,fit_r2,24
+ctrpv2,aac,460
+ctrpv2,auc,460
+ctrpv2,dss,460
+ctrpv2,fit_auc,460
+ctrpv2,fit_ec50,460
+ctrpv2,fit_ec50se,460
+ctrpv2,fit_einf,460
+ctrpv2,fit_hs,460
+ctrpv2,fit_ic50,460
+ctrpv2,fit_r2,460
+fimm,aac,52
+fimm,auc,52
+fimm,dss,52
+fimm,fit_auc,52
+fimm,fit_ec50,52
+fimm,fit_ec50se,52
+fimm,fit_einf,52
+fimm,fit_hs,52
+fimm,fit_ic50,52
+fimm,fit_r2,52
+gdscv1,aac,293
+gdscv1,auc,293
+gdscv1,dss,293
+gdscv1,fit_auc,293
+gdscv1,fit_ec50,293
+gdscv1,fit_ec50se,293
+gdscv1,fit_einf,293
+gdscv1,fit_hs,293
+gdscv1,fit_ic50,293
+gdscv1,fit_r2,293
+gdscv2,aac,169
+gdscv2,auc,169
+gdscv2,dss,169
+gdscv2,fit_auc,169
+gdscv2,fit_ec50,169
+gdscv2,fit_ec50se,169
+gdscv2,fit_einf,169
+gdscv2,fit_hs,169
+gdscv2,fit_ic50,169
+gdscv2,fit_r2,169
+gcsi,aac,43
+gcsi,auc,43
+gcsi,dss,43
+gcsi,fit_auc,43
+gcsi,fit_ec50,43
+gcsi,fit_ec50se,43
+gcsi,fit_einf,43
+gcsi,fit_hs,43
+gcsi,fit_ic50,43
+gcsi,fit_r2,43
+prism,aac,1418
+prism,auc,1418
+prism,dss,1418
+prism,fit_auc,1418
+prism,fit_ec50,1418
+prism,fit_ec50se,1418
+prism,fit_einf,1418
+prism,fit_hs,1418
+prism,fit_ic50,1418
+prism,fit_r2,1418
+nci60,aac,54654
+nci60,auc,54654
+nci60,dss,54654
+nci60,fit_auc,54654
+nci60,fit_ec50,54654
+nci60,fit_ec50se,54654
+nci60,fit_einf,54654
+nci60,fit_hs,54654
+nci60,fit_ic50,54654
+nci60,fit_r2,54654
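
In this file every metric of a given dataset carries the same drug count. A reader who wants to confirm that after regenerating the file could use something like the following minimal sketch. It works on a short excerpt copied from the rows above; the function names `drugs_per_dataset` and `inconsistent` are hypothetical and not part of CoderData.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the curve-metric summary (rows copied from the CSV above).
CSV_TEXT = """dataset,curve_metric,num_drugs
mpnst,aac,25
mpnst,auc,25
sarcpdo,published_auc,33
novartis,TGI,23
novartis,abc,23
"""


def drugs_per_dataset(text):
    """Map each dataset to the set of distinct num_drugs values it reports."""
    counts = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        counts[row["dataset"]].add(int(row["num_drugs"]))
    return dict(counts)


def inconsistent(text):
    """Return the datasets whose metrics disagree on the number of drugs."""
    per_dataset = drugs_per_dataset(text)
    return sorted(name for name, values in per_dataset.items() if len(values) > 1)


summary = drugs_per_dataset(CSV_TEXT)
print(summary)
print(inconsistent(CSV_TEXT))
```

To run the check against the full file, replace `CSV_TEXT` with the contents of the CSV read from disk.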
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+dataset,num_samples,num_drugs,num_sample_drug_pairs,num_sample_drug_transcript_pairs,num_sample_drug_transcript_mutation_pairs,num_sample_drug_transcript_copynum_pairs,num_sample_drug_mutation_copynum_pairs
+hcmi,886,,,,,,
+beataml,1022,164,23662,3033,2905,,
+mpnst,50,25,212,163,163,163,163
+pancpdo,70,25,290,180,175,175,285
+cptac,1139,,,,,,
+sarcpdo,36,34,276,234,187,,
+colorectal,61,10,140,60,60,60,140
+bladderpdo,134,50,3300,840,640,640,3100
+liver,62,76,4453,4453,4453,4453,4453
+novartis,386,25,1766,1734,1734,1723,1723
+ccle,502,24,11543,10887,10792,10887,11118
+ctrpv2,846,460,310564,301263,296487,300452,301373
+fimm,52,52,2663,2457,2457,2457,2611
+gdscv1,984,293,246807,244282,241074,240318,241644
+gdscv2,806,169,113964,112911,111387,111085,111687
+gcsi,569,43,13229,12320,12155,12320,12919
+prism,478,1418,638684,631784,630379,631784,635929
+nci60,83,54654,2933857,2307990,2307977,2307990,2759211
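
Several cells in this summary are empty (for example, `hcmi` and `cptac` have no drug columns), so code that reads it has to treat blanks as missing values instead of zeros. The sketch below is one possible way to do that with the standard library, using three rows copied from the table above. The helpers `to_int` and `transcript_coverage` are hypothetical names, and the coverage ratio is an illustration, not a statistic the project reports.

```python
import csv
import io

# Three rows copied from the summary statistics CSV above.
CSV_TEXT = """dataset,num_samples,num_drugs,num_sample_drug_pairs,num_sample_drug_transcript_pairs,num_sample_drug_transcript_mutation_pairs,num_sample_drug_transcript_copynum_pairs,num_sample_drug_mutation_copynum_pairs
hcmi,886,,,,,,
beataml,1022,164,23662,3033,2905,,
mpnst,50,25,212,163,163,163,163
"""


def to_int(cell):
    """Convert a CSV cell to int, mapping an empty cell to None."""
    return int(cell) if cell else None


def transcript_coverage(text):
    """Fraction of sample-drug pairs that also have transcriptomics, per dataset.

    Datasets without drug response data map to None.
    """
    coverage = {}
    for row in csv.DictReader(io.StringIO(text)):
        pairs = to_int(row["num_sample_drug_pairs"])
        with_rna = to_int(row["num_sample_drug_transcript_pairs"])
        if pairs and with_rna is not None:
            coverage[row["dataset"]] = with_rna / pairs
        else:
            coverage[row["dataset"]] = None
    return coverage


coverage = transcript_coverage(CSV_TEXT)
for name, value in coverage.items():
    print(name, "n/a" if value is None else f"{value:.3f}")
```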

docs/source/datasets_included.rst

Lines changed: 76 additions & 9 deletions
@@ -1,20 +1,87 @@
 Datasets Included
 =================
 
-Datsets page coming soon...
+This page provides an overview of the datasets included in CoderData version 2.1.0.
 
-.. Mock-up is on canva
+Figshare record: https://api.figshare.com/v2/articles/28823159
+Version: 2.1.0
 
-.. Summary
-.. -------
-.. Overview of datasets included such as sources, datasets and datatype
+---------------------------
+Dataset Overview
+---------------------------
+.. csv-table:: Datasets and Modalities
+   :header: "Dataset", "References", "Sample", "Transcriptomics", "Proteomics", "Mutations", "Copy Number", "Drug", "Drug Descriptor", "Experiments"
+   :widths: 12, 10, 6, 12, 12, 12, 12, 8, 15, 12
 
-.. Insert schema here
-.. .. raw:: html
+   "BeatAML", "[1]_, [2]_", "X", "X", "X", "X", "", "X", "X", "X"
+   "BladderPDO", "[3]_", "X", "X", "", "X", "X", "X", "X", "X"
+   "CCLE", "[4]_", "X", "X", "X", "X", "X", "X", "X", "X"
+   "CPTAC", "[5]_", "X", "X", "X", "X", "X", "", "", ""
+   "CTRPv2", "[6]_, [7]_, [8]_", "X", "X", "", "X", "X", "X", "X", "X"
+   "FIMM", "[9]_, [10]_", "X", "X", "", "", "", "X", "X", "X"
+   "HCMI", "[11]_", "X", "X", "", "X", "X", "", "", ""
+   "MPNST", "[12]_", "X", "X", "X", "X", "X", "X", "X", "X"
+   "NCI60", "[13]_", "X", "X", "X", "X", "", "X", "X", "X"
+   "Pancreatic PDO", "[14]_", "X", "X", "", "X", "X", "X", "X", "X"
+   "PRISM", "[15]_, [16]_", "X", "X", "", "", "", "X", "X", "X"
+   "Sarcoma PDO", "[17]_", "X", "X", "", "X", "", "X", "X", "X"
+   "CRC PDO", "[18]_", "X", "X", "", "X", "X", "X", "X", ""
+   "Liver PDO", "[19]_", "X", "X", "", "X", "X", "X", "X", ""
+   "Novartis PDX", "[20]_", "X", "X", "", "X", "X", "X", "X", ""
+   "gCSI", "[21]_, [22]_", "X", "X", "X", "X", "X", "X", "X", ""
+   "GDSC v1", "[23]_, [24]_, [25]_", "X", "X", "X", "X", "X", "X", "X", ""
+   "GDSC v2", "[23]_, [24]_, [25]_", "X", "X", "X", "X", "X", "X", "X", ""
 
-.. <iframe src="_static/datasets_table.html" width= 800 px height= 600 px></iframe>
+The table above lists the datasets included in CoderData version 2.1.0, along with references to their original publications and the types of data available for each dataset. An "X" indicates the presence of a particular data type for the corresponding dataset.
 
 
+---------------------------
+Dataset Summary Statistics
+---------------------------
+The following table summarizes key statistics for each dataset, including the number of samples, drugs, and various combinations of sample-drug pairs with different molecular data types.
 
+.. csv-table:: Dataset Summary Statistics
+   :file: _static/dataset_summary_statistics.csv
+   :header-rows: 1
 
-
+
+---------------------------------
+Drug Curve Metrics Collected
+---------------------------------
+The following table summarizes the number of drugs associated with each dose-response metric across the datasets.
+
+.. csv-table:: Drug Curve Metrics Summary
+   :file: _static/dataset_curve_metric_summary.csv
+   :header-rows: 1
+
+
+
+---------------------------
+References
+---------------------------
+
+.. [1] Bottomly D, Long N, Schultz AR, et al. *Integrative analysis of drug response and clinical outcome in acute myeloid leukemia.* Cancer Cell. 2022;40(8):850-864.e9. doi:`10.1016/j.ccell.2022.07.002 <https://doi.org/10.1016/j.ccell.2022.07.002>`_
+.. [2] Pino JC, Posso C, Joshi SK, et al. *Mapping the proteogenomic landscape enables prediction of drug response in acute myeloid leukemia.* Cell Rep Med. 2024;5(1):101359. doi:`10.1016/j.xcrm.2023.101359 <https://doi.org/10.1016/j.xcrm.2023.101359>`_
+.. [3] Lee SH, Hu W, Matulay JT, et al. *Tumor Evolution and Drug Response in Patient-Derived Organoid Models of Bladder Cancer.* Cell. 2018;173(2):515-528.e17. doi:`10.1016/j.cell.2018.03.017 <https://doi.org/10.1016/j.cell.2018.03.017>`_
+.. [4] Barretina J, Caponigro G, Stransky N, et al. *The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.* Nature. 2012;483(7391):603-607. doi:`10.1038/nature11003 <https://doi.org/10.1038/nature11003>`_
+.. [5] Lindgren CM, Adams DW, Kimball B, et al. *Simplified and Unified Access to Cancer Proteogenomic Data.* J Proteome Res. 2021;20(4):1902-1910. doi:`10.1021/acs.jproteome.0c00919 <https://doi.org/10.1021/acs.jproteome.0c00919>`_
+.. [6] Rees MG, Seashore-Ludlow B, Cheah JH, et al. *Correlating chemical sensitivity and basal gene expression reveals mechanism of action.* Nat Chem Biol. 2016;12(2):109-116. doi:`10.1038/nchembio.1986 <https://doi.org/10.1038/nchembio.1986>`_
+.. [7] Seashore-Ludlow B, Rees MG, Cheah JH, et al. *Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset.* Cancer Discov. 2015;5(11):1210-1223. doi:`10.1158/2159-8290.CD-15-0235 <https://doi.org/10.1158/2159-8290.CD-15-0235>`_
+.. [8] Basu A, Bodycombe NE, Cheah JH, et al. *An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules.* Cell. 2013;154(5):1151-1161. doi:`10.1016/j.cell.2013.08.003 <https://doi.org/10.1016/j.cell.2013.08.003>`_
+.. [9] Mpindi JP, Yadav B, Östling P, et al. *Consistency in drug response profiling.* Nature. 2016;540(7631):E5-E6. doi:`10.1038/nature20171 <https://doi.org/10.1038/nature20171>`_
+.. [10] Pemovska T, Kontro M, Yadav B, et al. *Individualized systems medicine strategy to tailor treatments for patients with chemorefractory acute myeloid leukemia.* Cancer Discov. 2013;3(12):1416-1429. doi:`10.1158/2159-8290.CD-13-0350 <https://doi.org/10.1158/2159-8290.CD-13-0350>`_
+.. [11] Human Cancer Models Initiative (HCMI). dbGaP accession phs001486. `https://cancer.gov/ccg/research/functional-genomics/hcmi <https://cancer.gov/ccg/research/functional-genomics/hcmi>`_
+.. [12] Dehner C, Moon CI, Zhang X, et al. *Chromosome 8 gain is associated with high-grade transformation in MPNST.* JCI Insight. 2021;6(6):e146351. doi:`10.1172/jci.insight.146351 <https://doi.org/10.1172/jci.insight.146351>`_
+.. [13] Shoemaker RH. *The NCI60 human tumour cell line anticancer drug screen.* Nat Rev Cancer. 2006;6(10):813-823. doi:`10.1038/nrc1951 <https://doi.org/10.1038/nrc1951>`_
+.. [14] Tiriac H, Belleau P, Engle DD, et al. *Organoid Profiling Identifies Common Responders to Chemotherapy in Pancreatic Cancer.* Cancer Discov. 2018;8(9):1112-1129. doi:`10.1158/2159-8290.CD-18-0349 <https://doi.org/10.1158/2159-8290.CD-18-0349>`_
+.. [15] Corsello SM, Nagari RT, Spangler RD, et al. *Discovering the anti-cancer potential of non-oncology drugs by systematic viability profiling.* Nat Cancer. 2020;1(2):235-248. doi:`10.1038/s43018-019-0018-6 <https://doi.org/10.1038/s43018-019-0018-6>`_
+.. [16] Yu C, Mannan AM, Yvone GM, et al. *High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines.* Nat Biotechnol. 2016;34(4):419-423. doi:`10.1038/nbt.3460 <https://doi.org/10.1038/nbt.3460>`_
+.. [17] Al Shihabi A, Tebon PJ, Nguyen HTL, et al. *The landscape of drug sensitivity and resistance in sarcoma.* Cell Stem Cell. 2024;31(10):1524-1542.e4. doi:`10.1016/j.stem.2024.08.010 <https://doi.org/10.1016/j.stem.2024.08.010>`_
+.. [18] van de Wetering M, Francies HE, Francis JM, et al. *Prospective derivation of a living organoid biobank of colorectal cancer patients.* Cell. 2015;161(4):933-945. doi:`10.1016/j.cell.2015.03.053 <https://doi.org/10.1016/j.cell.2015.03.053>`_
+.. [19] Ji S, Feng L, Fu Z, et al. *Pharmaco-proteogenomic characterization of liver cancer organoids for precision oncology.* Sci Transl Med. 2023;15(706):eadg3358. doi:`10.1126/scitranslmed.adg3358 <https://doi.org/10.1126/scitranslmed.adg3358>`_
+.. [20] Gao H, Korn JM, Ferretti S, et al. *High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response.* Nat Med. 2015;21(11):1318-1325. doi:`10.1038/nm.3954 <https://doi.org/10.1038/nm.3954>`_
+.. [21] Haverty PM, Lin E, Tan J, et al. *Reproducible pharmacogenomic profiling of cancer cell line panels.* Nature. 2016;533(7603):333-337. doi:`10.1038/nature17987 <https://doi.org/10.1038/nature17987>`_
+.. [22] Klijn C, Durinck S, Stawiski EW, et al. *A comprehensive transcriptional portrait of human cancer cell lines.* Nat Biotechnol. 2015;33(3):306-312. doi:`10.1038/nbt.3080 <https://doi.org/10.1038/nbt.3080>`_
+.. [23] Garnett MJ, Edelman EJ, Heidorn SJ, et al. *Systematic identification of genomic markers of drug sensitivity in cancer cells.* Nature. 2012;483(7391):570-575. doi:`10.1038/nature11005 <https://doi.org/10.1038/nature11005>`_
+.. [24] Iorio F, Knijnenburg TA, Vis DJ, et al. *A Landscape of Pharmacogenomic Interactions in Cancer.* Cell. 2016;166(3):740-754. doi:`10.1016/j.cell.2016.06.017 <https://doi.org/10.1016/j.cell.2016.06.017>`_
+.. [25] Yang W, Soares J, Greninger P, et al. *Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells.* Nucleic Acids Res. 2013;41(Database issue):D955-D961. doi:`10.1093/nar/gks1111 <https://doi.org/10.1093/nar/gks1111>`_
