Commit 1031bcb

Merge branch 'release_03' of github.com:ECP-CANDLE/Benchmarks into release_03
2 parents e2770da + 3ef7c9f commit 1031bcb

2 files changed: +17 -13 lines changed

examples/ADRP/README.md

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
-The Pilot1 ADRP Benchmark loads a csv file
+# Pilot1 ADRP Benchmark
+
+## loads a csv file
 
 Benchmark auto downloads the file below:
 http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/ (~500MB)

examples/M16/README.md

Lines changed: 14 additions & 12 deletions
@@ -1,4 +1,6 @@
-# Background
+# Data preprocessing - feature selection examples
+
+## Background
 
 Data preprocessing is an important front-end step in data analysis that prepares data for subsequent analysis.
 It not only enables the subsequent analysis by processing and transforming data, but also influences the quality of subsequent analysis sometimes significantly.
@@ -56,13 +58,13 @@ To perform co-expression extrapolation (COXEN) analysis [3] that selects predict
 
 To extend the COXEN approach for selecting genes to predict the response of tumor cells to multiple drugs in precision oncology applications.
 
-# Running the example
+## Running the example
 
 The code demonstrates feature selection methods that CANDLE provides.
 
 It can be run by executing ``` python M16_test.py ```
 
-## Download data
+### Download data
 Code
 ```python
 # download all the data if needed from the repo
@@ -88,7 +90,7 @@ Origin = http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/Candle_
 Origin = http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/Candle_Milestone_16_Version_12_15_2019/Data/Data_For_Testing/CCLE_NCI60_Gene_Expression_Full_Data.txt
 ```
 
-## Download gene set
+### Download gene set
 Code
 ```python
 # download all the gene_set files needed
@@ -124,7 +126,7 @@ Origin = http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/Candle_
 Gene Set data is locally stored at /Users/hsyoo/projects/CANDLE/Benchmarks/common/../Data/examples/Gene_Sets/MSigDB.v7.0/
 ```
 
-# Select features based on missing values
+### Select features based on missing values
 Code
 ```python
 print('Testing select_features_by_missing_values')
@@ -161,7 +163,7 @@ Select features with missing rates smaller than 0.3
 Feature IDs [0 1 2 3 4 5 6 9]
 ```
 
-# Select features based on variation
+### Select features based on variation
 Code
 ```python
 print('Testing select_features_by_variation')
@@ -182,7 +184,7 @@ Select the top 2 features with the largest standard deviation
 Feature IDs [0 5]
 ```
 
-# Select decorrelated features
+### Select decorrelated features
 Code
 ```python
 print('Testing select_decorrelated_features')
@@ -202,7 +204,7 @@ Select features whose absolute mutual Spearman correlation coefficient is smalle
 Feature IDs [0 2 6 9]
 ```
 
-# Generate cross-validation partitions of data
+### Generate cross-validation partitions of data
 Code
 ```python
 print('Testing generate_cross_validation_partition')
@@ -248,7 +250,7 @@ Fitting L/S model and finding priors
 Finding parametric adjustments
 ```
 
-# Quantile normalization of gene expression data
+### Quantile normalization of gene expression data
 Code
 ```python
 print('Testing quantile_normalization')
@@ -301,7 +303,7 @@ Max difference of median between cell lines is 0.02
 Max difference of first quartile between cell lines is 0.06
 ```
 
-# Generate gene-set-level data
+### Generate gene-set-level data
 ```python
 print('Testing generate_gene_set_data')
 gene_set_data = candle.generate_gene_set_data(np.transpose(norm_data), [i[0] for i in norm_data.index], gene_name_type='entrez',
@@ -348,7 +350,7 @@ CCL_1078 -10.355489 ... -26.232325
 [897 rows x 186 columns]
 ```
 
-# Combat batch normalization on gene expression data
+### Combat batch normalization on gene expression data
 Code
 ```python
 print('Testing combat_batch_effect_removal')
@@ -431,7 +433,7 @@ Average median of CCLE cell lines is 2.72
 Average first quartile of CCLE cell lines is 0.13
 ```
 
-# References
+## References
 
 1. Bolstad BM, Irizarry RA, Astrand M, et al. \(2003\) *A comparison of normalization methods for high density oligonucleotide array data based on variance and bias* Bioinformatics. 2003 Jan 22;19\(2\):185-93.
 
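
Note on the selection criteria quoted in the hunk headers for examples/M16/README.md: the sample output mentions keeping features with missing rates smaller than 0.3, keeping the top 2 features by standard deviation, and keeping features whose absolute mutual Spearman correlation stays under a cutoff (the cutoff value is truncated in the diff). The sketch below is a rough, dependency-free illustration of what such criteria could look like. It is not the CANDLE implementation. The helper names (`select_by_missing_rate`, `select_top_by_std`, `select_decorrelated`), the toy data, the 0.9 cutoff, and the greedy keep-first strategy for decorrelation are all assumptions made for illustration.

```python
import math
import statistics

# Hypothetical helpers for illustration only; these are NOT the CANDLE functions.
# Data layout assumed here: a list of rows (samples), each row a list of feature values.


def select_by_missing_rate(rows, threshold):
    """Return column indices whose fraction of NaN entries is below threshold."""
    n_rows = len(rows)
    return [
        j for j in range(len(rows[0]))
        if sum(math.isnan(row[j]) for row in rows) / n_rows < threshold
    ]


def select_top_by_std(rows, num_features):
    """Return sorted indices of the num_features columns with the largest standard deviation."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: statistics.pstdev(columns[j]), reverse=True)
    return sorted(ranked[:num_features])


def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (0.0 if either input is constant)."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_decorrelated(rows, cutoff):
    """Greedily keep a column only if |Spearman| with every already-kept column is below cutoff."""
    columns = [list(c) for c in zip(*rows)]
    kept = []
    for j, col in enumerate(columns):
        if all(abs(spearman(col, columns[k])) < cutoff for k in kept):
            kept.append(j)
    return kept


nan = float("nan")
with_gaps = [[1.0, nan, 3.0], [2.0, nan, nan], [3.0, 5.0, 1.0], [4.0, nan, 2.0]]
complete = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.1],
    [3.0, 6.3, 0.4],
    [4.0, 8.0, 0.2],
    [5.0, 9.9, 0.3],
]

low_missing = select_by_missing_rate(with_gaps, threshold=0.3)
most_variable = select_top_by_std(complete, num_features=2)
decorrelated = select_decorrelated(complete, cutoff=0.9)
print(low_missing, most_variable, decorrelated)
```

On the toy data this prints `[0, 2] [0, 1] [0, 2]`: column 1 of `with_gaps` is three-quarters missing, column 2 of `complete` has the smallest spread, and column 1 of `complete` is dropped because it rises monotonically with column 0. The greedy pass depends on column order, so a different ordering can keep a different subset.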
