
Commit 8f7e02d

Merge branch 'master' of github.com:ECP-CANDLE/Benchmarks
2 parents 5db8ff4 + 6527021

File tree: 4 files changed, 16 additions and 12 deletions

Pilot3/P3B1/README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 * Number of layers: 5-6 layers

 A graphical representation of the MTL-DNN is shown below:
-![MTL-DNN Architecture](https://raw.githubusercontent.com/ECP-CANDLE/Benchmarks/master/P3B1/images/MTL1.png)
+![MTL-DNN Architecture](https://raw.githubusercontent.com/ECP-CANDLE/Benchmarks/master/Pilot3/P3B1/images/MTL1.png)

 ### Running the baseline implementation
 There are two broad options for running our MTL implementation. The first baseline option includes the basic training of an MTL-based deep neural net. The second implementation includes a standard 10-fold cross-validation loop and depends on the first baseline for building and training the MTL-based deep neural net.
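For readers unfamiliar with the multi-task setup described above, here is a minimal sketch of a hard-parameter-sharing MTL network: shared hidden layers feeding one softmax head per task. It is written against current tf.keras rather than the repository's Keras 1.x code, and the input width, layer sizes, task count, and optimizer are illustrative assumptions, not the benchmark's configuration.

```python
# Illustrative sketch only -- not the P3B1 baseline itself.
# Shared hidden layers feed one task-specific softmax head per task.
from tensorflow.keras import layers, Model

n_features = 400            # assumed width of the document-feature vector
task_classes = [4, 3, 2]    # assumed number of classes for each task

inp = layers.Input(shape=(n_features,))
shared = layers.Dense(1024, activation="relu")(inp)     # shared representation
shared = layers.Dense(256, activation="relu")(shared)

heads = []
for i, n_cls in enumerate(task_classes):
    branch = layers.Dense(128, activation="relu")(shared)   # task-specific layer
    heads.append(layers.Dense(n_cls, activation="softmax", name="task_%d" % i)(branch))

model = Model(inputs=inp, outputs=heads)
model.compile(optimizer="sgd",
              loss=["categorical_crossentropy"] * len(task_classes))
model.summary()
```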

Pilot3/P3B2/README.md

Lines changed: 12 additions & 11 deletions
@@ -1,9 +1,9 @@
 ## P3B2: RNN-LSTM: A Generative Model for Clinical Path Reports
 **Overview**: Given a sample corpus of biomedical text such as clinical reports, build a deep learning network that can automatically generate synthetic text documents with valid clinical context.

 **Relationship to core problem**: Labeled data is quite challenging to come by, specifically for patient data, since manual annotations are time consuming; hence, a core capability we intend to build is a “gold-standard” annotated data set, generated by deep learning networks, to tune our deep text comprehension applications.

 **Expected Outcomes**: A generative RNN based on LSTMs that can effectively generate synthetic biomedical text of the desired clinical context.

 ### Benchmark Specs

@@ -20,11 +20,13 @@
 
 #### Evaluation Metrics
 * Accuracy or loss function: Standard information-theoretic metrics such as the log-likelihood score, minimum description length score, and AIC/BIC, used to measure how similar the generated documents are to actual ones
 * Expected performance of a naïve method: Latent Dirichlet allocation (LDA) models

 #### Description of the Network
-* Proposed network architecture: LSTM with at least 4 layers and [128, 256, 512] character windows
+* Proposed network architecture: LSTM with at least 1 layer with 256 character windows
 * Number of layers: At least two hidden layers with one input and one output sequence
+A graphical representation of the same is shown here.
+![CB-RNN Architecture](https://raw.githubusercontent.com/ECP-CANDLE/Benchmarks/master/Pilot3/P3B2/images/RNN1.png)

 #### Annotated Keras Code
 Data loader, preprocessing, basic training and cross validation, prediction and evaluation on test data
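The AIC/BIC scores listed under Evaluation Metrics above are derived from a model's log-likelihood in the standard way; the following is a generic illustration of those formulas, not the benchmark's evaluation code, and the numbers in the example are made up.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2*k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Lower scores indicate a better trade-off between fit and model complexity.
print(aic(log_likelihood=-1250.0, n_params=300))
print(bic(log_likelihood=-1250.0, n_params=300, n_obs=5000))
```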
@@ -33,11 +35,11 @@ Data loader, preprocessing, basic training and cross validation, prediction and
 The data file provided here is a compressed pickle file (.tgz extension). Before running the code, use:
 ```
 cd P3B2
 tar -xzf data.pkl.tgz
 ```
 to unpack the archive. Note that the training data is provided as a single pickle file. The code is documented to provide enough information about how to reproduce the files.

 After uncompressing the data file, you can run:
 ```
 python keras_p3b2_baseline.py
 ```
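To inspect the unpacked training data before launching a run, a minimal sketch is shown below. It assumes the archive unpacks to a file named data.pkl; the internal layout of the pickled object is not documented here, so the snippet only reports its type and top-level keys.

```python
import pickle

# Inspect the unpacked training data without assuming a specific structure.
with open("data.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
if hasattr(data, "keys"):
    print(list(data.keys()))
```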
@@ -46,15 +48,14 @@ The original data from the pathology reports cannot be made available online. He
 
 ### Example output
 #### Checkpointing and model saving
 At each iteration of the training process, a model is output as an h5 file and also as a json file. An example model (in JSON format) is shown below.
 ```
 {"class_name": "Sequential", "keras_version": "1.1.0", "config": [{"class_name": "LSTM", "config": {"inner_activation": "hard_sigmoid", "trainable": true, "inner_init": "orthogonal", "output_dim": 256, "unroll": false, "consume_less": "cpu", "init": "glorot_uniform", "dropout_U": 0.0, "input_dtype": "float32", "batch_input_shape": [null, 20, 99], "input_length": null, "dropout_W": 0.0, "activation": "tanh", "stateful": false, "b_regularizer": null, "U_regularizer": null, "name": "lstm_1", "go_backwards": false, "input_dim": 99, "return_sequences": false, "W_regularizer": null, "forget_bias_init": "one"}}, {"class_name": "Dense", "config": {"W_constraint": null, "b_constraint": null, "name": "dense_1", "activity_regularizer": null, "trainable": true, "init": "glorot_uniform", "bias": true, "input_dim": null, "b_regularizer": null, "W_regularizer": null, "activation": "linear", "output_dim": 99}}, {"class_name": "Activation", "config": {"activation": "softmax", "trainable": true, "name": "activation_1"}}]}
 ```
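A checkpoint of this form (JSON architecture plus h5 weights) can be restored with Keras's standard model_from_json and load_weights calls; the file names below are illustrative placeholders, not names produced by the benchmark.

```python
from keras.models import model_from_json

# Rebuild the architecture from the saved JSON, then load the trained weights.
# File names are illustrative; substitute the names your training run produced.
with open("model.json") as f:
    model = model_from_json(f.read())
model.load_weights("model.h5")
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
```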
 
 #### Sample text generated
 The model generates text files that are stored as ```example_<epoch>_<text-number>.txt``` within a separate folder. An example output may look like this:
 ```
 ----- Generating with seed: "Diagnosis"
 DiagnosisWZing Pathology Laboratory is certified under this report. **NAME[M. SSS dessDing Adientation of the tissue is submitted in the same container labeled with the patient's name and designated 'subcarinal lymph node is submitted in toto in cassette A1. B. Received in formalin labeled "right lower outer quadrant; A11-A10 - slice 16 with a cell block and submitted in cassette A1. B. Received fresh for
 ```
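Seeded output of this kind is typically produced by a character-level sampling loop along the following lines. This is a generic sketch, not the repository's exact code: it assumes a trained model with a 20-character input window over a 99-symbol alphabet (consistent with the JSON checkpoint above), and the temperature parameter and helper mappings are illustrative.

```python
import numpy as np

def generate(model, seed, char_to_idx, idx_to_char,
             window=20, length=400, temperature=1.0):
    """Generate text one character at a time from a trained char-level LSTM."""
    text = seed
    for _ in range(length):
        # One-hot encode the last `window` characters as the model input.
        x = np.zeros((1, window, len(char_to_idx)))
        for t, ch in enumerate(text[-window:]):
            x[0, t, char_to_idx[ch]] = 1.0
        probs = model.predict(x, verbose=0)[0]
        # Temperature-scaled sampling over the softmax output.
        probs = np.log(probs + 1e-8) / temperature
        probs = np.exp(probs) / np.sum(np.exp(probs))
        next_idx = np.random.choice(len(probs), p=probs)
        text += idx_to_char[next_idx]
    return text
```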

Pilot3/P3B2/images/RNN1.png

51.2 KB

README.setup.linux

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ conda install -c conda-forge tensorflow
 conda install matplotlib
 conda install PIL
 conda install tqdm
+conda install scikit-learn
+conda install mkl-service
+
 source deactivate keras1

 # Download the source files for the tutorial
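After the two additional installs above, a quick sanity check inside the activated environment can confirm the packages import correctly; this snippet is illustrative and not part of the repository.

```python
# Verify the newly added packages are importable in the keras1 environment.
import sklearn
import mkl  # the importable module provided by the mkl-service package

print("scikit-learn:", sklearn.__version__)
print("MKL max threads:", mkl.get_max_threads())
```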
