
Commit 8f7e02d

Merge branch 'master' of github.com:ECP-CANDLE/Benchmarks
2 parents 5db8ff4 + 6527021

File tree: 4 files changed, 16 additions and 12 deletions

Pilot3/P3B1/README.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
 * Number of layers: 5-6 layers

 A graphical representation of the MTL-DNN is shown below:
-![MTL-DNN Architecture](https://raw.githubusercontent.com/ECP-CANDLE/Benchmarks/master/P3B1/images/MTL1.png)
+![MTL-DNN Architecture](https://raw.githubusercontent.com/ECP-CANDLE/Benchmarks/master/Pilot3/P3B1/images/MTL1.png)

 ### Running the baseline implementation
 There are two broad options for running our MTL implementation. The first baseline option includes the basic training of an MTL-based deep neural net. The second implementation includes a standard 10-fold cross-validation loop and depends on the first baseline for building and training the MTL-based deep neural net.
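For readers unfamiliar with the multi-task setup described above, here is a minimal sketch of a hard-parameter-sharing MTL network: shared hidden layers feeding one softmax head per task. It is written against current tf.keras rather than the repository's Keras 1.x code, and the input width, layer sizes, task count, and optimizer are illustrative assumptions, not the benchmark's configuration.

```python
# Illustrative sketch only -- not the P3B1 baseline itself.
# Shared hidden layers feed one task-specific softmax head per task.
from tensorflow.keras import layers, Model

n_features = 400            # assumed width of the document-feature vector
task_classes = [4, 3, 2]    # assumed number of classes for each task

inp = layers.Input(shape=(n_features,))
shared = layers.Dense(1024, activation="relu")(inp)     # shared representation
shared = layers.Dense(256, activation="relu")(shared)

heads = []
for i, n_cls in enumerate(task_classes):
    branch = layers.Dense(128, activation="relu")(shared)   # task-specific layer
    heads.append(layers.Dense(n_cls, activation="softmax", name="task_%d" % i)(branch))

model = Model(inputs=inp, outputs=heads)
model.compile(optimizer="sgd",
              loss=["categorical_crossentropy"] * len(task_classes))
model.summary()
```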

Pilot3/P3B2/README.md

Lines changed: 12 additions & 11 deletions
@@ -1,9 +1,9 @@
 ## P3B2: RNN-LSTM: A Generative Model for Clinical Path Reports
 **Overview**: Given a sample corpus of biomedical text such as clinical reports, build a deep learning network that can automatically generate synthetic text documents with valid clinical context.

 **Relationship to core problem**: Labeled data is quite challenging to come by, specifically for patient data, since manual annotations are time consuming; hence, a core capability we intend to build is a “gold-standard” annotated data set, generated by deep learning networks, to tune our deep text comprehension applications.

 **Expected Outcomes**: A generative RNN based on LSTMs that can effectively generate synthetic biomedical text of the desired clinical context.

 ### Benchmark Specs

@@ -20,11 +20,13 @@
 
 #### Evaluation Metrics
 * Accuracy or loss function: Standard information-theoretic metrics such as the log-likelihood score, minimum description length score, and AIC/BIC, used to measure how similar the generated documents are to actual ones
 * Expected performance of a naïve method: Latent Dirichlet allocation (LDA) models

 #### Description of the Network
-* Proposed network architecture: LSTM with at least 4 layers and [128, 256, 512] character windows
+* Proposed network architecture: LSTM with at least 1 layer with 256 character windows
 * Number of layers: At least two hidden layers with one input and one output sequence
+A graphical representation of the same is shown here.
+![CB-RNN Architecture](https://raw.githubusercontent.com/ECP-CANDLE/Benchmarks/master/Pilot3/P3B2/images/RNN1.png)

 #### Annotated Keras Code
 Data loader, preprocessing, basic training and cross validation, prediction and evaluation on test data
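The AIC/BIC scores listed under Evaluation Metrics above are derived from a model's log-likelihood in the standard way; the following is a generic illustration of those formulas, not the benchmark's evaluation code, and the numbers in the example are made up.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2*k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Lower scores indicate a better trade-off between fit and model complexity.
print(aic(log_likelihood=-1250.0, n_params=300))
print(bic(log_likelihood=-1250.0, n_params=300, n_obs=5000))
```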
@@ -33,11 +35,11 @@ Data loader, preprocessing, basic training and cross validation, prediction and
 The data file provided here is a compressed pickle file (.tgz extension). Before running the code, use:
 ```
 cd P3B2
 tar -xzf data.pkl.tgz
 ```
 to unpack the archive. Note that the training data is provided as a single pickle file. The code is documented to provide enough information about how to reproduce the files.

 After uncompressing the data file, you can run:
 ```
 python keras_p3b2_baseline.py
 ```
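To inspect the unpacked training data before launching a run, a minimal sketch is shown below. It assumes the archive unpacks to a file named data.pkl; the internal layout of the pickled object is not documented here, so the snippet only reports its type and top-level keys.

```python
import pickle

# Inspect the unpacked training data without assuming a specific structure.
with open("data.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
if hasattr(data, "keys"):
    print(list(data.keys()))
```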
@@ -46,15 +48,14 @@ The original data from the pathology reports cannot be made available online. He
 
 ### Example output
 #### Checkpointing and model saving
 At each iteration of the training process, a model is output as an h5 file and also as a json file. An example model (in JSON format) is shown below.
 ```
 {"class_name": "Sequential", "keras_version": "1.1.0", "config": [{"class_name": "LSTM", "config": {"inner_activation": "hard_sigmoid", "trainable": true, "inner_init": "orthogonal", "output_dim": 256, "unroll": false, "consume_less": "cpu", "init": "glorot_uniform", "dropout_U": 0.0, "input_dtype": "float32", "batch_input_shape": [null, 20, 99], "input_length": null, "dropout_W": 0.0, "activation": "tanh", "stateful": false, "b_regularizer": null, "U_regularizer": null, "name": "lstm_1", "go_backwards": false, "input_dim": 99, "return_sequences": false, "W_regularizer": null, "forget_bias_init": "one"}}, {"class_name": "Dense", "config": {"W_constraint": null, "b_constraint": null, "name": "dense_1", "activity_regularizer": null, "trainable": true, "init": "glorot_uniform", "bias": true, "input_dim": null, "b_regularizer": null, "W_regularizer": null, "activation": "linear", "output_dim": 99}}, {"class_name": "Activation", "config": {"activation": "softmax", "trainable": true, "name": "activation_1"}}]}
 ```
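A checkpoint of this form (JSON architecture plus h5 weights) can be restored with Keras's standard model_from_json and load_weights calls; the file names below are illustrative placeholders, not names produced by the benchmark.

```python
from keras.models import model_from_json

# Rebuild the architecture from the saved JSON, then load the trained weights.
# File names are illustrative; substitute the names your training run produced.
with open("model.json") as f:
    model = model_from_json(f.read())
model.load_weights("model.h5")
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
```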
 
 #### Sample text generated
 The model generates text files that are stored as ```example_<epoch>_<text-number>.txt``` within a separate folder. An example output may look like this:
 ```
 ----- Generating with seed: "Diagnosis"
 DiagnosisWZing Pathology Laboratory is certified under this report. **NAME[M. SSS dessDing Adientation of the tissue is submitted in the same container labeled with the patient's name and designated 'subcarinal lymph node is submitted in toto in cassette A1. B. Received in formalin labeled "right lower outer quadrant; A11-A10 - slice 16 with a cell block and submitted in cassette A1. B. Received fresh for
 ```
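Seeded output of this kind is typically produced by a character-level sampling loop along the following lines. This is a generic sketch, not the repository's exact code: it assumes a trained model with a 20-character input window over a 99-symbol alphabet (consistent with the JSON checkpoint above), and the temperature parameter and helper mappings are illustrative.

```python
import numpy as np

def generate(model, seed, char_to_idx, idx_to_char,
             window=20, length=400, temperature=1.0):
    """Generate text one character at a time from a trained char-level LSTM."""
    text = seed
    for _ in range(length):
        # One-hot encode the last `window` characters as the model input.
        x = np.zeros((1, window, len(char_to_idx)))
        for t, ch in enumerate(text[-window:]):
            x[0, t, char_to_idx[ch]] = 1.0
        probs = model.predict(x, verbose=0)[0]
        # Temperature-scaled sampling over the softmax output.
        probs = np.log(probs + 1e-8) / temperature
        probs = np.exp(probs) / np.sum(np.exp(probs))
        next_idx = np.random.choice(len(probs), p=probs)
        text += idx_to_char[next_idx]
    return text
```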

Pilot3/P3B2/images/RNN1.png

51.2 KB

README.setup.linux

Lines changed: 3 additions & 0 deletions
@@ -30,6 +30,9 @@ conda install -c conda-forge tensorflow
 conda install matplotlib
 conda install PIL
 conda install tqdm
+conda install scikit-learn
+conda install mkl-service
+
 source deactivate keras1

 # Download the source files for the tutorial
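After the two additional installs above, a quick sanity check inside the activated environment can confirm the packages import correctly; this snippet is illustrative and not part of the repository.

```python
# Verify the newly added packages are importable in the keras1 environment.
import sklearn
import mkl  # the importable module provided by the mkl-service package

print("scikit-learn:", sklearn.__version__)
print("MKL max threads:", mkl.get_max_threads())
```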
