There are two broad options for running our MTL implementation. The first is a baseline option that performs basic training of an MTL-based deep neural net. The second adds a standard 10-fold cross-validation loop and depends on the first baseline for building and training the MTL-based deep neural net.
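As a rough illustration of how the cross-validation option wraps the baseline, the sketch below shows a 10-fold loop around a hypothetical training routine; `build_mtl_model` and `load_features_and_labels` are assumed placeholder names, not functions from this repository.

```
import numpy as np
from sklearn.model_selection import KFold

def run_cross_validation(build_mtl_model, load_features_and_labels, epochs=10):
    # Hypothetical loader/builder stand-ins for the baseline implementation.
    X, y = load_features_and_labels()
    scores = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        model = build_mtl_model()                      # baseline option builds the MTL network
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0))
    return np.mean(scores, axis=0)                     # average score across the 10 folds
```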
## P3B2: RNN-LSTM: A Generative Model for Clinical Path Reports
**Overview**: Given a sample corpus of biomedical text such as clinical reports, build a deep learning network that can automatically generate synthetic text documents with valid clinical context.
**Relationship to core problem**: Labeled data is quite challenging to come by, specifically for patient data, since manual annotation is time consuming; hence, a core capability we intend to build is a “gold-standard” annotated dataset, generated by deep learning networks, that can be used to tune our deep text comprehension applications.
**Expected Outcomes**: A generative RNN based on LSTMs that can effectively generate synthetic biomedical text with the desired clinical context.
### Benchmark Specs
#### Evaluation Metrics
* Accuracy or loss function: Standard information-theoretic metrics such as the log-likelihood score, minimum description length score, and AIC/BIC to measure how similar the generated documents are to the actual ones (a minimal sketch of such a computation follows this list)
* Expected performance of a naïve method: Latent Dirichlet allocation (LDA) models
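For illustration, the per-character log-likelihood of held-out text under the trained model could be computed along the lines of the sketch below; `probs` and `targets` are assumed array names, not outputs defined by this benchmark.

```
import numpy as np

def mean_log_likelihood(probs, targets, eps=1e-12):
    # probs: (N, vocab) predicted next-character distributions from the model
    # targets: (N,) indices of the characters that actually occurred
    picked = probs[np.arange(len(targets)), targets]   # probability assigned to each true character
    return float(np.mean(np.log(picked + eps)))        # higher (less negative) is better
```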
#### Description of the Network
* Proposed network architecture: LSTM with at least one layer and a 256-character window
* Number of layers: At least two hidden layers with one input and one output sequence
A graphical representation of the same network is shown here.
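For concreteness, a character-level network along these lines could be assembled roughly as in the sketch below; the layer sizes and the vocabulary size `n_chars` are illustrative assumptions, not the settings used in `keras_p3b2_baseline.py`.

```
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

window = 256   # characters of context fed to the network
n_chars = 96   # assumed size of the character vocabulary

model = Sequential()
model.add(LSTM(256, input_shape=(window, n_chars), return_sequences=True))
model.add(LSTM(256))                      # second hidden layer
model.add(Dense(n_chars))
model.add(Activation('softmax'))          # distribution over the next character
model.compile(loss='categorical_crossentropy', optimizer='adam')
```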
The code covers the data loader, preprocessing, basic training and cross-validation, and prediction and evaluation on test data.
The data file provided here is a compressed pickle file (.tgz extension). Before running the code, use:
```
cd P3B2
tar -xzf data.pkl.tgz
```
to unpack the archive. Note that the training data is provided as a single pickle file. The code is documented to provide enough information about how to reproduce the files.
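If you want to inspect the unpacked training data directly, it can be read with Python's standard `pickle` module; the file name `data.pkl` below is inferred from the archive name and may differ.

```
import pickle

# File name inferred from data.pkl.tgz; adjust if the unpacked file is named differently.
with open('data.pkl', 'rb') as f:
    data = pickle.load(f)
print(type(data))
```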
After uncompressing the data file, you can run:
```
python keras_p3b2_baseline.py
```
The original data from the pathology reports cannot be made available online.
### Example output
#### Checkpointing and model saving
At each iteration of the training process, the model is written out both as an HDF5 (`.h5`) file and as a JSON file. An example model (in JSON format) is shown below.
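A per-epoch saving step of this kind, assuming a Keras `model` object, might look roughly like the following sketch; the file-name pattern is an assumption for illustration only.

```
def checkpoint(model, epoch, prefix='model'):
    # Save the architecture as JSON and the learned weights as HDF5 (assumed naming scheme).
    with open('{}_{}.json'.format(prefix, epoch), 'w') as f:
        f.write(model.to_json())
    model.save_weights('{}_{}.h5'.format(prefix, epoch))
```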
The model generates text files that are stored as `example_<epoch>_<text-number>.txt` within a separate folder. An example output may look like this:
```
----- Generating with seed: "Diagnosis"
DiagnosisWZing Pathology Laboratory is certified under this report. **NAME[M. SSS dessDing Adientation of the tissue is submitted in the same container labeled with the patient's name and designated 'subcarinal lymph node is submitted in toto in cassette A1. B. Received in formalin labeled "right lower outer quadrant; A11-A10 - slice 16 with a cell block and submitted in cassette A1. B. Received fresh for
```
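For orientation, text files like the one above are typically produced by a character-sampling loop of roughly the following shape; the helper names, greedy sampling, and one-hot encoding details are assumptions rather than code from this benchmark.

```
import numpy as np

def generate_text(model, seed, char_to_idx, idx_to_char, window=256, length=400):
    # Repeatedly predict the next character and append it to the running text.
    text = seed
    for _ in range(length):
        context = text[-window:].rjust(window)         # fixed-length, space-padded context
        x = np.zeros((1, window, len(char_to_idx)))
        for t, ch in enumerate(context):
            x[0, t, char_to_idx.get(ch, 0)] = 1.0      # one-hot encode each context character
        probs = model.predict(x, verbose=0)[0]
        text += idx_to_char[int(np.argmax(probs))]     # greedy choice of the next character
    return text

# e.g. open('example_{}_{}.txt'.format(epoch, i), 'w').write(generate_text(model, 'Diagnosis', c2i, i2c))
```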