Commit a2563e8

Updated the preliminary documentation for Pilot 2 benchmark 2.
1 parent 75679ab commit a2563e8

1 file changed: +39 -0 lines changed

P2B2/README.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
## P2B2: Autoencoder Compressed Representation for Molecular Dynamics Simulation Data

**Overview**: Reduce the time spent manually inspecting molecular dynamics simulation data

**Relationship to core problem**: Identify the possible formation of cancer-causing proteins through molecular simulation

**Expected outcome**: Improved understanding of protein formation and easier handling of large-scale molecular dynamics output

### Benchmark Specs Requirements

#### Description of the Data
* Data source: MD simulation output as PDB files (coarse-grained bead simulation)
* Input dimensions: ~1.26e6 per time step (6000 lipids x 30 beads per lipid x (position + velocity + type))
* Output dimensions: 1 x N_Frame (N = 100 hidden units)
* Latent representation dimension:
* Sample size: O(10^6) for a simulation requiring O(10^8) time steps
* Notes on data balance and other issues: unlabeled data with rare events
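
The input dimension quoted above can be checked with a few lines of arithmetic. The sketch below assumes each bead contributes 3 position components, 3 velocity components, and 1 scalar type code (7 values in total); that split is an assumption chosen because it reproduces the ~1.26e6 figure, and the function name is illustrative only.

```python
# Illustrative arithmetic only; the per-bead feature split is an assumption.
N_LIPIDS = 6000
BEADS_PER_LIPID = 30
POSITION_DIMS = 3  # assumed: x, y, z
VELOCITY_DIMS = 3  # assumed: vx, vy, vz
TYPE_DIMS = 1      # assumed: one scalar bead-type code


def input_dimension(n_lipids=N_LIPIDS,
                    beads_per_lipid=BEADS_PER_LIPID,
                    features_per_bead=POSITION_DIMS + VELOCITY_DIMS + TYPE_DIMS):
    """Number of scalar inputs in one simulation frame."""
    return n_lipids * beads_per_lipid * features_per_bead


if __name__ == "__main__":
    print(input_dimension())  # 1260000, i.e. ~1.26e6
```

If the bead type were one-hot encoded instead, the per-bead feature count (and therefore the total) would be larger.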

#### Expected Outcomes
* 'Telescope' into the data: find regions of interest that show a higher level of structure than the remaining regions
* Output: one scalar value per simulation frame, representing the structuredness of the data compared to the mean. Output range: [0, 100], where 0 = mean noise level and 100 = very structured.
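
The README does not say how the per-frame score is computed. As one possible reading of the [0, 100] range, the sketch below rescales a hypothetical raw per-frame measure so that values at or below the mean map to 0 and the largest value maps to 100. Both the raw measure and the function `structuredness_scores` are assumptions made for illustration, not the benchmark's method.

```python
def structuredness_scores(raw_values):
    """Map raw per-frame values to [0, 100]: mean or below -> 0, maximum -> 100.

    Hypothetical normalization; the benchmark may define its score differently.
    """
    if not raw_values:
        return []
    mean = sum(raw_values) / len(raw_values)
    # Only the excess over the mean counts as "structure".
    excess = [max(value - mean, 0.0) for value in raw_values]
    peak = max(excess)
    if peak == 0.0:
        # Every frame sits at the mean noise level.
        return [0.0] * len(raw_values)
    return [100.0 * e / peak for e in excess]


if __name__ == "__main__":
    print(structuredness_scores([1.0, 1.0, 1.0, 5.0]))
```

Scaling by the maximum makes the score relative to the frames in a given run; a fixed reference scale would be needed to compare scores across simulations.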

#### Evaluation Metrics
* Accuracy or loss function: agreement among domain experts on utility
* Expected performance of a naive method: comparison of different technical approaches against each other and against labels (see above)

#### Description of the Network
* Proposed network architecture: stacked fully-connected autoencoder feeding an RNN
* Number of layers: 4
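
To give a feel for the size of a fully-connected encoder on this input, the sketch below counts weights and biases for a chain of four dense layers. The intermediate widths are hypothetical; only the ~1.26e6 input size and the 100-unit code come from the data description above.

```python
def dense_param_count(widths):
    """Weights plus biases for a chain of fully-connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


# Hypothetical widths: input, three intermediate layers, 100-unit code.
# Five widths describe four dense layers.
ENCODER_WIDTHS = [1260000, 2000, 600, 200, 100]

if __name__ == "__main__":
    print(dense_param_count(ENCODER_WIDTHS))
```

With widths like these, the first layer accounts for nearly all of the parameters, so the choice of its output width largely determines memory use.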

### Running the baseline implementation

Using virtualenv

```
cd P2B2
workon keras
python __main__.py --home-dir=${HOME}/.virtualenvs/keras/lib/python2.7/site-packages --look-back 15 --train --epochs 20 --learning-rate 0.01 --cool --seed --batch-size 10
```
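
The README does not define `--look-back`. Assuming it is the number of preceding frames given to the RNN as context, with the next frame as the prediction target, the windowing could look like the sketch below. The function `make_windows` is hypothetical and is not taken from the baseline code.

```python
def make_windows(frames, look_back):
    """Pair each run of `look_back` consecutive frames with the frame that follows it."""
    pairs = []
    for start in range(len(frames) - look_back):
        window = frames[start:start + look_back]
        target = frames[start + look_back]
        pairs.append((window, target))
    return pairs


if __name__ == "__main__":
    # Integers stand in for encoded frames.
    pairs = make_windows(list(range(20)), look_back=15)
    print(len(pairs))
```

A sequence shorter than or equal to `look_back` yields no pairs, so very short trajectories would contribute nothing to training under this scheme.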
