
Commit df739f6

Add processing results to README
1 parent 6ce2ca3 commit df739f6

File tree

2 files changed: +53 −5 lines changed

README.md

Lines changed: 53 additions & 1 deletion
```diff
@@ -16,6 +16,8 @@ This repository facilitates high-throughput Rosetta runs to compute energy terms
 - [Generating variants using the subvariants algorithm](#generating-variants-using-the-subvariants-algorithm)
 - [Prepare an HTCondor run](#prepare-an-htcondor-run)
 - [Processing results](#processing-results)
+  - [Parse the results files](#parse-the-results-files)
+  - [Add results to a database](#add-results-to-a-database)
 
 
 ## Setup
```
````diff
@@ -279,10 +281,60 @@ python code/condor.py @htcondor/run_defs/gb1_example_run.txt
 ```
 
 The script will generate a run directory and place it in `output/htcondor_runs`.
-From there, you can upload the run directory to your CHTC submit node.
+From there, you can upload the run directory to your HTCondor submit node.
 You can then submit the run using `submit.sh`, which should be located in the run directory.
 
````
### Processing results

The HTCondor run will produce a log directory for each job.
The log directory contains 4 files: `args.txt`, `job.csv`, `hparams.csv`, and `energies.csv`.

| File Name      | Description |
|----------------|-------------|
| `args.txt`     | Contains the arguments fed into the `energize.py` script to produce the output. |
| `job.csv`      | Contains information about the HTCondor job that produced the output, such as the cluster, hostname, and start time. |
| `hparams.csv`  | Contains the Rosetta hyperparameters used to compute the energies. |
| `energies.csv` | Contains the computed energies for each variant in the job. |

After the HTCondor run, transfer these log directories to your local machine for processing.
I recommend compressing them before the file transfer:

```commandline
tar -czf output.tar.gz output
```

Then untar the file on your local machine:

```commandline
tar -xf output.tar.gz
```

Make sure to extract the output into the same condor run directory that you produced with `condor.py` in the [Prepare an HTCondor run](#prepare-an-htcondor-run) section.
#### Parse the results files

The [process_run.py](code/process_run.py) script parses the results files and combines them into a single dataframe.
Run it with the following command, specifying the `stats` mode and the main run directory of the HTCondor run:

```commandline
python code/process_run.py stats --main_run_dirs output/htcondor_runs/my_condor_run
```

This will produce a directory named `processed_run` inside the main run directory.
The `processed_run` directory contains a number of plots and dataframes which should be self-explanatory.
The main dataframe is `processed_run/energies_df.csv`, which contains the computed energies for each variant in the run.

You can process multiple run directories at a time by separating them with a space when calling [process_run.py](code/process_run.py).
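As a rough sketch of the combination step, the per-job `energies.csv` files can be concatenated into a single table. This is only an illustration, not the actual `process_run.py` implementation, and the `run_dir` layout and column names in the example below are assumptions:

```python
import csv
from pathlib import Path


def combine_energy_csvs(run_dir):
    """Concatenate the energies.csv from every job log directory under run_dir.

    Assumes each job's log directory sits directly under run_dir and
    contains an energies.csv with an identical header row.
    """
    header, rows = None, []
    for csv_fn in sorted(Path(run_dir).glob("*/energies.csv")):
        with open(csv_fn, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = file_header
            rows.extend(reader)
    return header, rows
```

The real script additionally produces plots and summary dataframes; this sketch only shows the core concatenation.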
#### Add results to a database

You can add the results of one or more HTCondor runs to an SQLite database using the [process_run.py](code/process_run.py) script.

First, create the blank database using [database.py](code/database.py):

```commandline
python code/database.py create --db_fn variant_database/my_database.db
```

Then, add the results of one or more HTCondor runs to the database using the `database` mode of [process_run.py](code/process_run.py):

```commandline
python code/process_run.py database --main_run_dirs output/htcondor_runs/my_condor_run --db_fn variant_database/my_database.db
```

This database can now be used with the [metl](https://github.com/gitter-lab/metl) repository to create a processed Rosetta dataset and pretrain METL models.
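To sanity-check the resulting database before handing it to metl, one option is to list its tables with Python's built-in `sqlite3` module. This is just a quick inspection sketch; the table names you see depend on the schema that `database.py` creates:

```python
import sqlite3


def list_tables(db_fn):
    """Return the names of all tables in an SQLite database file."""
    with sqlite3.connect(db_fn) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("variant_database/my_database.db")` should return a non-empty list after the `database` mode of `process_run.py` has run.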

code/database.py

Lines changed: 0 additions & 4 deletions
```diff
@@ -163,8 +163,4 @@ def main(args):
                         type=str,
                         default="variant_database/database.db")
 
-    parser.add_argument("--condor_main_dir",
-                        type=str,
-                        help="path to the parent condor directory for run to process")
-
     main(parser.parse_args())
```
