# GOLD and Multiprocessing

## Introduction

This repo contains a script, `gold_multi.py`, which illustrates how to use the [CSD Docking API](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/docking.html) and the standard Python [multiprocessing](https://docs.python.org/3.7/library/multiprocessing.html) module to parallelize GOLD docking. A simple example system is also included to demonstrate the operation of the script.

On a multi-core workstation, this approach should be suitable for docking hundreds or thousands of ligands, depending on the rigour of the docking protocol used; please consult the GOLD User Guide for information about speed/accuracy trade-offs in GOLD. Note that the script is not suitable for running GOLD on an HPC compute cluster or in the cloud: the CCDC provides the GOLD Cluster and GOLD Cloud tools for those use cases. For further details, please contact [[email protected]](mailto:[email protected]).

As with any multiprocessing technique, increasing the number of processes will at some point begin to degrade performance, as the available cores become saturated. Where this happens depends on the machine and the workload, and can really only be determined by experiment. A default of six processes was chosen because the script was developed on an eight-core workstation, where six gave good performance while leaving cores free for other processes.
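
Because the best process count can only be found by experiment, one possible starting point is to scale the default with the machine's core count. The helper below is hypothetical (`default_n_processes` is not part of `gold_multi.py`). It reproduces the "six on an eight-core workstation" choice by leaving two cores free. Treat it as an assumption to benchmark, not a recommendation.

```python
import os


def default_n_processes(cpu_count=None, reserved=2):
    """Hypothetical heuristic: use all but `reserved` CPUs, but never fewer than one process.

    On an eight-core machine this gives six, matching the script's fixed default.
    """
    if cpu_count is None:
        # os.cpu_count() may return None if the count cannot be determined
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count - reserved)
```

Note that `os.cpu_count()` reports logical CPUs (including hyper-threads), which may exceed the number of physical cores.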

The script is deliberately kept as simple as possible so as not to obscure the mechanics of parallelization. For example, the docking configuration is taken entirely from the GOLD conf file, and only a single input file of ligands is accepted. Error handling and logging are also rather lightweight; a production application would need to address these points.
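
One way to make error handling more robust is to wrap the worker function. A failure in one chunk is then logged and reported back to the parent process instead of aborting the whole pool. This is a hypothetical sketch: `safe_worker` and its `(ok, result)` return convention are not part of the original script.

```python
import logging
import traceback

logger = logging.getLogger("gold_multi_sketch")


def safe_worker(func, *args):
    """Call func(*args) and return (True, result) on success or (False, traceback text) on failure.

    Catching the exception inside the worker means one failed chunk does not make
    Pool.map() raise in the parent and discard the results of the other chunks.
    """
    try:
        return True, func(*args)
    except Exception:
        message = traceback.format_exc()
        logger.error("Worker failed for args %r:\n%s", args, message)
        return False, message
```

With `multiprocessing.Pool.map`, you could pass `functools.partial(safe_worker, some_worker)`. Here `some_worker` must be a module-level function so that the wrapped callable can be pickled.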

The script writes output to the directory specified in the GOLD configuration file, and the results can be inspected by loading the GOLD conf file in Hermes as normal (see the Hermes User Guide for details). A `bestranking.lst` file is also written, recording the best-scoring pose for each molecule. Other output normally written by GOLD is not created, although this could be added if necessary.

The script partitions the input ligand file into chunks. It then uses the Docking API and multiprocessing to dock these chunks in parallel, with each chunk writing its output to a named subdirectory. The solution files for the chunks are copied to the main output directory, and the full `bestranking.lst` file is compiled from the per-chunk versions. The intermediate subdirectories are currently kept. If disk usage is a concern, the script could easily be modified to delete them or to use anonymous temporary directories instead.
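
The chunk, dock and merge workflow described above can be sketched with `multiprocessing.Pool.map`. This is a hypothetical outline, not the code of `gold_multi.py`, and it makes several assumptions:

- `dock_chunk` is a placeholder that only writes files; the real script would call the CSD Docking API at that point.
- The `chunk_NN` subdirectory names are invented.
- Copying of solution files is omitted.
- The merge assumes that header lines in `bestranking.lst` start with `#`.

```python
import math
import multiprocessing
import os


def split_sdf_records(sdf_text):
    """Split SD-file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    return records


def partition(records, n_chunks):
    """Split records into at most n_chunks contiguous chunks of at most ceil(len/n_chunks) records."""
    if not records:
        return []
    size = math.ceil(len(records) / n_chunks)
    return [records[i:i + size] for i in range(0, len(records), size)]


def dock_chunk(job):
    """Hypothetical stand-in for docking one chunk in its own named subdirectory.

    The real script would run GOLD via the CSD Docking API here; this sketch only
    writes the chunk's ligands and a placeholder per-chunk bestranking.lst.
    """
    index, chunk, output_dir = job
    chunk_dir = os.path.join(output_dir, f"chunk_{index:02d}")
    os.makedirs(chunk_dir, exist_ok=True)
    with open(os.path.join(chunk_dir, "ligands.sdf"), "w") as fh:
        fh.write("".join(chunk))
    with open(os.path.join(chunk_dir, "bestranking.lst"), "w") as fh:
        fh.write("# placeholder header\n")
        for record in chunk:
            fh.write(record.splitlines()[0] + "\n")  # first SD line holds the molecule name
    return chunk_dir


def merge_bestranking(chunk_dirs, output_dir):
    """Concatenate per-chunk bestranking.lst files, keeping '#' header lines only from the first."""
    merged = []
    for n, chunk_dir in enumerate(chunk_dirs):
        with open(os.path.join(chunk_dir, "bestranking.lst")) as fh:
            for line in fh:
                if line.startswith("#") and n > 0:
                    continue  # skip repeated headers from later chunks
                merged.append(line)
    with open(os.path.join(output_dir, "bestranking.lst"), "w") as fh:
        fh.writelines(merged)
    return merged


def run_parallel(sdf_text, output_dir, n_processes=6):
    """Partition the ligands, dock each chunk in a worker process, then merge the rankings."""
    chunks = partition(split_sdf_records(sdf_text), n_processes)
    jobs = [(i, chunk, output_dir) for i, chunk in enumerate(chunks)]
    with multiprocessing.Pool(processes=n_processes) as pool:
        chunk_dirs = pool.map(dock_chunk, jobs)  # results come back in job order
    return merge_bestranking(chunk_dirs, output_dir)
```

Some platforms use the `spawn` start method, including Windows and, by default since Python 3.8, macOS. On these, `run_parallel` must be called from under an `if __name__ == "__main__":` guard, and the worker must be a module-level function so that it can be pickled.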

---
## Requirements

- [GOLD](https://www.ccdc.cam.ac.uk/solutions/csd-discovery/components/gold/) and the [CSD Python API](https://downloads.ccdc.cam.ac.uk/documentation/API/) installed.
- A GOLD configuration file (`gold.conf` by default).

## Licensing Requirements

A CSD-Discovery, CSD-Enterprise or Research Partner licence is sufficient.

## Instructions on Running

To run the script, an environment with the CSD Python API installed must be active. Further information is available in the [API installation notes](https://downloads.ccdc.cam.ac.uk/documentation/API/installation_notes.html).

The script is designed to be run from the command line only (not, for example, from within Hermes). The path to a GOLD configuration file may be given as a command-line argument. If none is provided, the script assumes there is a file `gold.conf` in the current working directory.

On Windows, run the following from the folder where this archive was unzipped:

```
> python.exe .\gold_multi.py
```

On Linux or macOS, the equivalent is as follows (first making the script executable):

```
$ chmod u+x ./gold_multi.py
$ ./gold_multi.py
```
| 47 | + |
| 48 | +In either case, add the option `--help` to show more information. |
| 49 | + |
| 50 | +```cmd |
| 51 | +usage: gold_multi.py [-h] [--n_processes N_PROCESSES] [conf_file] |
| 52 | +
|
| 53 | +positional arguments: |
| 54 | + conf_file GOLD configuration file (default='gold.conf') |
| 55 | +
|
| 56 | +optional arguments: |
| 57 | + -h, --help show this help message and exit |
| 58 | + --n_processes N_PROCESSES |
| 59 | + No. of processes (default=6) |
| 60 | +``` |
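
The help text above is the kind produced by Python's standard `argparse` module. A parser along the following lines would accept the same arguments. It is a sketch inferred from the help output, not the script's actual source. On Python 3.10 and later, `argparse` labels the second section `options:` rather than `optional arguments:`.

```python
import argparse


def build_parser():
    """Sketch of a parser matching the --help output of gold_multi.py."""
    parser = argparse.ArgumentParser(prog="gold_multi.py")
    # Optional positional argument: falls back to gold.conf in the working directory
    parser.add_argument("conf_file", nargs="?", default="gold.conf",
                        help="GOLD configuration file (default='gold.conf')")
    parser.add_argument("--n_processes", type=int, default=6,
                        help="No. of processes (default=6)")
    return parser
```

For example, `build_parser().parse_args(["my.conf", "--n_processes", "4"])` yields `conf_file='my.conf'` and `n_processes=4`.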

---
## Note on the input files provided

The example target provided (see the directory `target/`) is SYK tyrosine kinase ([5LMA](https://www.ebi.ac.uk/pdbe/entry/pdb/5lma)).

The ligands in `input.sdf` were built from SMILES. If a ligand's name is a PDB code, the SMILES corresponds to the crystallographic ligand from that structure (with conventional ionization states assigned). If the name has a suffix, the SMILES is a manually generated analogue. Note that not all of these ligands can be correctly cross-docked into 5LMA, because SYK exhibits induced-fit effects that GOLD cannot reproduce.

> For feedback, or to report any issues, please contact [[email protected]](mailto:[email protected])