
Commit 067cde8

Merge pull request #5 from ccdc-opensource/gold_multi
Added gold_multi.
2 parents d154b68 + 1daeb26 commit 067cde8

8 files changed (+11373, -2 lines)

scripts/ReadMe.md

Lines changed: 4 additions & 2 deletions
@@ -9,8 +9,10 @@ This folder contains scripts submitted by users or CCDC scientists for anyone to
 - Performs a multi-component HBP calculation for a given library of co-formers.
 
 ### Packing similarity dendrogram:
-- Construct a dendrogram for an input set of structures based on packing-similarity
-analysis
+- Construct a dendrogram for an input set of structures based on packing-similarity analysis.
+
+### GOLD-multi
+- Use the CSD Docking API and the multiprocessing module to parallelize GOLD docking.
 
 ## Tips
 ### Searching tips:

scripts/gold_multi/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
!target

scripts/gold_multi/README.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# GOLD and Multiprocessing

## Introduction

This repo contains a script, `gold_multi.py`, which illustrates how to use the [CSD Docking API](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/docking.html) and the standard Python [multiprocessing](https://docs.python.org/3.7/library/multiprocessing.html) module to parallelize GOLD docking. A simple example system is also included to demonstrate the script in operation.

On a multi-core workstation, this approach should be suitable for docking some hundreds or thousands of ligands, depending on the rigour of the docking protocol used; please consult the GOLD User Guide for information about speed/accuracy trade-offs in GOLD. Note that the script is not suitable for running GOLD on an HPC compute cluster or in the cloud: the CCDC provides the GOLD Cluster and GOLD Cloud tools for those use cases. For further details, please contact [[email protected]](mailto:[email protected]).

As always with multiprocessing, increasing the number of processes will at some point begin to degrade performance once the available cores are saturated. Where this happens depends on the machine and the workload, so it is best determined by experiment. The default of six processes was chosen because the script was developed on an eight-core workstation, where six gave reasonable performance while leaving cores free for other processes.
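The experiment suggested above can be run with `multiprocessing.Pool`: time the same workload at several pool sizes and keep the one that performs best. This is a minimal sketch, not code from `gold_multi.py`. `dock_chunk` is a hypothetical stand-in that does CPU-bound busy work instead of calling the CSD Docking API, and the chunk count and candidate pool sizes are arbitrary.

```python
import multiprocessing
import time


def dock_chunk(chunk_id):
    """Hypothetical stand-in for docking one chunk of ligands.

    A real worker would run GOLD through the CSD Docking API; this one only
    does CPU-bound busy work so that the timing loop has something to measure.
    """
    total = 0
    for i in range(200_000):
        total += (i * chunk_id) % 7
    return chunk_id, total


def time_pool(chunk_ids, n_processes):
    """Process all chunks with a pool of n_processes workers; return (results, seconds)."""
    start = time.perf_counter()
    with multiprocessing.Pool(processes=n_processes) as pool:
        # pool.map blocks until every chunk is done and keeps the input order.
        results = pool.map(dock_chunk, chunk_ids)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    chunks = list(range(1, 13))
    # The best pool size depends on the machine and the workload, so measure it.
    for n in (1, 2, 4, 6, 8):
        _, elapsed = time_pool(chunks, n)
        print(f"{n} processes: {elapsed:.2f} s")
```

The `if __name__ == "__main__":` guard is not optional. On Windows and macOS, worker processes are started by re-importing the main module, and without the guard each worker would try to start its own pool.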

The script is designed to be as simple as possible so as not to obscure the mechanisms of parallelization. For example, the docking configuration is taken entirely from the GOLD conf file, only a single input file of ligands is accepted, and error handling and logging are rather lightweight. These limitations could be addressed if a more complete application were required.
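A parallel driver that takes everything from the conf file still needs a couple of values from it to organize its own work: where the output goes (`directory`) and which ligands to split (`ligand_data_file`). Here, the trailing `5` on that line is read as the number of docking runs per ligand. The following is a minimal, hypothetical helper that parses only the two line shapes used in the example `gold.conf` and enforces the single-ligand-file limitation. The real script may well read these settings through the CSD Docking API instead.

```python
def read_conf_settings(conf_text):
    """Return (output_directory, (ligand_file, n_runs)) from GOLD conf text.

    Hypothetical helper: it only understands the two line shapes used in the
    example gold.conf, 'directory = <dir>' and 'ligand_data_file <file> <n>'.
    """
    directory = None
    ligand_entries = []
    for raw_line in conf_text.splitlines():
        fields = raw_line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "ligand_data_file":
            # e.g. 'ligand_data_file input.sdf 5' -> ('input.sdf', 5)
            ligand_entries.append((fields[1], int(fields[2])))
        elif fields[0] == "directory":
            # e.g. 'directory = output' -> 'output'
            directory = raw_line.partition("=")[2].strip()
    # Mirror the stated limitation: exactly one ligand input file.
    if len(ligand_entries) != 1:
        raise ValueError(f"expected one ligand_data_file line, found {len(ligand_entries)}")
    return directory, ligand_entries[0]


if __name__ == "__main__":
    example = "DATA FILES\nligand_data_file input.sdf 5\nparam_file = DEFAULT\ndirectory = output\n"
    print(read_conf_settings(example))  # ('output', ('input.sdf', 5))
```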

The script writes output to the directory specified in the GOLD configuration file, and the results can be inspected by loading the GOLD conf file in Hermes as normal (see the Hermes User Guide for details). A `bestranking.lst` file is also written, which records the best-scoring pose for each molecule. Other output normally written by GOLD is not created, although this could be implemented if necessary.
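Because the ligands are docked in separate chunks, the full `bestranking.lst` is compiled from partial per-chunk versions. The following is a hypothetical sketch of that merge. It assumes a layout that should be checked against real GOLD output: header lines start with `#`, and every other non-blank line describes one ligand's best pose. The sketch keeps the first chunk's header and every chunk's data lines, in chunk order. Re-sorting by score would also require knowing which column holds the fitness.

```python
from pathlib import Path


def merge_bestranking(partial_texts):
    """Merge partial bestranking.lst contents into one listing.

    Assumed (not verified) layout: '#' header lines, then one line per ligand.
    """
    header, rows = [], []
    for index, text in enumerate(partial_texts):
        for line in text.splitlines():
            if line.startswith("#"):
                if index == 0:  # keep only the first chunk's header
                    header.append(line)
            elif line.strip():
                rows.append(line)
    return "\n".join(header + rows) + "\n"


def merge_bestranking_files(chunk_files, output_file):
    """Read each chunk's bestranking.lst and write the combined file."""
    texts = [Path(p).read_text() for p in chunk_files]
    Path(output_file).write_text(merge_bestranking(texts))
```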

The script partitions the input ligand file into chunks and uses the Docking API and multiprocessing to dock the chunks in parallel, each writing its output to a named subdirectory. The solution files for the chunks are then copied to the main output directory, and the full `bestranking.lst` file is compiled from the partial per-chunk versions. The intermediate subdirectories are currently kept, but the script could easily be modified to delete them, or to use anonymous temporary directories, if disk usage becomes an issue.
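The partitioning step could look something like the following minimal sketch. It splits SD-file text on the standard `$$$$` record delimiter, deals the records into near-equal contiguous chunks and writes each chunk to its own named subdirectory. The subdirectory and file names (`chunk_01/ligands.sdf`, and so on) are hypothetical, and the names used by `gold_multi.py` may differ.

```python
from pathlib import Path


def split_sdf_records(sdf_text):
    """Split SD-file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    # Keep any trailing record that lacks a final '$$$$' delimiter.
    if "".join(current).strip():
        records.append("".join(current))
    return records


def partition(records, n_chunks):
    """Split records into at most n_chunks contiguous chunks of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(records)))
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(records[start:end])
        start = end
    return chunks


def write_chunks(chunks, work_dir):
    """Write each chunk to its own named subdirectory; return the file paths."""
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        subdir = Path(work_dir) / f"chunk_{i:02d}"
        subdir.mkdir(parents=True, exist_ok=True)
        path = subdir / "ligands.sdf"
        path.write_text("".join(chunk))
        paths.append(path)
    return paths
```

Contiguous chunks keep the ligands in their original order, so concatenating the per-chunk results reproduces the input order.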

---
## Requirements

- [GOLD](https://www.ccdc.cam.ac.uk/solutions/csd-discovery/components/gold/) and the [CSD Python API](https://downloads.ccdc.cam.ac.uk/documentation/API/) installed.
- Configuration File: `gold.conf`

## Licensing Requirements

Any of the CSD-Discovery, CSD-Enterprise or Research Partner suites is sufficient.

## Instructions on Running

To run the script, an environment with the CSD Python API installed must be active. Further information is available in the [API installation notes](https://downloads.ccdc.cam.ac.uk/documentation/API/installation_notes.html).

The script is designed to be run from the command line only (not, for example, from within Hermes). The path to a GOLD configuration file may be provided as a command-line argument; if no argument is provided, the script assumes there is a file `gold.conf` in the current working directory.

On Windows, run the following command in the folder where this archive was unzipped:

```
> python.exe .\gold_multi.py
```

On Linux or macOS, the equivalent is to make the script executable and then run it:

```
$ chmod u+x ./gold_multi.py

$ ./gold_multi.py
```

In either case, add the option `--help` to show more information:

```text
usage: gold_multi.py [-h] [--n_processes N_PROCESSES] [conf_file]

positional arguments:
  conf_file             GOLD configuration file (default='gold.conf')

optional arguments:
  -h, --help            show this help message and exit
  --n_processes N_PROCESSES
                        No. of processes (default=6)
```
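A command-line interface matching the usage text above can be declared with the standard `argparse` module. This is a sketch reconstructed from the help output, not the script's actual source. Python 3.10 and later print the heading `options:` rather than `optional arguments:`.

```python
import argparse


def build_parser():
    """Build a parser whose --help output mirrors the usage text above."""
    parser = argparse.ArgumentParser(prog="gold_multi.py")
    # nargs="?" makes the positional optional, falling back to gold.conf.
    parser.add_argument("conf_file", nargs="?", default="gold.conf",
                        help="GOLD configuration file (default='gold.conf')")
    parser.add_argument("--n_processes", type=int, default=6,
                        help="No. of processes (default=6)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"conf file: {args.conf_file}, processes: {args.n_processes}")
```

For example, `gold_multi.py my.conf --n_processes 4` would give `conf_file="my.conf"` and `n_processes=4`.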

---
## Note on the input files provided

The example target provided (see the directory `target/`) is SYK tyrosine kinase ([5LMA](https://www.ebi.ac.uk/pdbe/entry/pdb/5lma)).

The ligands in `input.sdf` were built from SMILES. If a ligand's name is a PDB code, the SMILES corresponds to the crystallographic ligand from that structure (with conventional ionization states assigned). If the name has a suffix, the SMILES is a manually generated analogue. Note that not all of these ligands can be correctly cross-docked into 5LMA, because SYK exhibits induced-fit effects that GOLD cannot reproduce.

> For feedback or to report any issues please contact [[email protected]](mailto:[email protected])

scripts/gold_multi/gold.conf

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
GOLD CONFIGURATION FILE

AUTOMATIC SETTINGS
autoscale = 0.3

POPULATION
popsiz = auto
select_pressure = auto
n_islands = auto
maxops = auto
niche_siz = auto

GENETIC OPERATORS
pt_crosswt = auto
allele_mutatewt = auto
migratewt = auto

FLOOD FILL
radius = 6
origin = 0 0 0
do_cavity = 1
floodfill_atom_no = 0
cavity_file = target/ligand.mol2
floodfill_center = cavity_from_ligand

DATA FILES
ligand_data_file input.sdf 5
param_file = DEFAULT
set_ligand_atom_types = 1
set_protein_atom_types = 0
directory = output
tordist_file = DEFAULT
make_subdirs = 0
save_lone_pairs = 0
fit_points_file = fit_pts.mol2
read_fitpts = 0

FLAGS
internal_ligand_h_bonds = 0
flip_free_corners = 0
match_ring_templates = 0
flip_amide_bonds = 0
flip_planar_n = 1 flip_ring_NRR flip_ring_NHR
flip_pyramidal_n = 0
rotate_carboxylic_oh = flip
use_tordist = 1
postprocess_bonds = 1
rotatable_bond_override_file = DEFAULT
solvate_all = 1

TERMINATION
early_termination = 1
n_top_solutions = 3
rms_tolerance = 1.5

CONSTRAINTS
force_constraints = 0

COVALENT BONDING
covalent = 0

SAVE OPTIONS
save_score_in_file = 1 comments
save_protein_torsions = 1
output_file_format = MACCS

WRITE OPTIONS
write_options = NO_LINK_FILES NO_RNK_FILES NO_GOLD_LIGAND_MOL2_FILE

FITNESS FUNCTION SETTINGS
initial_virtual_pt_match_max = 3
relative_ligand_energy = 1
gold_fitfunc_path = plp
score_param_file = DEFAULT

PROTEIN DATA
protein_datafile = target/protein.mol2

CONSTRAINTS
constraint protein_h_bond 10.0000 0.005000 3696
