Commit f732cdb

committed
update readme
1 parent a5f0155 commit f732cdb

File tree

1 file changed

+50
-39
lines changed

README.md

Lines changed: 50 additions & 39 deletions
@@ -2,13 +2,13 @@
22
[![Generic badge](https://img.shields.io/badge/version-1.0.0-blue.svg)](https://shields.io/)
33
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7023191.svg)](https://doi.org/10.5281/zenodo.7023191)
44

5-
![POEM ](./docs/_img/poem_toc_jm2c00931_0011.jpg)
5+
![POEM](./docs/_img/poem_toc_jm2c00931_0011.jpg)
66

77

88

99
## Description
1010
This project aims at using information from *ligand pieces* bound to protein subpockets to
11-
automatically build new molecules tailored to a particular target pocket
11+
(semi-)automatically build new molecules tailored to a particular target pocket
1212
on the basis of the subpockets/target pocket estimated similarities.
1313
This workflow was tested to design new sub-micromolar hit candidates for CDK8 inhibition:
1414

@@ -21,19 +21,19 @@ Please note that the publication refers to release v1.0.0.
2121

2222
## Content
2323

24-
`envs/` --> conda environments <br>
25-
`cdk8_structures/` --> target structures <br>
26-
`aligned_fragments.tgz` downloadable at [10.5281/zenodo.7023191](https://zenodo.org/record/7023191) --> output data after steps 1-3, input for step 4+ <br>
27-
`scripts/` --> scripts to for library generation <br>
28-
`output_files.tgz` downloadable at [10.5281/zenodo.7023191](https://zenodo.org/record/7023191) --> data obtained at each step, and depedencies <br>
24+
- `envs/` ---> conda environments
25+
- `cdk8_structures/` ---> target structures
26+
- `scripts/` ---> scripts for library generation
27+
- `aligned_fragments.tgz` downloadable at [10.5281/zenodo.7023191](https://zenodo.org/record/7023191) ---> output data after steps 1-3, input for step 4+
28+
- `output_files.tgz` downloadable at [10.5281/zenodo.7023191](https://zenodo.org/record/7023191) ---> data obtained at each step, and dependencies
2929

3030

3131
## Requirements
3232
- Conda: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html, with environments running Python 3.6+.
3333
- IChem: http://bioinfo-pharma.u-strasbg.fr/labwebsite/download.html
3434
- ProCare: https://github.com/kimeguida/ProCare
3535
- DeLinker: https://github.com/oxpig/DeLinker
36-
36+
- RDKit: http://www.rdkit.org, https://github.com/rdkit/rdkit
3737

3838
## Part 1/ Subpocket screening and fragments preparation
3939

@@ -54,9 +54,10 @@ ProCare: [https://github.com/kimeguida/ProCare](https://github.com/kimeguida/Pro
5454
`cd aligned_fragments/` <br>
5555
`python ../scripts/procare_launcher.py -s <subpocket> -t ../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2 --transform --ligandtransform <fragment>` <br>
5656

57-
Outputs: aligned subpockets, fragments and `procare_scores.tsv` available on [zenodo](https://zenodo.org/record/7023191) <br>
58-
subpocket: cfh_xx_fragN_cavity4.mol2 <br>
59-
fragment: cfh_xx_fragN.mol2 <br>
57+
Outputs:
58+
- aligned subpockets, fragments and `procare_scores.tsv` available on [zenodo](https://zenodo.org/record/7023191)
59+
- subpocket: cfh_xx_fragN_cavity4.mol2
60+
- fragment: cfh_xx_fragN.mol2
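
The ProCare command above handles one subpocket/fragment pair at a time. A minimal sketch of how the pairs could be collected for a batch run, assuming each subpocket `cfh_xx_fragN_cavity4.mol2` sits next to its fragment `cfh_xx_fragN.mol2` in the current directory. The helper name `procare_commands` is hypothetical and is not part of the repository scripts:

```python
from pathlib import Path

# Target cavity, as given in the command above.
TARGET = "../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2"
# Suffix that distinguishes a subpocket file from its fragment file.
SUFFIX = "_cavity4.mol2"


def procare_commands(directory="."):
    """Build one procare_launcher.py command per subpocket/fragment pair.

    Hypothetical helper: it only assembles argument lists and runs nothing.
    Subpockets without a matching fragment file are skipped.
    """
    commands = []
    for subpocket in sorted(Path(directory).glob("*" + SUFFIX)):
        fragment = subpocket.with_name(subpocket.name[: -len(SUFFIX)] + ".mol2")
        if not fragment.exists():
            continue
        commands.append([
            "python", "../scripts/procare_launcher.py",
            "-s", subpocket.name,
            "-t", TARGET,
            "--transform", "--ligandtransform", fragment.name,
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from inside `aligned_fragments/`, which matches the `cd aligned_fragments/` step above.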
6061

6162

6263
### 2. Convert aligned fragments from mol2 to sdf
@@ -65,19 +66,20 @@ We used OpenEye python toolkits (state of our conda env: envs/oepython.yml): <br
6566
`conda activate oepython` <br>
6667
`../scripts/convert.py <fragment>.mol2 <fragment>.sdf` <br>
6768

68-
Outputs: sdf available on [zenodo](https://zenodo.org/record/7023191) <br>
69-
fragment: cfh_xx_fragN.sdf <br>
69+
Outputs:
70+
- sdf available on [zenodo](https://zenodo.org/record/7023191)
71+
- fragment: cfh_xx_fragN.sdf
7072

7173

7274
### 3. Compute IChem interactions:
7375
`<path_to_your_ichem>/IChem ../cdk8_structures/5hbh_protein.mol2 <fragment>.mol2 > <fragment>.ifp` <br>
7476

75-
Outputs: sdf available on [zenodo](https://zenodo.org/record/7023191) <br>
76-
interactions file: cfh_xx_fragN.ifp <br>
77-
78-
77+
Outputs:
78+
- sdf available on [zenodo](https://zenodo.org/record/7023191)
79+
- interactions file: cfh_xx_fragN.ifp
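
The IChem command above relies on shell redirection to write the `.ifp` file. A sketch of the same call from Python, assuming IChem prints the interactions to standard output as the redirection suggests. The function names `ichem_job` and `run_ichem` are hypothetical, and the IChem path is whatever `<path_to_your_ichem>/IChem` resolves to on your system:

```python
import subprocess
from pathlib import Path

# Protein structure, as given in the command above.
PROTEIN = "../cdk8_structures/5hbh_protein.mol2"


def ichem_job(ichem, fragment):
    """Return the IChem argument list and the .ifp output path for a fragment."""
    fragment = Path(fragment)
    command = [str(ichem), PROTEIN, str(fragment)]
    return command, fragment.with_suffix(".ifp")


def run_ichem(ichem, fragment):
    """Run IChem on one fragment and redirect its stdout to <fragment>.ifp."""
    command, output = ichem_job(ichem, fragment)
    with open(output, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)
    return output
```

When looping over a directory, the `*_cavity4.mol2` subpocket files would need to be excluded so that only fragment files are passed in.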
7980
<br>
8081

82+
8183
## Part 2/ Selection and annotation of relevant fragments
8284
To reproduce this step, the data required from steps 1 to 3 are available at https://zenodo.org/record/7023191 as `aligned_fragments.tgz`, and the outputs obtained as `output_files.tgz` <br>
8385

@@ -95,12 +97,12 @@ assignment of CDK8 areas <br>
9597
`python ../scripts/select_fragments_round1.py -f ../output_files/procare_scores.tsv -d . -p ../cdk8_structures/5hbh_protein.mol2 -c ../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2` <br>
9698

9799
Outputs: <br>
98-
`subpocket_p0_gate.list` which corresponds to GA1 <br>
99-
`subpocket_p0_hinge.list` --> H <br>
100-
`subpocket_p0_solv_1.list` --> SE2 <br>
101-
`subpocket_p0_solv_2.list` --> SE1 <br>
102-
`subpocket_p6_alphaC.list` --> AC <br>
103-
`subpocket_p6_lys52.list` --> GA2 <br>
100+
- `subpocket_p0_gate.list` which corresponds to GA1 in the paper <br>
101+
- `subpocket_p0_hinge.list` ---> H <br>
102+
- `subpocket_p0_solv_1.list` ---> SE2 <br>
103+
- `subpocket_p0_solv_2.list` ---> SE1 <br>
104+
- `subpocket_p6_alphaC.list` ---> AC <br>
105+
- `subpocket_p6_lys52.list` ---> GA2 <br>
104106
available in `<this_repo>/output_files/` <br>
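
The correspondence between list files and the area labels used in the paper can be kept in one lookup table when post-processing results. A small sketch using only the mapping listed above; the names `PAPER_LABELS` and `paper_label` are hypothetical:

```python
from pathlib import Path

# List file name -> CDK8 area label used in the paper (from the list above).
PAPER_LABELS = {
    "subpocket_p0_gate.list": "GA1",
    "subpocket_p0_hinge.list": "H",
    "subpocket_p0_solv_1.list": "SE2",
    "subpocket_p0_solv_2.list": "SE1",
    "subpocket_p6_alphaC.list": "AC",
    "subpocket_p6_lys52.list": "GA2",
}


def paper_label(list_file):
    """Return the paper label for a subpocket list file, ignoring its directory."""
    return PAPER_LABELS[Path(list_file).name]
```

Note that `solv_1` maps to SE2 and `solv_2` to SE1, so the numbering is inverted between file names and paper labels.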
105107
<br>
106108

@@ -113,8 +115,8 @@ To reproduce these steps, the required data were made availaible at https://zeno
113115
### 5. Enumerate candidates for linking: pairs of fragments and atoms
114116
`python ../scripts/linkable_fragments_round1_job.py --hinge subpocket_p0_hinge.list --gate subpocket_p0_gate.list --solv1 subpocket_p0_solv_1.list --solv2 subpocket_p0_solv_2.list --alphac subpocket_p6_alphaC.list --lys52 subpocket_p6_lys52.list` <br>
115117

116-
Outputs: <br>
117-
`linkable_fragments_round1_<N>.list` with N in {0, 1, 2, 3, 4, 5, 6} available in `<this_repo>/output_files/` <br>
118+
Outputs:
119+
- `linkable_fragments_round1_<N>.list` with N in {0, 1, 2, 3, 4, 5, 6} available in `<this_repo>/output_files/` <br>
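
Before moving on to linking, it can be useful to confirm that all seven list files were written. A sketch of such a check, assuming the outputs land in the working directory; `missing_linkable_lists` is a hypothetical helper, not a repository script:

```python
from pathlib import Path


def missing_linkable_lists(directory=".", count=7):
    """Return the expected linkable_fragments_round1_<N>.list files that are absent.

    N runs from 0 to count - 1, i.e. 0..6 for the seven files named above.
    """
    expected = [f"linkable_fragments_round1_{n}.list" for n in range(count)]
    return [name for name in expected if not (Path(directory) / name).is_file()]
```

An empty return value means every expected file is present.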
118120

119121
### 6. Linking with DeLinker
120122
DeLinker: https://github.com/oxpig/DeLinker <br>
@@ -130,64 +132,71 @@ Sometimes, no linker is generated and DeLinker might return truncated attempts.<
130132
`python ../scripts/get_linker.py --file generation_<N>.smi.gz --fragsdir . --pathdelinker <your_path_to_DeLinker>/DeLinker/` <br>
131133

132134
Outputs: <br>
133-
`generation_complete.smi` <br>
134-
`generation_uncomplete.smi` <br>
135+
- `generation_complete.smi` <br>
136+
- `generation_uncomplete.smi` <br>
135137

136138

137139
### 8. Name molecules and filter with OpenEye Filter
138140
SMILES were assigned IDs to keep track of molecule information. Filter will protonate the molecules and generate canonical SMILES that differ from RDKit's. <br>
139141
`python ../scripts/index_generated_molecules.py -i generation_complete.smi -o generation_complete_indexed.smi` <br>
140142

141-
Output: `generation_complete_indexed.smi` <br>
143+
Output:
144+
- `generation_complete_indexed.smi` <br>
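
The idea of the indexing step is that every SMILES carries a stable ID, so a molecule can still be traced after Filter rewrites its SMILES. A minimal sketch of that idea, assuming the SMILES is the first whitespace-separated field on each line. The ID format `mol_000001` and the function `index_smiles` are hypothetical and may differ from what `index_generated_molecules.py` actually writes:

```python
def index_smiles(lines, prefix="mol"):
    """Attach a sequential ID to the first field (assumed SMILES) of each line.

    Blank lines are skipped; any extra fields on a line are dropped.
    The ID format is an illustrative assumption.
    """
    indexed = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        indexed.append(f"{fields[0]} {prefix}_{len(indexed) + 1:06d}")
    return indexed
```

Writing the result as `SMILES ID` pairs follows the common `.smi` convention, in which the second column holds the molecule name.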
142145

143146
`<path_to_openeye>/filter -in generation_complete_indexed.smi -out druglike_molecules.smi -fail druglike_failed_molecules.smi -filter ../output_files/filter_labo_cdk8.txt` <br>
144147

145148
Check that other annotations in the file did not affect how Filter processed the SMILES. In our case, we extracted the SMILES and indexes to a separate file `molecules.smi`. <br>
146149

147-
Output: `druglike_molecules.smi` <br>
150+
Output:
151+
- `druglike_molecules.smi` <br>
148152

149153
### 9. Synthetic accessibility, descriptors, filtering:
150154
SAscore from [https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py](https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py) <br>
151155
`python ../scripts/get_sascore.py -i druglike_molecules.smi -o druglike_molecules_sascore.tsv` <br>
152-
Output: `druglike_molecules_sascore.tsv` <br>
156+
Output:
157+
- `druglike_molecules_sascore.tsv` <br>
153158

154159
RDKit descriptors <br>
155160
`python ../scripts/get_druglike_descriptors.py -i druglike_molecules.smi -o druglike_molecules_descriptors.tsv` <br>
156161
`python ../scripts/get_linker_descriptors.py -i generation_complete_indexed.smi -o generation_linker_descriptors.tsv` <br>
157162

158163
Outputs: <br>
159-
`druglike_molecules_descriptors.tsv` <br>
160-
`generation_linker_descriptors.tsv` <br>
164+
- `druglike_molecules_descriptors.tsv`
165+
- `generation_linker_descriptors.tsv`
161166

162167
Clean generated linkers: remove those that are too flexible or hydrophobic <br>
163168
`python ../scripts/filter_linker.py -i generation_linker_descriptors.tsv -o linker_discarded.tsv` <br>
164169

165-
Output: `linker_discarded.tsv` <br>
170+
Output:
171+
- `linker_discarded.tsv` <br>
166172

167173
### 10. Extract round 1 library
168174
`python ../scripts/library_round1.py --descriptor druglike_molecules_descriptors.tsv --sascore druglike_molecules_sascore.tsv --discarded linker_discarded.tsv -o libr1.txt`
169175

170-
Output: `libr1.txt` <br>
176+
Output:
177+
- `libr1.txt` <br>
171178

172179
### 11. Grow molecules in round 2 library from a selected hit
173180
Example of hit compound 12 <br>
174181
`python ../scripts/round2_fuse_mols.py --dl druglike_molecules.smi --gen generation_complete_indexed.smi --origin ../output_files/frag_origin.tsv --discarded linker_discarded.tsv --procare ../output_files/procare_scores.tsv` <br>
175182

176183
Outputs: <br>
177-
`hit12_round2_mols.tsv` <br>
178-
`hit12_round2_sascore_pass.tsv` <br>
184+
- `hit12_round2_mols.tsv` <br>
185+
- `hit12_round2_sascore_pass.tsv` <br>
179186

180187

181188
### 12. Filter and generate round 2 library
182189
RDKit descriptors <br>
183190
`python ../scripts/get_round2_descriptors.py -i hit12_round2_mols.tsv -o hit12_round2_mols_descriptors.tsv` <br>
184191

185-
Output: `hit12_round2_mols_descriptors.tsv` <br>
192+
Output:
193+
- `hit12_round2_mols_descriptors.tsv` <br>
186194

187195
Candidates for synthesis <br>
188196
`python ../scripts/library_round2.py -i hit12_round2_mols_descriptors.tsv --sascore hit12_round2_sascore_pass.tsv -o libr2.txt` <br>
189197

190-
Output:`libr2.txt` <br>
198+
Output:
199+
- `libr2.txt` <br>
191200

192201

193202

@@ -229,6 +238,8 @@ URL = {https://doi.org/10.1021/acs.jmedchem.2c00931},
229238
## References
230239

231240
- RDKit: Open-source cheminformatics; http://www.rdkit.org, https://github.com/rdkit/rdkit
241+
- Da Silva, F.; Desaphy, J.; Rognan, D. IChem: A Versatile Toolkit for Detecting, Comparing, and Predicting Protein–Ligand Interactions. ChemMedChem 2018, 13, 507–510.
242+
- Desaphy, J.; Bret, G.; Rognan, D.; Kellenberger, E. Sc-PDB: A 3D-Database of Ligandable Binding Sites—10 Years On. Nucleic Acids Res. 2014, 43, D399–D404.
232243
- Eguida, M.; Rognan, D. A Computer Vision Approach to Align and Compare Protein Cavities: Application to Fragment-Based Drug Design. J. Med. Chem. 2020, 63, 7127–7142.
233244
- Imrie, F.; Bradley, A. R.; van der Schaar, M.; Deane, C. M. Deep Generative Models for 3D Linker Design. J. Chem. Inf. Model. 2020, 60, 1983–1995.
234-
- Desaphy, J.; Bret, G.; Rognan, D.; Kellenberger, E. Sc-PDB: A 3D-Database of Ligandable Binding Sites—10 Years On. Nucleic Acids Res. 2014, 43, D399–D404.
245+
