Commit 2c4bfd3

Merge pull request #35 from samplchallenges/add_logP_submissions
Add logP submissions from this challenge
2 parents 247621f + 7230bf0 commit 2c4bfd3

22 files changed, +2616 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ Of course, we also appreciate it if you cite any overview/experimental papers re
66 66
- 2023-01-13: Switch logP compound numbering to have "SAMPL9-" prefix
67 67
- 2023-01-20: Provide logP submission template and example
68 68
- 2023-01-23: Add [submission server](https://submit.samplchallenges.org/submit/sampl9-logp) link for logP, update deadline to Jan. 31.
69+
- 2023-01-31: Add SAMPL submissions
69 70

70 71
## Challenge construction
71 72

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
sid, e-mail, file name
2+
2, LogP_NE-FG.csv
3+
3, logP_prediction_TN_KL.csv
4+
4, logP_Sprick.csv
5+
5, logP_QM.csv
6+
6, logP_MD.csv
7+
7, logP_Mixed.csv
8+
8, logP_predictions_VoelzLab.csv
9+
9, logP_paluch_sm8.csv
10+
10, logP_tran_lser.csv
11+
11, logP_3DS.csv
12+
12, logP_SAMPL9_Beckstein_Iorga_OPLS-AA_M24.csv
13+
13, logP_SAMPL9_Beckstein_Iorga_OPLS-AA_TIP4P.csv
14+
14, logP_SAMPL9_Beckstein_Iorga_GAFF_TIP3P.csv
15+
15, logP_ECRISM-1.csv
16+
16, logP_ECRISM-2.csv
17+
17, logP_tran_lser_ufz.csv
18+
18, logP_Pitt_JWang.csv
19+
19, logP_paluch_sm8_basis.csv
20+
20, logP_oxford.csv
Lines changed: 140 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,140 @@
1+
# WATER-TOLUENE (ΔG_toluene - ΔG_water) TRANSFER FREE ENERGY PREDICTIONS
2+
#
3+
# This file will be automatically parsed. It must contain the following four elements:
4+
# predictions, name of method, software listing, and method description.
5+
# These elements must be provided in the order shown with their respective headers.
6+
#
7+
# Any line that begins with a # is considered a comment and will be ignored when parsing.
8+
#
9+
#
10+
# PREDICTION SECTION
11+
#
12+
# The following is the transfer free energy from water to toluene, i.e.
13+
# DG(toluene) - DG(water), where DG is the solvation energy, i.e. the free energy of
14+
# transferring a molecule from gas-phase to the solvent.
15+
Predictions:
16+
SAMPL9-1,-4.0,0.7,2.
17+
SAMPL9-2,-5.1,0.8,2.
18+
SAMPL9-3,-7.7,0.5,2.
19+
SAMPL9-4,-8.2,0.6,2.
20+
SAMPL9-5,-5.5,0.7,2.
21+
SAMPL9-6,2.8,0.5,2.
22+
SAMPL9-7,-7.7,1.1,2.
23+
SAMPL9-8,1.3,2.1,2.
24+
SAMPL9-9,-7.8,0.7,2.
25+
SAMPL9-10,-5.0,0.4,2.
26+
SAMPL9-11,-1.9,0.9,2.
27+
SAMPL9-12,1.3,0.9,2.
28+
SAMPL9-13,-2.9,0.7,2.
29+
SAMPL9-14,-5.3,0.5,2.
30+
SAMPL9-15,1.5,1.3,2.
31+
SAMPL9-16,-2.6,1.4,2.
32+
#
33+
#
34+
#
35+
# Please list your name, using only UTF-8 characters as described above. The "Participant name:" entry is required.
36+
Participant name:
37+
Piero Procacci
38+
39+
#
40+
#
41+
# Please list your organization/affiliation, using only UTF-8 characters as described above.
42+
Participant organization:
43+
University of Florence (Italy)
44+
45+
#
46+
#
47+
# NAME SECTION
48+
#
49+
# Please provide an informal but informative name of the method used.
50+
# The name must not exceed 40 characters.
51+
# The 'Name:' keyword is required as shown here.
52+
Name:
53+
(NE-FG) NonEquilibrium Fast Growth
54+
#
55+
#
56+
# COMPUTE TIME SECTION
57+
#
58+
# Please provide the average compute time across all of the molecules.
59+
# For physical methods, report the GPU and/or CPU compute time in hours.
60+
# For empirical methods, report the query time in hours.
61+
# Create a new line for each processor type.
62+
# The 'Compute time:' keyword is required as shown here.
63+
Compute time:
64+
1.5 hours (wall time) per molecule solvation free energy using 12 cluster nodes (CPU-only)
65+
66+
#
67+
# COMPUTING AND HARDWARE SECTION
68+
#
69+
# Please provide details of the computing resources that were used to train models and make predictions.
70+
# Please specify compute time for training models and querying separately for empirical prediction methods.
71+
# Provide a detailed description of the hardware used to run the simulations.
72+
# The 'Computing and hardware:' keyword is required as shown here.
73+
Computing and hardware:
74+
CRESCO6 cluster (Intel(R) Xeon(R) Platinum 8160 2.10GHz, 24x2 cores)
75+
https://www.eneagrid.enea.it/Resources/CRESCO_documents/CRESCO/Sezione6.html
76+
77+
# SOFTWARE SECTION
78+
#
79+
# List all major software packages used and their versions.
80+
# Create a new line for each software.
81+
# The 'Software:' keyword is required.
82+
Software:
83+
ORAC6.1 http://www1.chim.unifi.it/orac/
84+
85+
# METHOD CATEGORY SECTION
86+
#
87+
# State which method category your prediction method is better described as:
88+
# `Physical (MM)`, `Physical (QM)`, `Empirical`, or `Mixed`.
89+
# Pick only one category label.
90+
# The `Category:` keyword is required.
91+
Category:
92+
physical (MM)
93+
94+
# METHOD DESCRIPTION SECTION
95+
#
96+
# Methodology and computational details.
97+
# Level of details should be roughly equivalent to that used in a publication.
98+
# Please include the values of key parameters with units.
99+
# Please explain how statistical uncertainties were estimated.
100+
#
101+
# If you have evaluated additional microstates, please report their SMILES strings and populations of all the microstates in this section.
102+
# If you used a microstate other than the challenge provided microstate (`SMXX_micro000`), please list your chosen `Molecule ID` (in the form of `SMXX_extra001`) along with the SMILES string in your methods description.
103+
#
104+
# Use as many lines of text as you need.
105+
# All text following the 'Method:' keyword will be regarded as part of your free text methods description.
106+
107+
Method:
108+
Force field for solute is GAFF2 with AM1-BCC charges (generated with
109+
PrimaDORAC www1.chim.unifi.it/orac/primadorac); Starting
110+
nonequilibrium state is prepared by combining 96 configurations of the
111+
gas-phase solute sampled with Hamiltonian Replica Exchange (minimum
112+
scaling factor 0.1 -> T=3000 K) with 96 configurations of pure
113+
solvent in standard conditions (NPT). Gas-phase HREM simulations were
114+
run for 8 ns. MD simulations of pure solvent were run for 5 ns.
115+
Solute molecules are in their neutral state. For the solvent, we used
116+
343 Toluene molecules in a cubic box (GAFF2-AM1-BCC) with a mean
117+
side-length of 39.72 Angs and 1728 OPC3 water molecules in a cubic box
118+
with a mean side-length of 37.45 Angs. In the 100 NE alchemy runs, the
119+
initially decoupled solute was recoupled in the solvent (NPT
120+
condition) in 450 ps. The solvation free energies DG_w and DG_t are
121+
evaluated using the Gaussian estimate if the p-value of the
122+
Anderson-Darling test of the work distribution is greater than 50% or
123+
with the Jarzynski estimate DG=-RT ln<e^{-\beta W}> with bias estimated
124+
from the variance. Errors (confidence interval 95%) are computed with
125+
bootstrap with resampling. The Toluene-water partition coefficient is
126+
computed as LogTW= -(DG_t -DG_w )/RT/log(10). The water to toluene
127+
transfer free energy is DG_t - DG_w. Calculations were done on the
128+
ENEA-CRESCO6 cluster in Portici (NA), Italy using the program ORAC
129+
(www1.chim.unifi.it/orac ). Wall-clock time for a full LogTW
130+
calculation, on a per solute basis, was ~1.5 hours using 12 48-core
131+
nodes (Skylake Intel(R) Xeon(R) Platinum 8160 2.10GHz).
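
The two free energy estimators named in the Method text can be sketched as follows. This is a minimal illustration, not the ORAC implementation: the function names, the gas constant in kcal/(mol K), and the default temperature of 298.15 K are assumptions (the submission only says "standard conditions"), and the Anderson-Darling normality decision and the variance-based bias correction are left out. The Gaussian estimate is the second-order cumulant expression, the Jarzynski estimate is the exponential average of the work, and the confidence interval comes from bootstrap resampling of the work values.

```python
import math
import random
import statistics

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); units are an assumption


def gaussian_estimate(work, temperature=298.15):
    """Gaussian (second-order cumulant) estimate: <W> - var(W) / (2 RT)."""
    rt = R_KCAL * temperature
    return statistics.fmean(work) - statistics.variance(work) / (2.0 * rt)


def jarzynski_estimate(work, temperature=298.15):
    """Jarzynski estimate: -RT ln <exp(-W / RT)>, using a stable log-sum-exp."""
    rt = R_KCAL * temperature
    scaled = [-w / rt for w in work]
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -rt * (shift + math.log(mean_exp))


def bootstrap_ci(work, estimator, n_resamples=1000, level=0.95, seed=0):
    """Percentile confidence interval from bootstrap resampling of the work values."""
    rng = random.Random(seed)
    estimates = sorted(
        estimator(rng.choices(work, k=len(work))) for _ in range(n_resamples)
    )
    lower = estimates[int((1.0 - level) / 2.0 * n_resamples)]
    upper = estimates[int((1.0 + level) / 2.0 * n_resamples) - 1]
    return lower, upper
```

Both estimators take a list of recoupling work values in kcal/mol, one per nonequilibrium run; `bootstrap_ci(work, jarzynski_estimate)` returns the lower and upper bounds of the interval.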
132+
#
133+
#
134+
# All submissions must either be ranked or non-ranked.
135+
# Only one ranked submission per participant is allowed.
136+
# Multiple ranked submissions from the same participant will not be judged.
137+
# Non-ranked submissions are accepted so we can verify that they were made before the deadline.
138+
# The "Ranked:" keyword is required, and expects a Boolean value (True/False)
139+
Ranked:
140+
True
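
The relation quoted in the Method text of this submission, LogTW = -(DG_t - DG_w)/RT/log(10), can be written as a short helper. This is a sketch under stated assumptions: the temperature of 298.15 K and the function name are not given in the submission, and `log(10)` is read as the natural logarithm of 10.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def transfer_free_energy_to_logp(dg_transfer, temperature=298.15):
    """Convert a water-to-toluene transfer free energy (kcal/mol) to log P.

    dg_transfer is DG_t - DG_w; a negative value means the solute prefers toluene.
    """
    return -dg_transfer / (R_KCAL * temperature * math.log(10))


# Example with the first prediction line of this file (SAMPL9-1, -4.0 kcal/mol).
logp = transfer_free_energy_to_logp(-4.0)
# The uncertainty scales by the same factor, since the conversion is linear.
logp_error = abs(transfer_free_energy_to_logp(0.7))
```

At the assumed temperature the conversion factor is about 1.36 kcal/mol per log unit, so a transfer free energy of -4.0 kcal/mol corresponds to a log P slightly below 3.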
Lines changed: 141 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,141 @@
1+
# WATER-TOLUENE (ΔG_toluene - ΔG_water) TRANSFER FREE ENERGY PREDICTIONS
2+
#
3+
# This file will be automatically parsed. It must contain the following four elements:
4+
# predictions, name of method, software listing, and method description.
5+
# These elements must be provided in the order shown with their respective headers.
6+
#
7+
# Any line that begins with a # is considered a comment and will be ignored when parsing.
8+
#
9+
#
10+
# PREDICTION SECTION
11+
#
12+
# It is mandatory to submit water to toluene (ΔG_toluene - ΔG_water) transfer free energy (TFE) predictions for all 16 molecules.
13+
# Incomplete submissions will not be accepted.
14+
# The energy units must be in kcal/mol.
15+
16+
# Please report the general molecule `ID tag` in the form of `SAMPL9-XX` (e.g. SAMPL9-1, SAMPL9-2, etc).
17+
# Please report TFE standard error of the mean (SEM) and TFE model uncertainty.
18+
#
19+
# The data in each prediction line should be structured as follows:
20+
# ID tag, TFE, TFE SEM, TFE model uncertainty
21+
#
22+
# If you use a microstate other than the challenge provided microstate, please note SMILES strings of microstates you used in your submission, such as in the methods section.
23+
#
24+
# The list of predictions must begin with the 'Predictions:' keyword as illustrated here.
25+
Predictions:
26+
SAMPL9-1,-2.52,0.4,0.1
27+
SAMPL9-2,-5.89,0.4,0.1
28+
SAMPL9-3,-9.81,0.4,0.1
29+
SAMPL9-4,-8.91,0.4,0.1
30+
SAMPL9-5,-7.24,0.4,0.1
31+
SAMPL9-6,4.66,0.4,0.1
32+
SAMPL9-7,-6.48,0.4,0.1
33+
SAMPL9-8,-2.26,0.4,0.1
34+
SAMPL9-9,-9.76,0.4,0.1
35+
SAMPL9-10,-3.08,0.4,0.1
36+
SAMPL9-11,-2.54,0.4,0.1
37+
SAMPL9-12,1.66,0.4,0.1
38+
SAMPL9-13,-2.67,0.4,0.1
39+
SAMPL9-14,-4.12,0.4,0.1
40+
SAMPL9-15,3.71,0.4,0.1
41+
SAMPL9-16,-5.75,0.4,0.1
42+
#
43+
#
44+
# Please list your name, using only UTF-8 characters as described above. The "Participant name:" entry is required.
45+
Participant name:
46+
Michael Diedenhofen, Arnim Hellweg
47+
48+
#
49+
#
50+
# Please list your organization/affiliation, using only UTF-8 characters as described above.
51+
Participant organization:
52+
Dassault Systemes Deutschland GmbH BIOVIA
53+
54+
#
55+
#
56+
# NAME SECTION
57+
#
58+
# Please provide an informal but informative name of the method used.
59+
# The name must not exceed 40 characters.
60+
# The 'Name:' keyword is required as shown here.
61+
Name:
62+
COSMO-RS
63+
64+
#
65+
#
66+
# COMPUTE TIME SECTION
67+
#
68+
# Please provide the average compute time across all of the molecules.
69+
# For physical methods, report the GPU and/or CPU compute time in hours.
70+
# For empirical methods, report the query time in hours.
71+
# Create a new line for each processor type.
72+
# The 'Compute time:' keyword is required as shown here.
73+
Compute time:
74+
The conformer COSMO file creation took ~40 hours (wall time) using 1 to 48 processors.
75+
The number of processors used changes in the course of the calculations.
76+
77+
#
78+
# COMPUTING AND HARDWARE SECTION
79+
#
80+
# Please provide details of the computing resources that were used to train models and make predictions.
81+
# Please specify compute time for training models and querying separately for empirical prediction methods.
82+
# Provide a detailed description of the hardware used to run the simulations.
83+
# The 'Computing and hardware:' keyword is required as shown here.
84+
Computing and hardware:
85+
For the conformer generation we used a Linux machine (AMD EPYC 7413 24-Core Processor). The calculations could use up to 48 cores. The number of CPUs actually used differed between the steps of the conformer search procedure.
86+
Compared to the creation of the conformer sets, the time for the COSMO-RS logD calculations is negligible (one minute on a Windows notebook, one CPU).
87+
88+
# SOFTWARE SECTION
89+
#
90+
# List all major software packages used and their versions.
91+
# Create a new line for each software.
92+
# The 'Software:' keyword is required.
93+
Software:
94+
BIOVIA COSMOquick 2023: A proprietary tool of Dassault Systemes to generate tautomeric states.
95+
BIOVIA COSMOconf 2023: A proprietary tool of Dassault Systemes for conformer generation (uses TURBOMOLE and COSMOtherm)
96+
BIOVIA COSMObase 2023: A proprietary tool of Dassault Systemes (COSMO file Database)
97+
BIOVIA COSMOtherm 2023: The proprietary software developed and distributed by Dassault Systemes, which uses COSMO-RS. COSMO-RS is a published theory: Klamt A (1995) Conductor-like screening model for real solvents: a new approach to the quantitative calculation of solvation phenomena. J Phys Chem 99:2224-2235. https://doi.org/10.1021/j100007a062
98+
TURBOMOLE 7.7: The quantum chemistry suite. University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007; available from http://www.turbomole.com: Karlsruhe, Germany, 2020
99+
xTB: C. Bannwarth, E. Caldeweyher, S. Ehlert, A. Hansen, P. Pracht, J. Seibert, S. Spicher, S. Grimme, WIREs Comput. Mol. Sci., 2020, e01493. DOI: 10.1002/wcms.1493
100+
101+
# METHOD CATEGORY SECTION
102+
#
103+
# State which method category your prediction method is better described as:
104+
# `Physical (MM)`, `Physical (QM)`, `Empirical`, or `Mixed`.
105+
# Pick only one category label.
106+
# The `Category:` keyword is required.
107+
Category:
108+
Physical (QM)
109+
110+
# METHOD DESCRIPTION SECTION
111+
#
112+
# Methodology and computational details.
113+
# Level of details should be roughly equivalent to that used in a publication.
114+
# Please include the values of key parameters with units.
115+
# Please explain how statistical uncertainties were estimated.
116+
#
117+
# If you have evaluated additional microstates, please report their SMILES strings and populations of all the microstates in this section.
118+
# If you used a microstate other than the challenge provided microstate (`SMXX_micro000`), please list your chosen `Molecule ID` (in the form of `SMXX_extra001`) along with the SMILES string in your methods description.
119+
#
120+
# Use as many lines of text as you need.
121+
# All text following the 'Method:' keyword will be regarded as part of your free text methods description.
122+
Method:
123+
Conformer/tautomer workflow:
124+
1) We used a COSMOquick + xTB approach to generate possible tautomers. After a first DFT optimization we discarded all structures with relative energies >15 kcal/mol. The tautomer sets were then checked manually and missing tautomers were added.
125+
2) COSMOconf was used to create the COSMO files for the conformer set of the microstates.
126+
COSMO-RS calculations:
127+
All COSMO-RS calculations have been performed with the COSMOtherm software (parametrization: BP_TZVPD_FINE_23.ctd).
128+
The COSMO files for Water, Toluene, SAMPL9-10, SAMPL9-11, SAMPL9-15, SAMPL9-9, SAMPL9-12 and SAMPL9-4 have been taken from COSMObase 2023; all other files were calculated according to the procedure described above (steps 1 and 2).
129+
The conformer sets of the microstates (tautomers) of the compounds have been merged to form the conformer set of the compound. These sets have been used for the prediction of the partition coefficients between pure water and a toluene phase with 0.23 mol % water (N. Peschke, S. I. Sandler, J. Chem. Eng. Data 1995, 40, 315-320).
130+
The COSMO-RS algorithm by itself has no statistical error. The overall workflow including the conformational search (or molecule or state) has a statistical noise smaller than 0.1 kcal/mol.
131+
As an error estimation for the underlying partition coefficients we use the root mean square deviation (RMSD) of 0.4 kcal/mol (taken from COSMO-RS parametrization).
132+
133+
#
134+
#
135+
# All submissions must either be ranked or non-ranked.
136+
# Only one ranked submission per participant is allowed.
137+
# Multiple ranked submissions from the same participant will not be judged.
138+
# Non-ranked submissions are accepted so we can verify that they were made before the deadline.
139+
# The "Ranked:" keyword is required, and expects a Boolean value (True/False)
140+
Ranked:
141+
True
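
The template comments in this file describe a keyword-delimited format: lines starting with `#` are ignored, each section opens with a keyword such as `Predictions:` or `Name:`, and each prediction line holds `ID tag, TFE, TFE SEM, TFE model uncertainty`. A minimal reader for that layout might look like the sketch below. It is not the challenge's actual parser; the function names and the handling of unknown lines are assumptions made for illustration.

```python
# Section keywords as they appear in the submission template.
KEYWORDS = (
    "Predictions:", "Participant name:", "Participant organization:",
    "Name:", "Compute time:", "Computing and hardware:", "Software:",
    "Category:", "Method:", "Ranked:",
)


def parse_submission(text):
    """Split a submission file into {section name: list of content lines}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line in KEYWORDS:
            current = line.rstrip(":")
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_predictions(lines):
    """Map each ID tag to (TFE, TFE SEM, TFE model uncertainty) in kcal/mol."""
    predictions = {}
    for line in lines:
        tag, tfe, sem, model_unc = (field.strip() for field in line.split(","))
        predictions[tag] = (float(tfe), float(sem), float(model_unc))
    return predictions
```

Calling `parse_predictions(parse_submission(text)["Predictions"])` on a file like this one would return one entry per `SAMPL9-XX` tag. A stricter reader would also check the 40-character limit on `Name:` and that `Ranked:` is `True` or `False`.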
Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
1+
Predictions:
2+
SAMPL9-1,-3.35,0.01,1.24
3+
SAMPL9-2,-4.01,0.01,1.24
4+
SAMPL9-3,-7.25,0.01,1.24
5+
SAMPL9-4,-4.91,0.01,1.24
6+
SAMPL9-5,-6.23,0.01,1.24
7+
SAMPL9-6,4.94,0.01,1.24
8+
SAMPL9-7,-8.60,0.01,1.24
9+
SAMPL9-8,-5.20,0.01,1.24
10+
SAMPL9-9,-7.60,0.01,1.24
11+
SAMPL9-10,-1.74,0.01,1.24
12+
SAMPL9-11,5.13,0.01,1.24
13+
SAMPL9-12,4.96,0.01,1.24
14+
SAMPL9-13,-0.59,0.01,1.24
15+
SAMPL9-14,-2.35,0.01,1.24
16+
SAMPL9-15,4.69,0.01,1.24
17+
SAMPL9-16,-5.56,0.01,1.24
18+
19+
20+
Participant name:
21+
Stefan M. Kast, Michael Strobl, Stefan Guessregen, Nicolas Tielker
22+
23+
Participant organization:
24+
TU Dortmund University
25+
26+
Name:
27+
EC-RISM_TFE_P3
28+
29+
Compute time:
30+
45.3
31+
32+
Computing and hardware:
33+
All calculations were conducted on the LiDO 3 high performance cluster of TU Dortmund University. Calculations were automatically scheduled and ran on either an Intel Xeon E5-4604v4 or an Intel Xeon E5-2640v4 CPU, depending on node availability.
34+
35+
Software:
36+
Schrödinger Suite 2022-2
37+
Gaussian 16 Rev C.01
38+
3D RISM (inhouse development)
39+
EC-RISM (inhouse development)
40+
Python 3.6
41+
Anaconda 2018.12
42+
Amber 12
43+
Mathematica 11.3 (Wolfram)
44+
45+
Category:
46+
Physical (QM)
47+
48+
Method:
49+
Geometries were generated using Schrödinger Suite 2022-2 with the OPLS4/Water and OPLS4/CHCl3 (to approximate toluene) force fields with mixed torsional/low-mode sampling, 100 steps per RB and 1000 steps max (default parameters), energy window 21.0 and RMSD cutoff of 1.5. These conformers were then optimized using Gaussian 16 Rev C.01 with IEF-PCM using default settings for water and toluene, respectively, at the B3LYP/6-311+G(d,p) level of theory and clustered to remove duplicate conformations.
50+
For each microstate only up to 5 conformations with the lowest PCM energies for each solvent were treated with EC-RISM//MP2/6-311+G(d,p) using the PSE2 closure for water and the PSE1 closure for toluene [REF1] and the resulting EC-RISM energies corrected using the general formula (c_1*mu_{ex}+c_2*PMV_{EC-RISM}+c_3), with the correction parameters trained using the Minnesota Solvation Database (MNSOL). For water not all parameters were used, i.e. c_1 = 1 and c_3 = 0, because the additional parameters were found to be of no predictive value in previous challenges [REF2].
51+
The transfer free energy of a compound was then calculated by dG_trans=G_{tol}-G_{wat}, where G_{m} refers to the partition function estimate of the solvent specific free energy by summing over the conformational and tautomer states [REF3].
52+
The SEM was estimated as the convergence criterion for a single EC-RISM calculation. The uncertainty was estimated as the RMSE of a number of MNSOL compounds for which water-toluene TFEs could be found in the literature [REF4], with G_{wat} and G_{tol} taken from the training set data.
53+
54+
References
55+
[REF1] N. Tielker, D. Tomazic, J. Heil, T. Kloss, S. Ehrhart, S. Guessregen, K. F. Schmidt, S. M. Kast, J. Comput.-Aided Mol. Des. 30, 1035-1044 (2016).
56+
[REF2] N. Tielker, L. Eberlein, S. Guessregen, S. M. Kast, J. Comput.-Aided Mol. Des. 32, 1151-1163 (2018).
57+
[REF3] N. Tielker, D. Tomazic, L. Eberlein, S. Guessregen, S. M. Kast, J. Comput.-Aided Mol. Des. 34, 453-461 (2020).
58+
[REF4] A. M. Zissimos, M. H. Abraham, M. C. Barker, K. J. Box, K. Y. Tam, J. Chem. Soc., Perkin Trans. 2, 470-477 (2002).
59+
60+
Additional microstates
61+
Name,SMILES,population in water,population in toluene
62+
SAMPL9-1_micro000,CCCSc1ccc2c(c1)[nH]c(n2)NC(=O)OC,62.47%,58.65%
63+
SAMPL9-1_extra001,CCCSc1ccc2c(c1)nc([nH]2)NC(=O)OC,37.20%,41.35%
64+
SAMPL9-1_extra002,CCCSc1ccc2c(c1)[nH]/c(=N\C(=O)OC)/[nH]2,0.33%,0.00%
65+
SAMPL9-6_micro000,CNC[C@H](O)c1ccc(O)c(O)c1,100.00%,100.00%
66+
SAMPL9-6_extra001,CNC[C@H](O)C1=C[C@@H](C(=O)C=C1)O,0.00%,0.00%
67+
SAMPL9-10_micro000,C[C@@H](C(=O)O)c1cccc(c1)C(=O)c1ccccc1,100.00%,100.00%
68+
SAMPL9-10_extra001,CC(=C(O)O)c1cccc(c1)C(=O)c1ccccc1,0.00%,0.00%
69+
SAMPL9-11_micro000,CCn1cc(C(=O)O)c(=O)c2ccc(C)nc12,100.00%,100.00%
70+
SAMPL9-11_extra001,CCn1cc(C(=O)[O-])c(=O)c2ccc(C)[nH+]c12,0.00%,0.00%
71+
SAMPL9-12_micro000,CC(=O)Nc1ccc(O)cc1,100.00%,100.00%
72+
SAMPL9-12_extra001,C=C(O)Nc1ccc(O)cc1,0.00%,0.00%
73+
SAMPL9-12_extra002,C=C(O)NC1=CCC(=O)C=C1,0.00%,0.00%
74+
SAMPL9-12_extra003,CC(=O)NC1=CCC(=O)C=C1,0.00%,0.00%
75+
SAMPL9-15_micro000,Cc1cc(C)nc(NS(=O)(=O)c2ccc(N)cc2)n1,94.64%,100.00%
76+
SAMPL9-15_extra001,Cc1cc(C)[nH]/c(=N/S(=O)(=O)c2ccc(N)cc2)/n1,5.36%,0.00%
77+
78+
Ranked:
79+
True
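
The Method text of this submission describes G_{m} as a partition function estimate obtained by summing over conformational and tautomer states, and the microstate table reports populations per solvent. The standard form of such a sum can be sketched as below. The exact expression used in [REF3] is not reproduced in this file, so the formula, the function names, and the temperature of 298.15 K are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ensemble_free_energy(state_energies, temperature=298.15):
    """G = -RT ln sum_i exp(-G_i / RT) over all states in one solvent (kcal/mol)."""
    rt = R_KCAL * temperature
    g_min = min(state_energies)  # shift by the minimum for numerical stability
    z = sum(math.exp(-(g - g_min) / rt) for g in state_energies)
    return g_min - rt * math.log(z)


def state_populations(state_energies, temperature=298.15):
    """Boltzmann populations of the states, summing to 1."""
    rt = R_KCAL * temperature
    g_min = min(state_energies)
    weights = [math.exp(-(g - g_min) / rt) for g in state_energies]
    total = sum(weights)
    return [w / total for w in weights]


def transfer_free_energy(energies_toluene, energies_water, temperature=298.15):
    """dG_trans = G_tol - G_wat, each from its own sum over states."""
    return (ensemble_free_energy(energies_toluene, temperature)
            - ensemble_free_energy(energies_water, temperature))
```

With this form, a state lying several kcal/mol above the lowest one contributes a population close to 0%, which is consistent with the many 0.00% entries in the table above.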
