Skip to content

Commit a4bae8d

Browse files
authored
Merge pull request #23 from wehs7661/fix_poor_sampling
Resolve the issue of poor sampling
2 parents f998107 + 3976640 commit a4bae8d

File tree

3 files changed

+83
-4
lines changed

3 files changed

+83
-4
lines changed

docs/simulations.rst

Lines changed: 42 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -127,7 +127,8 @@ by the single GRO file, but the user should also be able to initialize different
127127
configurations (represented by multiple GRO files) in the near future. Also, the MDP template should contain parameters
128128
common across all replicas and define the coupling parmaeters for all possible intermediate states,
129129
so that we can cusotmize different MDP files by defining a subset of alchemical states in different
130-
replicas. Importantly, to extend an EEXE simulation, one needs to additionally provide the following
130+
replicas. For EEXE simulations, some MDP parameters need additional care to be taken, which we describe in
131+
:ref:`doc_mdp_params`. Importantly, to extend an EEXE simulation, one needs to additionally provide the following
131132
two checkpoint files:
132133

133134
* One NPY file containing the replica-space trajectories of different configurations saved by the previous run of EEXE simulation with a default name as :code:`rep_trajs.npy`.
@@ -206,9 +207,9 @@ iterations (:code:`n_iterations`) is reached.
206207

207208
.. _doc_parameters:
208209

209-
3. Simulation parameters
210+
3. Input YAML parameters
210211
========================
211-
In the current implementation of the algorithm, 22 parameters can be specified in the input YAML file.
212+
In the current implementation of the algorithm, 27 parameters can be specified in the input YAML file.
212213
Note that the two CLIs :code:`run_EEXE` and :code:`analyze_EEXE` share the same input YAML file, so we also
213214
include parameters for data analysis here.
214215

@@ -378,3 +379,41 @@ parameters left with a blank. Note that specifying :code:`null` is the same as l
378379
n_bootstrap: 50
379380
seed : null
380381
382+
.. _doc_mdp_params:
383+
384+
4. Input MDP parameters
385+
=======================
386+
As mentioned above, a template MDP file should have all the parameters that will be shared
387+
across all replicas. It should also define the coupling parameters for the whole range of
388+
states so that different MDP files can be customized for different replicas. For an EEXE simulation
389+
launched by the CLI :code:`run_EEXE`, any GROMACS MDP parameter that could potentially lead to issues
390+
in the EEXE simulation will raise a warning. If the number of warnings is larger than the value
391+
specified for the flag `-m`/`--maxwarn` in the CLI :code:`run_EEXE`, the simulation will error
392+
out. To avoid warnings arised from MDP specification, we need to take extra care for the following
393+
MDP parameters:
394+
395+
- We recommend setting :code:`lmc_seed = -1` so that a different random seed
396+
for Monte Carlo moves in the state space will be used for each iteration.
397+
- We recommend setting :code:`gen_vel = yes` to re-generating new velocities for each iteration to avoid
398+
potential issues with detailed balance.
399+
- We recommend setting :code:`gen_seed = -1` so that a different random seed for velocity generation
400+
will be used for each iteration.
401+
- The MDP parameter :code:`nstlog` must be a factor of the YAML parameter :code:`nst_sim` so that the final status
402+
of the simulation can be correctly parsed from the LOG file.
403+
- The MDP parameter :code:`nstdhdl` must be a factor of the YAML parameter :code:`nst_sim` so that the time series
404+
of the state index can be correctly parsed from the DHDL file.
405+
- In EEXE, the MDP parameter :code:`nstdhdl` must be a factor of the MDP parameter :code:`nstexpanded`, or
406+
the calculation of the acceptance ratio may be wrong.
407+
- Be careful with the pull code specification if you want to apply a distance restraint between two pull groups.
408+
Specifically, in an EEXE simulation, all iterations should use the same reference distance. Otherwise, poor sampling
409+
can be observed in a fixed-weight EEXE simulation and the equilibration time may be much longer for a weight-updating
410+
EEXE simulation. To ensure the same reference distance across all iterations in an EEXE simulation, consider the
411+
following scenarios:
412+
- If you would like to use the COM distance between the pull groups in the input GRO file as the reference distance
413+
for all the iterations (whatever that value is), then specify :code:`pull_coord1_start = yes` with
414+
:code:`pull_coord1_init = 0` in your input MDP template. In this case, :obj:`.update_MDP` will parse :code:`pullx.xvg`
415+
from the first iteration to get the initial COM distance (:code:`d`) and use it as the reference distance for all the following
416+
iterations using :code:`pull_coord1_start = no` with :code:`pull_coord1_init = d`. Note that this implies that
417+
the MDP parameter :code:`pull_nstxout` should not be 0.
418+
- If you want to explicitly specify a reference distance (:code:`d`) to use for all iterations, simply use
419+
:code:`pull_coord1_start = no` with :code:`pull_coord1_init = d` in your input MDP template.

ensemble_md/cli/run_EEXE.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -130,6 +130,9 @@ def main():
130130

131131
start_idx = comm.bcast(start_idx, root=0) # so that all the ranks are aware of start_idx
132132

133+
# 2-3. Get the reference distance for the distance restraint specified in the pull code, if any.
134+
EEXE.get_ref_dist()
135+
133136
for i in range(start_idx, EEXE.n_iter):
134137
if rank == 0:
135138
# Step 3: Swap the coordinates

ensemble_md/ensemble_EXE.py

Lines changed: 38 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -305,7 +305,7 @@ def set_params(self, analysis):
305305
self.warnings.append('Warning: We recommend setting gen_seed as -1 so the random seed is different for each iteration.') # noqa: E501
306306

307307
if 'gen_vel' not in self.template or ('gen_vel' in self.template and self.template['gen_vel'] == 'no'):
308-
self.warnings.append('Warning: We recommend generating new velocities for each iteration to avoid potential issues with the detailed balance.') # noqa: E501
308+
self.warnings.append('Warning: We recommend generating new velocities for each iteration to avoid potential issues with detailed balance.') # noqa: E501
309309

310310
if self.nst_sim % self.template['nstlog'] != 0:
311311
raise ParameterError(
@@ -341,6 +341,22 @@ def set_params(self, analysis):
341341
raise ParameterError(
342342
'In EEXE, the parameter "nstdhdl" must be a factor of the parameter "nstexpanded", or the calculation of acceptance ratios might be wrong.') # noqa: E501
343343

344+
if 'pull' in self.template and self.template['pull'] == 'yes':
345+
pull_ncoords = self.template['pull_ncoords']
346+
self.set_ref_dist = []
347+
for i in range(pull_ncoords):
348+
if self.template[f'pull_coord{i+1}_geometry'] == 'distance':
349+
if self.template[f'pull_coord{i+1}_start'] == 'yes':
350+
self.set_ref_dist.append(True) # starting from the second iteration, set pull_coord*_init.
351+
if 'pull_nstxout' not in self.template:
352+
self.warnings.append('A non-zero value should be specified for pull_nstxout if pull_coord*_start is set to yes.') # noqa: E501
353+
if self.template['pull_nstxout'] == 0:
354+
self.warnings.append('A non-zero value should be specified for pull_nstxout if pull_coord*_start is set to yes.') # noqa: E501
355+
else:
356+
self.set_ref_dist.append(False) # Here we assume that the user know what reference distance to use. # noqa: E501
357+
else:
358+
self.set_ref_dist.append(False) # we only deal with distance restraints for now.
359+
344360
# Step 7: Set up derived parameters
345361
# 7-1. kT in kJ/mol
346362
self.kT = k * NA * self.temp / 1000 # 1 kT in kJ/mol
@@ -546,6 +562,21 @@ def initialize_MDP(self, idx):
546562

547563
return MDP
548564

565+
def get_ref_dist(self):
566+
"""
567+
Gets the reference distance(s) to use starting from the second iteration if distance restraint(s) are used.
568+
Specifically, a reference distance determined here is the initial COM distance between the pull groups
569+
in the input GRO file. This function initializes the attribute :code:`ref_dist`.
570+
"""
571+
if hasattr(self, 'set_ref_dist'):
572+
self.ref_dist = []
573+
pullx_file = 'sim_0/iteration_0/pullx.xvg'
574+
headers = get_headers(pullx_file)
575+
for i in range(len(self.set_ref_dist)):
576+
if self.set_ref_dist[i] is True:
577+
dist = list(extract_dataframe(pullx_file, headers=headers)[f'{i+1}'])[0]
578+
self.ref_dist.append(dist)
579+
549580
def update_MDP(self, new_template, sim_idx, iter_idx, states, wl_delta, weights, counts=None):
550581
"""
551582
Updates the MDP file for a new iteration based on the new MDP template coming from the previous iteration.
@@ -598,6 +629,12 @@ def update_MDP(self, new_template, sim_idx, iter_idx, states, wl_delta, weights,
598629
MDP["lmc_weights_equil"] = ""
599630
MDP["weight_equil_wl_delta"] = ""
600631

632+
# Here we deal with the distance restraint in the pull code, if any.
633+
if hasattr(self, 'ref_dist'):
634+
for i in range(len(self.ref_dist)):
635+
MDP[f'pull_coord{i+1}_start'] = "no"
636+
MDP[f'pull_coord{i+1}_init'] = self.ref_dist[i]
637+
601638
return MDP
602639

603640
def extract_final_dhdl_info(self, dhdl_files):

0 commit comments

Comments
 (0)