Skip to content

Commit 30a12c1

Browse files
author
Jacob Monroe
authored
Merge pull request #56 from MobleyLab/mobley2
Edit the section on components of a simulation
2 parents abd4eb4 + 8422643 commit 30a12c1

File tree

2 files changed

+73
-42
lines changed

2 files changed

+73
-42
lines changed

paper/basic_training.pdf

10.7 KB
Binary file not shown.

paper/basic_training.tex

Lines changed: 73 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -204,7 +204,7 @@ \subsubsection{Key concepts}
204204
\end{itemize}
205205

206206
One of the main objectives of molecular simulations is to estimate/predict thermodynamic behavior of real systems as observed in the laboratory.
207-
Typically this means we are interested in macroscopic systems, consisting of $10^23$ particles or more (i.e. at least several moles of particles).
207+
Typically this means we are interested in macroscopic systems, consisting of $10^{23}$ particles or more (i.e. at least several moles of particles).
208208
But properties of interest include not only macroscopic, bulk thermodynamic properties, such as density or heat capacity,
209209
but also microscopic properties like specific free energy differences associated with, say, changes in the conformation of a molecule.
210210
For this reason, it is important to understand key concepts in thermodynamics, such as temperature, pressure, entropy, internal energy, various forms of free energy, and the relationships between them.
@@ -568,73 +568,104 @@ \subsection{Main steps of a molecular dynamics simulation}
568568
It should be noted that these steps may be difficult to unambiguously differentiate and define in some cases.
569569
Additionally, it is assumed that prior to performing any of these steps, an appropriate amount of deliberation has been devoted to clearly defining the system and determining the appropriate simulation techniques.
570570

571-
System preparation is typically the most variable of these steps, and often requires unique tools for every system of interest.
572-
It is highly recommended that best practices documents specific to this issue and to the type of system of interest be consulted.
571+
\subsubsection{System preparation}
572+
573+
System preparation focuses on preparing the starting state of the desired system for simulation with the desired simulation package, including building a starting structure, solvating (if necessary), applying a force field etc.
574+
Because this step differs so much depending on the composition of the system and what information is available about the starting structure, it is a step which varies a great deal depending on the type of system and each category may require unique tools.
575+
576+
Given the variable nature of system preparation, it is highly recommended that best practices documents specific to this issue and to the type of system of interest be consulted.
573577
If such documents do not exist, considerable care should be exercised to determine best practices from the literature.
574-
In some cases, freely available tools are constructing systems are available and can be a reasonable option (though their mention here should not be taken as an endorsement that they necessarily encapsulate best practices).
578+
579+
Loosely speaking, system preparation can be thought of as consisting of two \emph{logical} components which are not necessarily consecutive or separate.
580+
One comprises building the configuration of the system in the desired chemical state, and the other, applying force field parameters.
581+
582+
For building systems, freely available tools for constructing systems are available and can be a reasonable option (though their mention here should not be taken as an endorsement that they necessarily encapsulate best practices).
575583
Examples include tools for constructing specific crystal structures, proteins, and lipid membranes, such as Moltemplate, Packmol, and Atomsk.
576-
\todo[inline, color={green!20}]{JIM: Need to cite these packages!}
577-
The goal of all of these tools, and system preparation in general, is to create an accurate representation of the system of interest that can be interpreted by the desired simulation package.
578-
It is further desirable that this starting structure resemble the equilibrium structure of the system at the thermodynamic state point of interest.
584+
585+
A key consideration when building a system is that the starting structure ideally ought to resemble the equilibrium structure of the system at the thermodynamic state point of interest.
579586
For instance, highly energetically unfavorable configurations of the system, such as blatant atomic overlaps, should be avoided.
580587
In some sense, having a good starting structure is only a convenience to reduce equilibration times (if the force field is adequate); however, for some systems, equilibration times might otherwise be prohibitively long.
581588

582589
System preparation is arguably the most critical stage of a simulation and in many cases receives the least attention.
583-
Specifically, if your system preparation is flawed, such flaws may prove fatal.
590+
Specifically, if your system preparation is flawed, such flaws may prove fatal.
584591
Potentially the worst possible outcome is if the prepared system is not what you intended (e.g. it contains incorrect molecules or protonation states) but is chemically valid and well described by your force field and thus proceeds without error through the remaining steps --- and in fact this is a frequent outcome of problems in system preparation.
585592
It should not be assumed that if a system can proceed in a well-behaved manner through the other steps, it was necessarily prepared correctly; considerable care should be taken here.
586593

594+
Assignment or development of force field parameters is also critical, but is outside the scope of this work.
595+
For our purposes here, we will assume you have already obtained or developed force field parameters suitable for your system of interest.
596+
597+
\subsubsection{Minimization}
598+
587599
The purpose of minimization, or relaxation, is to find a local energy minimum of the starting structure so that the molecular dynamics simulation does not immediately "blow up" (i.e. the forces on any one atom are not so large that the atoms move an unreasonable distance in a single timestep).
588600
This involves standard minimization algorithms such as steepest descent.
589601
For a more involved discussion of minimization algorithms utilized in molecular simulation, see \citet{LeachBook}, sections 5.1-5.7.
590602

591-
At the end of energy minimization, it is important to achieve a system configuration with small enough forces that the desired timestep will allow numerical integration of the equations of motion without overly large displacements (see \citet{LeachBook}, section 7.3.4).
592-
Such a configuration is a suitable starting point for molecular dynamics techniques.
593-
However, this only represents a static set of positions, while the propagation of dynamics also requires a set of starting velocities.
594-
These may be assigned in a variety of ways, but are usually randomly assigned to atoms such that the correct Maxwell-Boltzmann distribution at the desired temperature is achieved.
603+
\subsubsection{Assignment of velocities}
604+
Minimization ideally takes us to a state from which we can begin numerical integration of the equations of motion without overly large displacements (see \citet{LeachBook}, section 7.3.4); however, to begin a simulation, we need not just positions but also velocities.
605+
Minimization, however, provides only a final set of positions.
606+
Thus, starting velocities must be assigned; usually this is done by assigning random initial velocities to atoms in a way such that the correct Maxwell-Boltzmann distribution at the desired temperature is achieved as a starting point.
607+
608+
In some cases, we seek to obtain multiple separate and independent simulations of different instances or realizations of a particular system to assess error, collect better statistics, or help gauge dependence of results on the starting structure.
609+
It is worth noting that even very small differences in initial configuration, such as even a difference in position of a single atom, lead to exponential divergence of the time evolution of the system~\cite{allen_computer_2017}, meaning that simply running different simulations starting with different initial velocities will lead to dramatically different time evolution over long enough times.
610+
Still, an even better way to generate independent realizations is to begin with different starting configurations, such as different conformations of the molecule(s) being simulated, as this leads to behavior which is immediately different.
611+
612+
\subsubsection{Equilibration}
613+
614+
Ultimately, we usually seek to run a simulation in a particular thermodynamic ensemble (e.g. the NVE or NVT ensemble) at a particular state point (e.g. target energy, temperature, and pressure) and collect data for analysis which is appropriate for those conditions and not biased depending on our starting conditions/configuration.
615+
This means that usually we need to invest simulation time in bringing the system to the appropriate state point and allowing it to essentially forget about its history and reach equilibrium (or pseudo-equilibrium -- for some systems, such as biomolecular systems, reaching true equilibrium may be impractical) before we begin retaining data for analysis.
616+
617+
The most straightforward portion of equilibrium is bringing the system to the target state point.
618+
Usually, even though velocities are assigned according to the correct distribution, a thermostat will still need to add or remove heat from the system as it approaches the correct partitioning of kinetic and potential energies.
619+
For this reason, it is advised that a thermostatted simulation is performed prior to a desired production simulation, even if the production simulation will ultimately be done in the NVE ensemble.
620+
This phase of equilibration can be monitored by assessing the temperature and pressure of the system, as well as the kinetic and potential energy, to ensure these reach a steady state on average.
621+
For example, an NPT simulation is said to have equilibrated to a specific volume when the dimensions of the simulation box fluctuate around constant values with minimal drift.
622+
This definition, though not perfectly rigorous, is usually suitable for assessing the equilibration of energies, temperature, pressure, and box dimensions during equilibration simulations.
595623

596-
Following minimization, equilibration is typically needed to bring the system to the desired conditions (e.g. temperature and pressure, or energy).
597-
Specifically, even though velocities are assigned according to the correct distribution, the selected thermostat will still usually need to add heat to or remove heat from the system as it approaches the correct partitioning of kinetic and potential energies.
598-
For this reason, it is advised that a thermostatted simulation is performed prior to a desired production simulation, even if the production simulation will ultimately be done in the NVE ensemble.
599-
Once the kinetic and potential energies fluctuate around constant values, the thermostat may be removed (if an NVE simulation is desired) and a snapshot selected that is simultaneously as close to the average kinetic and potential energies as possible.
624+
A more difficult portion of equilibration is to ensure that other properties of the system which are likely to be important are also no longer changing systematically with simulation time.
625+
At equilibrium, a system may still undergo slow fluctuations with time, especially if it has slow internal degrees of freedom -- but key properties should no longer show systematic trends away from their starting structure.
626+
Thus, for example, for biomolecular simulations it is common to examine the root mean squared deviation (RMSD) of the molecules involved as a function of time, and potentially other properties like the number of hydrogen bonds between the biomolecules present and water, as these may be slower to equilibrate than system-wide properties like the temperature and pressure.
627+
628+
Once the kinetic and potential energies fluctuate around constant values and other key properties are no longer changing with time, the equilibration period has reached its end.
629+
630+
Depending on the target ensemble for production, the procedure for the end of equilibration is somewhat different.
631+
If an NVE simulation is desired, the thermostat may be removed and a snapshot selected that is simultaneously as close to the average kinetic and potential energies as possible.
600632
This snapshot, containing both positions and velocities may be used to then start an NVE simulation that will correspond to a temperature close to that which is desired.
601633
This is necessary due to the fact that only the average temperature is obtained through coupling to a thermostat (see Section~\ref{sec:thermostats}), and the temperature fluctuates with the kinetic energy at each timestep.
602634

603-
Similarly, equilibration in the NPT ensemble is necessary before production in the NVT if an average density consistent with a specific pressure is desired.
604-
In this case, the system may be scaled to the desired average volume before the production simulation.
605-
In the above example, the NPT simulation is said to have equilibrated to a specific volume when the dimensions of the simulation box fluctuate around constant values with minimal drift.
606-
This definition, though not perfectly rigorous, is usually suitable for assessing the equilibration of energies, temperature, pressure, and box dimensions during equilibration simulations.
607-
The schematic below (\ref{eqworkflow}) demonstrates what is generally an appropriate equilibriation work-flow for common production ensembles.
635+
If the target is a simulation in the NVT ensemble at a particular density, equilibration should be done in the NPT ensemble.
636+
In this case, the system may be scaled to the desired average volume before starting a production simulation (and if rescaling is done, additional equilibration might be needed).
637+
638+
The schematic below (\ref{eqworkflow}) demonstrates what is generally an appropriate equilibration work-flow for common production ensembles.
608639
Clearly, this schematic cannot cover every case of interest, but should provide some idea of the general approach.
609640
For more information on equilibration procedures, see \citet{LeachBook}, section 7.4 and \citet{ShellNotes}, lectures on Molecular dynamics and Computing properties.
610-
%DLM: Should we be saying something here about how long to equilibrate? My short version is "until the properties of the system stop changing on average", but there could be a whole set of properties one might want to look at. Clearly you should look at anything which is important to you, but also perhaps things which tend to be relatively slow, such as water, etc.
611-
%JIM: I tried this out above.
612-
613-
\todo[inline, color={yellow!20}]{DLM: Note to self, I should add a bit more discussion of what equilibration \emph{means} somewhere in this section, probably along with a discussion of equilibration vs convergence. For example, equilibration means not just that temperature and pressure stop changing but that the overall properties of teh system stop changing (e.g. if temperature and pressure is constant but your protein is unfolding you are not yet equilibrated...)}
614-
\todo[inline, color={yellow!20}]{DLM: There are a lot of long paragraphs here that are perhaps too long; the above is one example. I should police to make sure the one-point-per-paragraph rule is used and shorten some of these.}
615641

616642
\begin{figure}[h]
617643
\centering
618644
\includegraphics[width=\linewidth]{Equilibration_Workflow.pdf}
619-
\caption{Common equilibration work-flows are shown.
620-
Many times, the desired state variables of a simulation, such as energy or density, are defined by their averages in an ensemble in which they may fluctuate, such as at fixed temperature or pressure.
621-
Such cases follow the first or third equilibration routes.
622-
When we wish to simulate at a known, fixed density, such a procedure is not necessary, as shown by the second route.
623-
The last route emphasizes that generally temperature and pressure should be equilibrated separately, even if the production ensemble is NPT.
624-
}
645+
\caption{Common equilibration work-flows are shown; these vary depending on the target ensemble for production simulations (right).
646+
Typically, an initial phase of equilibration at constant volume and temperature is needed to bring the system to the desired target temperature or energy.
647+
For stability reasons, this initial phase is usually needed even if the goal is to also bring the system to a target pressure.
648+
If the production ensemble is an NVE ensemble, an initial NVT simulation is usually followed by a short additional NVE equilibration before collection of production data.
649+
If the production ensemble is NVT, protocols may differ depending on whether it is necessary to allow the system to equilibrate to a particular density/volume or whether the volume is selected \emph{a priori} (second and third rows).
650+
And if production is to be NPT, it is usually equilibrated first at NVT before equilibrating to the target pressure (final row).}
625651
\label{eqworkflow}
626652
\end{figure}
627-
\todo[inline, color={yellow!20}]{DLM: Caption needs updating to make clear why you would choose different options here, especially the two different NVT options. JIM: Better?}
628-
629-
Once equilibration is complete, production data may be collected.
630-
The production simulation is that from which specific properties of the system of interest will be calculated.
631-
As mentioned above, the equilibration procedure should be selected that is appropriate for the desired production ensemble.
632-
It should be noted that ``equilibration'' within the production run may still be necessary before properties or metrics are computed from this simulation (see \citet{ShellNotes}, lecture on Computing Properties).
633-
Otherwise, if a brief simulation in the same ensemble is not performed during the equilibration step immediately prior to production, any period of the production simulation should be ignored where drift is observed in the energies, temperatures, pressures, densities, or other defining state-variables of the ensemble.
634-
This of course precedes estimation of convergence in property calculation.
635-
Data collection then falls under the category of correctly obtaining unbiased statistics and convergence, which is covered in a separate Best Practices document (\url{https://github.com/dmzuckerman/Sampling-Uncertainty}).
653+
654+
\subsubsection{Production}
655+
656+
Once equilibration is complete, we may begin collecting data for analysis, and typically this phase is called ``production''.
657+
The main difference between equilibration and production is simply that in the production simulation, we plan to retain and analyze the collected data.
658+
Production must always be preceded by equilibration appropriate for the target production ensemble, and production data should never be collected immediately after a change in conditions (such as rescaling a box size, energy minimizing, or suddenly changing the temperature or pressure) except in very specific applications where this is the goal.
659+
660+
For bookkeeping purposes, sometimes practitioners choose to discard some initial production data as additional equilibration; usually this is simply to allow additional equilibration time after a change in protocol (such as a switch from NVT to NPT), and the usual considerations for equilibration apply in such cases (see \citet{ShellNotes}, lecture on Computing Properties).
661+
662+
Analysis of production is largely outside the scope of this work, but requires considerable care in computing observables and assessing the uncertainty in any computed properties.
663+
Usually, analysis involves computing expectation values of particular observables, and a key consideration is to obtain \emph{converged} estimates of these properties --- that is, estimates that are based on adequate simulation data so that they no longer depend substantially on the length of the simulation which was run or on its initial conditions.
664+
665+
A separate Best Practices document addresses these critical issues of convergence and error analysis (\url{https://github.com/dmzuckerman/Sampling-Uncertainty}).
636666
For more specific details on procedures and parameters used in production simulations, see the appropriate best practices document for the system of interest.
637-
\todo[inline, color={yellow!20}]{DLM: Possibly clarify distinction between production and equilibration (just whether data is retained, in some cases)}
667+
668+
\todo[inline, color={yellow!20}]{DLM: Probably need to add something here about how often to store data.}
638669

639670
\subsection{Thermostats}
640671
\label{sec:thermostats}

0 commit comments

Comments
 (0)