@@ -451,76 +451,239 @@ sampling different alchemical ranges would have different references. Therefore,
451451
452452Weight combination
453453==================
454+
455+ Basic idea
456+ ----------
454457As mentioned above, to leverage the stastics of the states collected from multiple replicas, we recommend
455- combining the alchemical weights of these states across replicas to initialize the next iteration. Below
456- we first describe how we shift weights to deal with the issue of different references of alchemical weights
457- in GROMACS LOG files, then, we describe weight-combining methods available in our current implementation.
458+ combining the alchemical weights of these states across replicas to initialize the next iteration. Ideally,
459+ well-modified weights should facilitate the convergence of the alchemical weights in expanded ensemble, which
460+ in the limit of inifinite simulation time correspond to dimensionless free energies of the alchemical states.
461+ The modified weights also directly influence the the accpetance ratio, hence the convergence of the simulation
462+ ensemble. Potentially, there are various methods for combining weights across multiple replicas. One intuitive
463+ method to average the probabilities :math: `p_1 ` and :math: `p_1 ` that respectively correspond to weights :math: `g_1 `
464+ and :math: `g_1 `, i.e.
458465
459- Weight shifting
460- ---------------
461- In the log file of a GROMACS expanded ensemble simulation, the alchemical weight of the first alchemical intermediate state
462- is always shifted to 0 to serve as the reference. In EEXE, different replicas have different ranges of alchemical
463- states, i.e. different first states, hence difference references. For example, there could be 3 replicas having
464- the following weights:
466+ .. math ::
467+ g=\ln p = -\ln \left (\frac {p_1 +p_2 }{2 }\right ) = -\ln \left (\frac {\text {e}^{-g_1 } + \text {e}^{-g_2 }}{2 }\right )
468+
469+ This exploits the fact that in expanded ensemble, the alchemical weight of a state is the dimensionless free energy
470+ of that state given an exactly flat histogram of state visitation. While this assumption of flat histograms is generally
471+ not true, espeically in cases where the free energy differen of interest is large, one can consider "correcting"
472+ the weights before combining them. (See :ref: `doc_histogram ` for more details.)
473+
474+ While the method illustrated above is intuitive and easy to operate, it suffers from the issue of reference state selection.
475+ This issue comes from the fact that GROMACS always shifts the weight of the first state to 0 to make it as the reference state.
476+ Given that the first state of different replicas in EEXE are different, this essentially means that the vector of
477+ alchemical weights of different replicas have different references. Although it is possible to pick a reference
478+ state for all replicas before weight combination could solve this issue, different choices of references could lead to
479+ slightly different combined weights, hence probability ratios. As there is no real justification which state should be favored
480+ as the reference, instead of the method explained above, we implemented another method that exploits the average of "probability ratios"
481+ to circumvent the issue of reference selection.
482+
483+ Weight combinination based on probability ratios
484+ ------------------------------------------------
485+ Generally, weight combination is performed after the final configurations have beeen figured out and it is just for
486+ the initialization of the MDP files for the next iteration. Now, to demonstrate the method implemented in
487+ :code: `ensemble_md ` (or more specifically, :obj: `.combine_weights `, here we consider the following sets of weights
488+ as an example, with :code: `X ` denoting a state not present in the alchemical range:
465489
466490::
467491
468- State 0 1 2 3 4 5 6 7 8 9
469- Rep 1 0.0 2.1 4.0 3.7 4.8 6.5 X X X X
470- Rep 2 X X 0.0 -0.4 0 .7 2.3 2.8 3.9 X X
471- Rep 3 X X X X 0.0 1.5 2.1 3.3 6.0 9.2
492+ State 0 1 2 3 4 5
493+ Rep 1 0.0 2.1 4.0 3.7 X X
494+ Rep 2 X 0.0 1 .7 1.2 2.6 X
495+ Rep 3 X X 0.0 -0.4 0.9 1.9
472496
473- Each of these replicas sample 6 alchemical states. There alchemical ranges are different but overlapping. Specifically,
474- Replicas 1 and 2 have overlap at states 2 to 5 and replicas 2 and 3 have overlap at states 4 to 7. Notably, all
475- three replicas sampled states 4 and 5. Therefore, what we are going to do is
497+ As shown above, the three replicas sample different but overlapping states. Now, our goal
498+ is to
476499
477- * For states 2 and 3 , combine weights across replicas 1 and 2.
478- * For states 4 and 5 , combine weights across replicas 1, 2 and 3 .
479- * For states 6 and 7 , combine weights across replicas 2 and 3.
500+ * For state 1 , combine the weights arcoss replicas 1 and 2.
501+ * For states 2 and 3 , combine the weights across all three replicas .
502+ * For state 4 , combine the weights across replicas 1 and 2.
480503
481- However, before we combine the weights, we should make sure the weights of all replicas have the same reference because now
482- the references of the 3 replicas are states 0, 2, and 4, respectively. Therefore could be
504+ That is, we combine weights arcoss all replicas that sample the state of interest regardless
505+ which replicas are swapping. The outcome of the whole process should be three vectors of modified
506+ alchemical weights, one for each replica, that should be specified in the MDP files for the next iteration.
507+ Below we elaborate the details of each step carried out by our method.
483508
509+ Step 1: Convert the weights into probabilities
510+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
511+ For weight :math: `g_ij` that corresponds to state :math: `j` in replica :math: `i`, we can calculate its
512+ corresopnding probability as follows:
484513
485- Exponential averaging
486- ---------------------
487- In the limit that all alchemical states are equally sampled, the alchemical weight of a state
488- is equal to the dimensionless free energy of that state, i.e. in the units of kT, we have
489- :math: `g(\lambda )=f(\lambda )=-\ln p(\lambda )`, or :math: `p(\lambda )=\exp (-g(\lambda ))`. Given this,
490- one intuitive way is to average the probability of the two simulations. Let :math: `g` be the weight combined from
491- from the weights :math: `g_1 ` and :math: `g_2 ` and :math: `p`, :math: `p_1 `, :math: `p_2 ` be the corresponding
492- probabilities, then we have
514+ .. math ::
515+ p_{ij}=\frac {\exp (-g_{ij})}{\sum _{j=1 }^N \exp (-g_{ij})}
516+
517+ where :math: `N` is the number of states in replica :math: `i`. As a result, we have the following probabilities
518+ for each replica. Note that the sum of the probabilities of each row (replica) should be 1.
519+
520+ ::
521+
522+ State 0 1 2 3 4 5
523+ Rep 1 0.858 0.105 0.016 0.021 X X
524+ Rep 2 X 0.642 0.117 0.193 0.048 X
525+ Rep 3 X X 0.328 0.489 0.133 0.049
526+
527+
528+ Step 2: Calculate the probability ratios between each pair of replicas
529+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
530+ Ideally (in the limit of inifinite simulation time), for the 3 states overlapped between replicas 1 and 2,
531+ we should have
493532
494533.. math ::
495- p = \frac {p_ 1 +p_ 2 }{ 2 } = \frac {\text {e}^{-g_ 1 } + \text {e}^{-g_ 2 }}{ 2 }
534+ r_{ 2 , 1 } = \frac {p_{ 2 i}}{p_{ 1 i}} = \frac {p_{ 21 }}{p_{ 11 }} = \frac {p_{ 22 }}{p_{ 12 }}= \frac {p_{ 23 }}{p_{ 13 }}
496535
497- Given that :math: `p=\exp (-g)`, we have
536+ where :math: `r_{2 , 1 }` is the "probability ratio" between replicas 2 and 1. However, the probability ratios
537+ corresopnding to different states will not be the same in practice, but will diverge with statistical noise
538+ for short timescales. For example, in our case we have
498539
499540.. math ::
500- g = - \ln p = - \ln \left ( \ frac {\text {e}^{-g_ 1 } + \text {e}^{-g_ 1 }}{2 } \right )
541+ \frac {p_{ 21 }}{p_{ 11 }}= 6.1143 , \; \ frac {p_{ 22 }}{p_{ 12 }} = 7.3125 , \; \frac {p_{ 23 }}{p_{ 13 }}= 9.1905
501542
502- Exponential averaging with histogram corrections
503- ------------------------------------------------
504- During the simulation, the histogram of the state visitation is generally not flat, such that
505- :math: `g` is no longer equal to the dimensionless free energy, i.e. :math: `g(\lambda )=-\ln p(\lambda )`
506- is no longer true. However, the ratio of counts is equal to the ratio of probabilities, whose natural
507- logarithm is equal to the free energy difference of the states of interest. With this, we can do
508- the following corrections before combining the weights:
543+ Similarly, for states 2 to 4, we need to calculate the probability ratios between replicas 2 and 3:
509544
510545.. math ::
511- g_k'=g_k + \ln \left ( \frac {N_{k- 1 }}{N_k} \right )
546+ \frac {p_{ 32 }}{p_{ 22 }}= 2.8034 , \; \frac {p_{ 33 }}{p_{ 23 }} = 2.5337 , \; \frac {p_{ 34 }}{p_{ 24 }}= 2.7708
512547
513- where :math`g_k'` is the corrected alchemical weight and :math: `N_{k-1 }` and :math: `N_k` are the histogram
514- counts of states :math: `k-1 ` and :math: `k`. Combining this correction with exponential averaging, we have
548+ Notably, in this case, there is no need to calculate :math: `r_{3 , 1 }` because :math: `r_{3 , 1 }` is already determined
549+ as :math: `r_{3 , 1 }=r_{3 , 2 } \times r_{2 , 1 }`. Also, there are only 2 overlapping states between replicas 1 and 3,
550+ but we want to maximize the overlap when combining weights. Therefore, the rule of thumb of calculating the
551+ probability ratios is that we only calculate the ones betwee adjacent replicas, i.e. :math: `r_{i+1 , i}`.
552+
553+
554+ Step 3: Average the probability ratios
555+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
556+ Now, to determine an unifying probability ratio between a pair of replicas, we can choose to take simple averages
557+ or geometric averages.
558+
559+ - Method 1: Simple average
560+
561+ .. math ::
562+ r_{2 , 1 } = \frac {1 }{3 }\left (\frac {p_{21 }}{p_{11 }} + \frac {p_{22 }}{p_{12 }} + \frac {p_{23 }}{p_{13 }} \right ) \approx 7.5391 , \;
563+ r_{3 , 2 } = \frac {1 }{3 }\left (\frac {p_{32 }}{p_{22 }} + \frac {p_{33 }}{p_{23 }} + \frac {p_{34 }}{p_{24 }} \right ) \approx 2.7026
564+
565+ - Method 2: Geometric average
566+
567+ .. math ::
568+ r_{2 , 1 }' = \left (\frac {p_{21 }}{p_{11 }} \times \frac {p_{22 }}{p_{12 }} \times \frac {p_{23 }}{p_{13 }} \right )^{\frac {1 }{3 }} \approx 7.4345 , \;
569+ r_{3 , 2 }' = \left (\frac {p_{32 }}{p_{22 }} \times \frac {p_{33 }}{p_{23 }} \times \frac {p_{34 }}{p_{24 }} \right )^{\frac {1 }{3 }} \approx 2.6999
570+
571+ Notably, if we take the negative natural logarithm of a probability ratio, we will get a free energy difference. For example,
572+ :math: `-\ln (p_{21 }/p_{11 })=f_{21 }-f_{11 }`, i.e. the free energy difference between state 1 in replica 2 and state 1 in replica 1.
573+ (This value is generally not 0 because different replicas have different references.) Therefore, while Method 1 takes the simple
574+ average the probability ratios, Method 2 essentially averages such free energy differences. Both methods are valid in theory and
575+ should not make a big difference in the convergence speed of the simulations because we just need an estimate of free energy for each
576+ state better than the weight of a single simulation. In fact, the closer the probability ratios are from each other, the closer
577+ the simple average is from the geometric average.
578+
579+ Step 4: Scale the probabilities for each replica
580+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
581+ Using the simple averages of the probability ratios :math: `r_{21 }` and :math: `r_{32 }`, we can scale the probability vectors of
582+ replicas 2 and 3 as follows:
583+
584+ ::
585+
586+ State 0 1 2 3 4 5
587+ Rep 1 0.858 0.105 0.016 0.021 X X
588+ Rep 2’ X 0.08529 0.01552 0.02560 0.00637 X
589+ Rep 3’ X X 0.01610 0.02400 0.00653 0.00240
590+
591+ As shown above, we keep the probability vector of replica 1 the same but scale that for the other two. Specifically, the probability
592+ vector of replica 2' is that of replica 2 divided by :math: `r_21 ` and the probability vector of replica 3' is that of replica 3
593+ divided by :math: `r_{21 } \times r_{32 }`.
594+
595+ Similarly, if we used the probability ratios :math: `r_{21 }'` and :math: `r_{32 }'`, we would have had:
596+
597+ ::
598+
599+ State 0 1 2 3 4 5
600+ Rep 1 0.858 0.105 0.016 0.021 X X
601+ Rep 2’ X 0.08645 0.01573 0.02596 0.00646 X
602+ Rep 3’ X X 0.01634 0.02436 0.00663 0.00244
603+
604+
605+ Step 5: Average and convert the probabilities
606+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
607+ After we have the scaled probabilities, we need to average them for each state, with the averaging method we used
608+ to average the probability ratios. For example, for state 1, we need to calculate the simple average of 0.105 and 0.08529,
609+ or the geometric average of 0.105 nad 0.08529. As such, for the first case (where the probabilities were scaled with :math: `r_{21 }`
610+ and :math: `r_{32 }`), we have the following probability vector of full range:
611+
612+ ::
613+
614+ Final p 0.858 0.9515 0.01587 0.02353 0.00645 0.00240
615+
616+ which can be converted to the following vector of alchemical weights (denoted as :math: `\vec {g}`) by taking the negative natural logarithm:
617+
618+ ::
619+
620+ Final g 0.1532 0.0497 4.1433 3.7495 5.0437 6.0322
621+
622+ For the second case (scaled with :math: `r_{21 }'` and :math: `r_{32 }'`), we have
623+
624+ ::
625+
626+ Final p 0.858 0.09527 0.01602 0.02368 0.00654 0.00244
627+
628+ which can be converted to the following vector of alchemical weights (denoted as :math: `\vec {g'}`):
629+
630+ ::
631+
632+ Final g 0.1532 2.3510 4.1339 3.7431 5.0437 6.0158
633+
634+
635+ Step 6: Determine the vector of alchemical weights for each replica
636+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
637+ Lastly, with the the vector of alchemical weights of the full range, we can figure out the alchemical weights
638+ for each replica, by shifting the weight of the first state of each replica back to 0. That is, with :math: `\vec {g}`,
639+ we have:
640+
641+ ::
642+
643+ State 0 1 2 3 4 5
644+ Rep 1 0.0 2.199 3.990 3.596 X X
645+ Rep 2 X 0.0 1.791 1.397 2.691 X
646+ Rep 3 X X 0.0 -0.393 0.900 1.889
647+
648+ Similarly, with :math: `\vec {g'}`, we have:
649+
650+ ::
651+
652+ State 0 1 2 3 4 5
653+ Rep 1 0.0 2.198 3.981 3.590 X X
654+ Rep 2 X 0.0 1.783 1.392 2.693 X
655+ Rep 3 X X 0.0 -0.391 0.910 1.882
656+
657+
658+ As shown above, the results using Method 1 and Method 2 are pretty close to each other. Notably, regardless of
659+ which type of averages we took, in the case here we were assuming that each replica is equally weighted. In the
660+ future, we might want to assign different weights to different replicas such that the uncertainty of free energies
661+ can be minimized. For example, if we are combining probabilities :math: `p_1 ` and :math: `p_2 ` that respectively
662+ have uncertainties :math: `\sigma _1 ` and :math: `\sigma _2 `, we can have
515663
516664.. math ::
517- g = - \ln \ left (\frac {\text {e}^{-g_ 1 '} + \text {e}^{-g_ 2 '}}{ 2 } \ right )
665+ p = \ left (\frac {\sigma _ 2 }{ \sigma _ 1 + \sigma _ 2 } \right )p_ 1 + \left ( \frac { \sigma _ 1 }{ \sigma _ 1 + \sigma _ 2 } \ right )p_ 2
518666
519- where :math: `g_1 '` and :math: `g_2 '` are weights corrected based on their histogram counts.
667+ However, calculating the uncertainties of the :math: `p_1 ` and :math: `p_2 ` on-the-fly is generally difficult, so
668+ this method has not been implemented.
669+
670+ .. _doc_histogram :
671+
672+ Histogram corrections
673+ ---------------------
674+ In the weight-combining method shown above, we frequently exploted the relationship :math: `g(\lambda )=f(\lambda )=-\ln p(\lambda )`.
675+ However, this relationship is true only when the histogram of state vistation is exactly flat, which rarely happens in reality.
676+ To correct this deviation, we can convert the difference in the histogram counts into the difference in free energies. This is based
677+ on the fact that the ratio of histogram counts is equal to the ratio of probabilities, whose natural
678+ logarithm is equal to the free energy difference of the states of interest. Specifically, we have:
679+
680+ .. math ::
681+ g_k'=g_k + \ln \left (\frac {N_{k-1 }}{N_k}\right )
682+
683+ where :math`g_k'` is the corrected alchemical weight and :math: `N_{k-1 }` and :math: `N_k` are the histogram
684+ counts of states :math: `k-1 ` and :math: `k`.
520685
521- Notably, this histogram correction should be carried out before shifting the weights, so the workflow we be
522- first correcting the weights, shifting the weights, and finally combining the weights. Also, this correction
523- method can be overcorrect the weights when the histogram counts :math: `N_k` or :math: `N_{k-1 }` are too low.
686+ Notably, this correction method can possibly overcorrect the weights when the histogram counts :math: `N_k` or :math: `N_{k-1 }` are too low.
524687To deal with this, the user can choose to specify :code: `N_cutoff ` in the input YAML file, so that the the histogram
525- correction will performed only when :math: `\text {argmin}(N_k, N_{k-1 })` is larger than the cutoff, otherwise this method
526- will reduce to the standard exponential averaging method.
688+ correction will performed only when :math: `\text {argmin}(N_k, N_{k-1 })` is larger than the cutoff. Also, this histogram correction
689+ should always be carried out before weight combination. This method is implemented in :obj: ` .histogram_correction `.
0 commit comments