🎶 Benefits of Connecting Vensim to Stan #21

hyunjimoon (Maintainer) · 2022-09-24T12:58:17Z

Vensim is the standard tool for system dynamics modelers. However, for Bayes + system dynamics, I am persuading my advisor Hazhir (MIT, System Dynamics Group) and Tom (Vensim CTO) to consider outsourcing optimization (or at least offering ways to do optimization) to Stan, e.g. by connecting the Stan Math library to Vensim. Stan is not the only option: Gen offers custom hybrid inference algorithms, and there could be other alternatives. I gave the following reasons for this outsourcing.

To whom?

Stan

Benefits: increase usability

ISO (9241-11) defines usability in terms of three components: efficiency, effectiveness, and satisfaction.

efficiency

`T` (test): convergence diagnostics

SBC (simulation-based calibration), prior predictive checks, posterior predictive checks
Convergence diagnostics
Speedup
Analysis ecosystem for easy modeling and plotting, e.g. ArviZ, posteriordb, brms
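To make the convergence-diagnostics item concrete, here is a minimal Python sketch of the basic split-$\hat{R}$. It splits each chain in half and then compares between-chain and within-chain variance. This is a simplified stand-in that assumes chains are plain lists of draws. Stan and ArviZ (`arviz.rhat`) report a rank-normalized version, which this sketch does not implement.

```python
import math
import random

def split_rhat(chains):
    """Basic split-R-hat: split each chain in half, then compare the
    between-half variance B with the within-half variance W.
    (Not the rank-normalized variant reported by Stan/ArviZ.)"""
    halves = []
    for c in chains:
        n_half = len(c) // 2
        halves += [c[:n_half], c[n_half:2 * n_half]]
    m, n = len(halves), len(halves[0])
    means = [sum(h) / n for h in halves]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in h) / (n - 1)
            for h, mu in zip(halves, means)) / m
    var_plus = (n - 1) / n * w + b / n  # pooled posterior variance estimate
    return math.sqrt(var_plus / w)

if __name__ == "__main__":
    rng = random.Random(0)
    mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
    stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 0, 3)]
    print(split_rhat(mixed))  # close to 1: chains agree
    print(split_rhat(stuck))  # well above 1: one chain sits elsewhere
```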

effectiveness

`A` (approximator):

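Autodiff-based HMC sampling scales much better with parameter dimension, because gradients guide it along Hamiltonian trajectories through the posterior. How would HMC's performance compare with DiffeRential Evolution Adaptive Metropolis (DREAM), the MCMC algorithm we mainly use, as the parameter dimension increases?
Built-in probability functions, including density and mass functions for likelihoods and priors (compare the hand-written GAMMA LN likelihood below).
One disadvantage of Stan: `neg_binom_2_lpmf(int | real, real)` accepts only integer outcomes, so real-valued data would have to be truncated to integers. The Vensim expression below, built from GAMMA LN, also accepts real-valued data.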

if then else ( DataFlowOverTime[Rgn] > 0,
    GAMMA LN ( Di[Rgn] + 1 / alp[Rgn] )
  - GAMMA LN ( 1 / alp[Rgn] )
  - GAMMA LN ( Di[Rgn] + 1 )
  - ( Di[Rgn] + 1 / alp[Rgn] ) * ln ( 1 + alp[Rgn] * Mu[Rgn] )
  + Di[Rgn] * ( ln ( alp[Rgn] ) + ln ( Mu[Rgn] ) ),
  0 )

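For scaling purposes (the factorial/GAMMA LN terms in the negative binomial are computationally heavy), Jair at one point considered using a lognormal likelihood instead. The choice depends on whether variance ~ mean^2 (as for a lognormal with fixed sigma) or variance ~ mean. Note that Stan uses the standard-deviation parameterization, not a variance.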
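Here is a minimal Python sketch of our own; it is not code from the Vensim model or from Stan. It checks that the GAMMA LN expression above is the negative binomial log-pmf with mean `Mu` and dispersion `alp`. That makes it Stan's `neg_binom_2` with phi = 1/alp. The sketch also contrasts the variance-mean relation of the negative binomial with that of a lognormal. All function names are hypothetical. The integer check in `neg_binom_2_logpmf` only mimics Stan's integer-only signature.

```python
import math

def nb_loglik_vensim(d, mu, alp):
    """Mirror of the Vensim GAMMA LN expression: NB log-likelihood with mean mu
    and dispersion alp (variance mu + alp * mu**2). d may be real-valued."""
    r = 1.0 / alp
    return (math.lgamma(d + r) - math.lgamma(r) - math.lgamma(d + 1)
            - (d + r) * math.log(1 + alp * mu)
            + d * (math.log(alp) + math.log(mu)))

def neg_binom_2_logpmf(n, mu, phi):
    """The same density in Stan's neg_binom_2 (mu, phi) form; integer n only."""
    if n != int(n) or n < 0:
        raise ValueError("neg_binom_2 needs a non-negative integer outcome")
    return (math.lgamma(n + phi) - math.lgamma(phi) - math.lgamma(n + 1)
            + n * math.log(mu / (mu + phi)) + phi * math.log(phi / (mu + phi)))

def nb_variance(mu, alp):
    """NB variance: roughly mu for small mu, roughly alp * mu**2 for large mu."""
    return mu + alp * mu ** 2

def lognormal_variance(mean, sigma):
    """Variance of a lognormal with E[Y] = mean and log-scale sd sigma,
    i.e. Stan's lognormal(log(mean) - sigma^2 / 2, sigma): always ~ mean^2."""
    return (math.exp(sigma ** 2) - 1) * mean ** 2

if __name__ == "__main__":
    mu, alp = 2.5, 0.4
    # The two forms agree up to floating-point rounding (phi = 1 / alp).
    print(nb_loglik_vensim(3, mu, alp) - neg_binom_2_logpmf(3, mu, 1 / alp))
    print(nb_loglik_vensim(2.7, mu, alp))  # real-valued data works in the Vensim form
    for m in (1, 10, 100):
        print(m, nb_variance(m, alp) / m ** 2, lognormal_variance(m, 0.5) / m ** 2)
```

The last loop shows the design question behind Jair's idea. The variance-to-mean² ratio is constant for the lognormal but falls toward `alp` for the negative binomial as the mean grows.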

satisfaction

Qualitative Why?

Considering that log-likelihood optimization is at the core of Bayesian computation (one branch of probabilistic programming languages), wouldn't it be natural for the Stan platform to have a continuous inflow of cutting-edge techniques for both algorithms and diagnostics? Note that algorithms and diagnostics both need constant updating, given their actor-critic relationship. For one, Vensim's MCMC algorithm, DREAM (explained in this paper), cites Gelman and Rubin (1992) for its convergence diagnostics, which is outdated. $\hat{k}$ from this paper is one of the cutting-edge diagnostics. Stan has a team of at least ten developers concentrating on developing diagnostics (and thousands of users testing them), so outsourcing this and channeling our efforts into what we excel at seemed reasonable to me. Specifically, the decision analysis in Stan's official manual is not dynamic, which is where we can do better. The practicality of decision-based optimization is what I feel SD experts can contribute to Bayes + system dynamics.

Quantitative Why?

To get the gradient of the payoff with respect to the parameters.

This part corresponds to Stan's `functions` block in Stan estimation, which offers autodiff through implicit functions. Johann from the Stanford Computational Policy Lab, who coded reverse-mode autodiff for algebraic solvers, explains this here. Stan has Powell, Newton, and fixed-point iteration algebraic solvers, and only the first two (Powell and Newton) have reverse-mode autodiff implemented. Reverse mode gives a large speedup when the output dimension is much smaller than the input dimension. That is the case when the log posterior, a scalar, is our optimization target. As evidence, Johann's poster Propagating Derivatives through Implicit Functions in Reverse Mode Autodiff shows the speedup.
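This is a minimal Python sketch of the idea behind reverse-mode derivatives through an algebraic solver, via the implicit function theorem. It uses a hypothetical two-stock steady-state system $f(x, \theta) = 0$ with a scalar payoff. One adjoint linear solve with $(\partial f/\partial x)^\top$ gives the whole gradient with respect to $\theta$, however many parameters there are. The system, the payoff, and all function names are made up for illustration. This is not Stan's implementation.

```python
def residual(x, theta):
    """Hypothetical steady-state system f(x, theta) = 0 (two stocks, two parameters)."""
    return [x[0] ** 3 + x[0] - theta[0], x[1] + x[0] * x[1] - theta[1]]

def jac_x(x):
    """df/dx. theta enters as -theta, so df/dtheta = -I for this toy system."""
    return [[3 * x[0] ** 2 + 1, 0.0], [x[1], 1 + x[0]]]

def solve2(a, b):
    """Solve the 2x2 linear system a z = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def newton_solve(theta, x0=(0.0, 0.0), tol=1e-12, max_iter=100):
    """Newton iterations on f(x, theta) = 0."""
    x = list(x0)
    for _ in range(max_iter):
        r = residual(x, theta)
        if max(abs(v) for v in r) < tol:
            break
        dx = solve2(jac_x(x), r)
        x = [x[0] - dx[0], x[1] - dx[1]]
    return x

def payoff(x):
    """Hypothetical scalar payoff of the steady state."""
    return x[0] ** 2 + x[1]

def payoff_grad_adjoint(theta):
    """Gradient of payoff(x(theta)) via one adjoint solve:
    (df/dx)^T lam = dpayoff/dx, then grad = -(df/dtheta)^T lam."""
    x = newton_solve(theta)
    j = jac_x(x)
    jt = [[j[0][0], j[1][0]], [j[0][1], j[1][1]]]
    lam = solve2(jt, [2 * x[0], 1.0])
    return lam  # -(df/dtheta)^T lam equals lam because df/dtheta = -I

if __name__ == "__main__":
    print(payoff_grad_adjoint([2.0, 1.0]))  # steady state is x = (1, 0.5) here
```

Forward mode would need one linear solve per parameter. The adjoint needs one per scalar output, which is why reverse mode wins when a scalar log posterior depends on many parameters.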
That being said, Juho's paper An importance sampling approach for reliable and efficient inference in Bayesian ordinary differential equation models suggests the following workflow, which is cutting-edge in the Stan community for ODEs and which I wish to follow. Since importance sampling is at the core of sequential Monte Carlo, I am also interested in exploring particle MCMC (pMCMC).

1. Select a reasonable approximation method $M$.
2. Sample parameter draws $\boldsymbol{\theta}_s^{\prime}, s=1, \ldots, S$, using MCMC with $M$ as the approximation method.
3. Compute ${MAE}^{M, M^*}$ and the importance weights $r_s^{M, M^*}$ using approximation method $M^*$. Fit $\hat{k}$ as explained in Section $2.4$ of the paper.
4. Increase the accuracy of $M^*$ and repeat Step 3 until ${MAE}^{M, M^*}$ and $\hat{k}$ converge. If $\hat{k}$ converges to a value larger than $0.7$, increase the accuracy of $M$ and go back to Step 2.
5. Compute any posterior estimates using Eq. 8 of the paper, with $r_s$ being the values to which $r_s^{M, M^*}$ finally converged.
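This is a rough Python sketch of Steps 2 to 4 on a toy problem. It assumes exponential decay $dy/dt = -\theta y$ as the ODE, explicit Euler with a coarse step as $M$ and a fine step as $M^*$, and Gaussian noise. Placeholder draws stand in for real MCMC output. All names are hypothetical. Two simplifications should be flagged. First, MAE is taken here as the largest absolute difference between the two solvers' outputs; check the paper for its exact definition. Second, the importance-sampling effective sample size stands in for $\hat{k}$, which would require a generalized Pareto fit.

```python
import math
import random

def euler_decay(theta, h, n_obs):
    """Explicit Euler for dy/dt = -theta * y, y(0) = 1; returns y at t = 1, ..., n_obs."""
    n_sub = round(1 / h)  # Euler steps per unit of time (assumes 1/h is an integer)
    y, out = 1.0, []
    for _ in range(n_obs):
        for _ in range(n_sub):
            y -= h * theta * y
        out.append(y)
    return out

def gauss_loglik(y_obs, y_model, sigma):
    # Normalizing constants are dropped; they cancel in the importance ratio.
    return sum(-0.5 * ((o - m) / sigma) ** 2 for o, m in zip(y_obs, y_model))

def importance_check(draws, y_obs, sigma, h_m, h_mstar):
    """Reweight draws obtained under solver M (step h_m) toward the more
    accurate solver M* (step h_mstar). The prior cancels in the ratio."""
    log_r, mae = [], 0.0
    for th in draws:
        sol_m = euler_decay(th, h_m, len(y_obs))
        sol_ms = euler_decay(th, h_mstar, len(y_obs))
        # Assumption: MAE = largest absolute solver discrepancy over draws and times.
        mae = max(mae, max(abs(a - b) for a, b in zip(sol_m, sol_ms)))
        log_r.append(gauss_loglik(y_obs, sol_ms, sigma) - gauss_loglik(y_obs, sol_m, sigma))
    top = max(log_r)  # subtract the max before exponentiating, for stability
    w = [math.exp(lr - top) for lr in log_r]
    total = sum(w)
    w = [x / total for x in w]
    ess = 1.0 / sum(x * x for x in w)  # simple stand-in for the paper's k-hat
    return w, mae, ess

if __name__ == "__main__":
    rng = random.Random(1)
    sigma = 0.05
    y_obs = [math.exp(-0.7 * t) + rng.gauss(0, sigma) for t in range(1, 6)]
    draws = [rng.gauss(0.7, 0.05) for _ in range(200)]  # placeholder for MCMC draws under M
    for h_mstar in (0.25, 0.05, 0.01):  # Step 4: tighten M* and watch the diagnostics
        w, mae, ess = importance_check(draws, y_obs, sigma, 0.5, h_mstar)
        theta_hat = sum(wi * th for wi, th in zip(w, draws))  # reweighted posterior mean
        print(h_mstar, mae, ess, theta_hat)
```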

Mathematical details of the workflow above are in hyunjimoon/SBC#52 (comment).

Ryan helped me connect importance sampling with MCMC by pointing to the paper he coauthored with Tamara Broderick, Covariances, Robustness, and Variational Bayes: "sampling with MCMC samples to calculate the local sensitivity is precisely equivalent to using the same MCMC samples to estimate the covariance, importance sampling approach is equivalent to using MCMC samples to estimate" (Appendix B).
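Here is a small Python check of the equivalence quoted above, under assumptions of our own. Perturb the log posterior by $\epsilon\, h(\theta)$ and ask how $E[g(\theta)]$ moves. First, reweight the same draws by $\exp(\epsilon h(\theta_s))$ (importance sampling). Then differentiate at $\epsilon = 0$. The result is exactly the sample covariance of $g$ and $h$. The Gaussian draws stand in for MCMC output, and $g$, $h$, and the function names are hypothetical.

```python
import math
import random

def is_estimate(draws, g, h, eps):
    """Self-normalized IS estimate of E_eps[g] when p_eps(theta) is
    proportional to p(theta) * exp(eps * h(theta)), reusing draws from p."""
    logw = [eps * h(t) for t in draws]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    return sum(wi * g(t) for wi, t in zip(w, draws)) / sum(w)

def cov_sensitivity(draws, g, h):
    """Local sensitivity dE_eps[g]/deps at eps = 0, as the sample covariance of g and h."""
    n = len(draws)
    gs = [g(t) for t in draws]
    hs = [h(t) for t in draws]
    g_mean, h_mean = sum(gs) / n, sum(hs) / n
    return sum((a - g_mean) * (b - h_mean) for a, b in zip(gs, hs)) / n

if __name__ == "__main__":
    rng = random.Random(3)
    draws = [rng.gauss(1.0, 2.0) for _ in range(5000)]  # stand-in for MCMC draws
    g = h = lambda t: t  # sensitivity of the mean to tilting by theta
    eps = 1e-5
    fd = (is_estimate(draws, g, h, eps) - is_estimate(draws, g, h, -eps)) / (2 * eps)
    print(fd, cov_sensitivity(draws, g, h))  # the two numbers agree
```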

As of now (09/22), Stan seems to be the best choice for a gradient-based optimizer (Vensim's optimizer is not gradient-based).

What is done

Code-wise, last week I finished auto-translating Vensim dynamic models into the Stan `functions` block with @Dashadower (commit 684dfb1, issue #17). A draft estimating the four parameters of a Lotka-Volterra model defined in Vensim, with inference in Stan, is complete but fits poorly ($\hat{k} > 0.7$). The notebook includes prior and posterior predictive checks.

What is needed

Persuasion on why gradient-based methods may be better than gradient-free ones (especially in high dimensions). I would like @betanalpha's help on this. Our context (dynamic modeling) is a small, though growing, portion of Bayesian modeling, so I had difficulty judging how much faster HMC would be for us. I also want to ask how HMC (or gradient-based MCMC in general) could be developed for sequential Bayesian updating.
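Here is one quick way to see the dimension argument, as a sketch rather than a benchmark. On a $d$-dimensional standard normal, compare the acceptance rates of random-walk Metropolis and basic HMC with their tuning held fixed. This relies on known scaling results: random-walk step sizes must shrink roughly like $d^{-1/2}$, while HMC step sizes only need to shrink like $d^{-1/4}$. The target, step sizes, and function names are assumptions for illustration, and DREAM is not implemented here.

```python
import math
import random

def log_density(x):
    """Standard normal target in len(x) dimensions (log density up to a constant)."""
    return -0.5 * sum(v * v for v in x)

def grad_log_density(x):
    return [-v for v in x]

def rwm_acceptance(d, step, n_iter, rng):
    """Random-walk Metropolis with isotropic Gaussian proposals of scale `step`."""
    x = [rng.gauss(0, 1) for _ in range(d)]  # start in the typical set
    accepted = 0
    for _ in range(n_iter):
        prop = [v + step * rng.gauss(0, 1) for v in x]
        if rng.random() < math.exp(min(0.0, log_density(prop) - log_density(x))):
            x, accepted = prop, accepted + 1
    return accepted / n_iter

def hmc_acceptance(d, eps, n_leapfrog, n_iter, rng):
    """Basic HMC: unit mass matrix, fixed step size and number of leapfrog steps."""
    x = [rng.gauss(0, 1) for _ in range(d)]
    accepted = 0
    for _ in range(n_iter):
        p0 = [rng.gauss(0, 1) for _ in range(d)]
        q = list(x)
        # Leapfrog: half momentum step, alternating full steps, final half step.
        p = [pi + 0.5 * eps * gi for pi, gi in zip(p0, grad_log_density(q))]
        for i in range(n_leapfrog):
            q = [qi + eps * pi for qi, pi in zip(q, p)]
            scale = eps if i < n_leapfrog - 1 else 0.5 * eps
            p = [pi + scale * gi for pi, gi in zip(p, grad_log_density(q))]
        h0 = -log_density(x) + 0.5 * sum(v * v for v in p0)
        h1 = -log_density(q) + 0.5 * sum(v * v for v in p)
        if rng.random() < math.exp(min(0.0, h0 - h1)):
            x, accepted = q, accepted + 1
    return accepted / n_iter

if __name__ == "__main__":
    rng = random.Random(42)
    for d in (2, 100):
        print(d, rwm_acceptance(d, 0.5, 2000, rng), hmc_acceptance(d, 0.3, 10, 1000, rng))
```

With the same settings, random-walk acceptance collapses as `d` grows while HMC's stays high. Acceptance alone is not efficiency, but it shows why the random-walk step, and hence its exploration per iteration, must shrink faster.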

What can be done

Connection to Gen

tomfid (Maintainer) · 2022-09-26T14:46:36Z

I could potentially be convinced otherwise, but I have yet to encounter a gradient solver that was general-purpose enough for all SD models. Payoff discontinuities, roughness, and weird manifolds within high-dimensional parameter spaces are pretty common.

6 replies

hyunjimoon Sep 27, 2022
Maintainer Author

HMC tends to perform well even on the banana-shaped target. You can experiment here: https://chi-feng.github.io/mcmc-demo/app.html. @hazhirr might also be interested, as he wanted a comparison of HMC with other MCMC methods (the W7 topic of our seminar, which covers @betanalpha's paper).

hyunjimoon Sep 27, 2022
Maintainer Author

So, to use HMC, our search suggests one good approach: smooth discontinuities into continuous quantities with an additional mechanism like flushing (defining a stock variable that shadows a flow variable). May I ask your opinion, @tomfid?

HMC is itself a dynamical system, governed by Hamiltonian mechanics on a symplectic manifold, which helps explain its good performance (especially in high dimensions). @bgoodri verified my thought that "any generative time-series can be considered a system dynamic model outcome from a Bayesian perspective". He then remarked that HMC is itself a dynamical system, which gave me food for thought on the importance of mechanics.

tomfid Sep 27, 2022
Maintainer

The banana probably isn't a comprehensive example. I think the key question is really whether it's also possible to handle quantized parameters (like a switch taking integer values) or very coarsely quantized variables like the number of factories in a production system.

I wouldn't want to define a reporting variable that accumulates a flow, effectively smoothing it over some time, unless the actual data gathering system also did that.

hyunjimoon Sep 27, 2022
Maintainer Author

Having to marginalize out discrete variables (model choice p(M1) could be one example) is a downside of gradient-based methods... no free lunch :)
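Here is a minimal sketch of what marginalizing a discrete switch looks like. It assumes a hypothetical two-valued switch that selects between two structural forms of the mean ($\theta$ or $2\theta$), with Gaussian observations. The switch is summed out with log-sum-exp. That leaves a log likelihood that is differentiable in $\theta$, and the switch's conditional probabilities can be recovered afterwards. The models, prior, and function names are illustrative only.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def normal_lpdf(y, mean, sd):
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)

def switch_terms(theta, ys, switch_prior, mean_fns):
    """log pi_k + log p(ys | theta, switch = k) for every switch value k."""
    return [math.log(pk) + sum(normal_lpdf(y, mean_fn(theta), 1.0) for y in ys)
            for pk, mean_fn in zip(switch_prior, mean_fns)]

def marginal_loglik(theta, ys, switch_prior, mean_fns):
    """log p(ys | theta) with the discrete switch summed out; smooth in theta."""
    return log_sum_exp(switch_terms(theta, ys, switch_prior, mean_fns))

def switch_posterior(theta, ys, switch_prior, mean_fns):
    """p(switch = k | ys, theta), recovered after marginalization."""
    terms = switch_terms(theta, ys, switch_prior, mean_fns)
    lse = log_sum_exp(terms)
    return [math.exp(t - lse) for t in terms]

if __name__ == "__main__":
    mean_fns = [lambda th: th, lambda th: 2 * th]  # hypothetical structural alternatives
    ys = [1.9, 2.2, 2.0]
    print(marginal_loglik(1.0, ys, [0.5, 0.5], mean_fns))
    print(switch_posterior(1.0, ys, [0.5, 0.5], mean_fns))
```

The cost is one likelihood evaluation per switch value. That is manageable for a switch, but it grows multiplicatively with several discrete variables and is impractical for something like an integer count of factories.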

hyunjimoon Dec 27, 2022
Maintainer Author

Here is Bob's reasoning for his belief that the gradient is pretty much the only guide we have: https://discourse.mc-stan.org/t/hints-an-alternative-to-nuts-that-doesnt-require-gradients/10017/4

hyunjimoon (Maintainer, Author) · 2022-10-09T19:38:45Z

Andrew pointed me to the paper Delivering data differently.

2.5 Sound

Attempts to use sonification for data

4.4 Using musical sounds to convey the progress of iterative algorithms

We can also use pitch, volume, rhythm, timbre, and the progressions of musical expectation
to convey what is happening within the fits and starts of an iterative algorithm. For example,
Hamiltonian Monte Carlo can get stuck when its steps are too small, too large, or misaligned
to the local geometry of the space being explored; within an algorithm such as the no-U-turn
sampler, these problems can appear as divergences or max treedepths. Instead of expressing these
as discrete warnings on the screen, these could be conveyed, for example, through pitch, with
well-behaving trajectories sounding like calm music, gradually turning into annoying high-pitched
buzzing when the algorithm is getting stuck. This sonic summary could be tied to a dynamic visualization 
so that the user could then take a look at where in the parameter space this is happening.

A related application is the training of neural networks, where progress is tracked using improvement 
of some objective function during the training process, and, again, sound can be used to
convey progress of the algorithm from different starting points, with unusual or unexpected sound
patterns indicating a lack of smooth progress. The sonic of musical output can serve to reassure
that the algorithm is progressing well or alert users of problems, and also to signal when and where
to perform further exploration, perhaps using visualizations, to diagnose and fix problems.

0 replies

hyunjimoon (Maintainer, Author) · 2022-10-26T12:28:07Z

Discussion with @jandraor on the benefits of connecting to Stan:

`T` (test): convergence diagnostics

SBC (simulation-based calibration), prior predictive checks, posterior predictive checks

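`A` (approximator):

Autodiff-based HMC sampling scales much better with parameter dimension, because gradients guide it along Hamiltonian trajectories through the posterior. How would HMC's performance compare with DiffeRential Evolution Adaptive Metropolis (DREAM), the MCMC algorithm we mainly use, as the parameter dimension increases?
Built-in probability functions, including density and mass functions for likelihoods and priors (compare the hand-written GAMMA LN likelihood below).
One disadvantage of Stan: `neg_binom_2_lpmf(int | real, real)` accepts only integer outcomes, so real-valued data would have to be truncated to integers. The Vensim expression below, built from GAMMA LN, also accepts real-valued data.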

if then else ( DataFlowOverTime[Rgn] > 0,
    GAMMA LN ( Di[Rgn] + 1 / alp[Rgn] )
  - GAMMA LN ( 1 / alp[Rgn] )
  - GAMMA LN ( Di[Rgn] + 1 )
  - ( Di[Rgn] + 1 / alp[Rgn] ) * ln ( 1 + alp[Rgn] * Mu[Rgn] )
  + Di[Rgn] * ( ln ( alp[Rgn] ) + ln ( Mu[Rgn] ) ),
  0 )

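For scaling purposes (the factorial/GAMMA LN terms in the negative binomial are computationally heavy), Jair at one point considered using a lognormal likelihood instead. The choice depends on whether variance ~ mean^2 (as for a lognormal with fixed sigma) or variance ~ mean. Note that Stan uses the standard-deviation parameterization, not a variance.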

0 replies
