Adding Metropolis-Adjusted Langevin Algorithm #1617
base: develop
Conversation
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop    #1617     +/-   ##
===========================================
- Coverage    84.37%   84.36%    -0.01%
===========================================
  Files          164      165        +1
  Lines        14320    14395       +75
===========================================
+ Hits         12082    12145       +63
- Misses        2238     2250       +12
===========================================
```
Co-authored-by: Daniel Weindl <[email protected]>
PaulJonasJost left a comment:
Awesome, looks good to me, thanks :)
@dilpath, this is ready to merge, I think.
dilpath left a comment:
Looks good! I mostly want a bit more documentation for future devs.
```python
def _propose_parameter(self, x: np.ndarray, beta: float):
    x_new: np.ndarray = np.random.multivariate_normal(x, self._cov)
    return x_new, np.nan  # no gradient needed
```
Document the return value and return type? Where does a developer see that `_propose_parameter` returns a 2-tuple where the first entry is the parameter and the second entry something else?
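A minimal sketch of how the contract could be spelled out (the annotation and docstring below are illustrative; in particular, treating the second entry as the gradient at the proposal is an assumption based on the `# no gradient needed` comment):

```python
def _propose_parameter(
    self, x: np.ndarray, beta: float
) -> tuple[np.ndarray, np.ndarray | float]:
    """Propose a new parameter vector.

    Returns
    -------
    A 2-tuple ``(x_new, grad_new)``: the proposed parameter vector and,
    presumably, the log-posterior gradient at the proposal (``np.nan``
    for gradient-free proposals such as this random walk).
    """
    x_new: np.ndarray = np.random.multivariate_normal(x, self._cov)
    return x_new, np.nan  # no gradient needed
```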
```python
if not np.all(np.isfinite(cov)):
    s = np.nanmean(np.diag(cov))
    s = 1.0 if not np.isfinite(s) or s <= 0 else s
    return np.sqrt(s) * eye, s * eye
```
Could you explain this a bit more in comments? What does this part achieve?
What does `diag` do here -- is it just added to avoid reducing the mean with extra zeroes?
Why is there `1.0 if not np.isfinite(s) or s <= 0 else s` -- isn't it already established that there is a non-finite value inside `cov` due to `not np.all(np.isfinite(cov))`, and therefore `s` is also necessarily infinite or NaN?
```python
L:
    Cholesky decomposition of the regularized covariance matrix.
cov:
    Regularized estimate of the covariance matrix of the sample.
```
I think returning the covariance first makes more sense, unless you think users/devs might be more interested in the Cholesky decomposition? This method used to only return cov, so having that as the first return value is probably safest?
```python
# we symmetrize cov first
cov = 0.5 * (cov + cov.T)
```
Why is this necessary if the input is already a covariance matrix?
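One common reason (an illustration, not necessarily the author's intent): matrices that are symmetric in exact arithmetic can pick up tiny round-off asymmetries depending on how they were computed, and averaging with the transpose is a cheap way to restore exact symmetry before jitter is added and the matrix is factorized:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
cov = A @ A.T + 5.0 * np.eye(5)          # symmetric (and SPD) by construction

# Operations that preserve symmetry only in exact arithmetic, e.g. a matrix
# inverse computed via LU, can leave tiny round-off asymmetries:
prec = np.linalg.inv(cov)
print(np.max(np.abs(prec - prec.T)))     # often a tiny but nonzero value

# Averaging with the transpose restores exact symmetry:
prec_sym = 0.5 * (prec + prec.T)
print((prec_sym == prec_sym.T).all())    # True
```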
```python
# scale for the initial jitter
s = np.mean(np.diag(cov))
s = 1.0 if not np.isfinite(s) or s <= 0 else s
```
Why would s be non-finite here, if the non-finite cov case was handled earlier?
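For context, a minimal sketch of the escalating-jitter pattern such a regularization helper typically follows, returning `(L, cov)` as in the PR's docstring; the helper name, attempt count, and jitter schedule here are illustrative, not the actual implementation:

```python
import numpy as np

def regularize_covariance_sketch(
    cov: np.ndarray, n_attempts: int = 6
) -> tuple[np.ndarray, np.ndarray]:
    """Return (L, cov_reg), with L the Cholesky factor of cov_reg."""
    dim = cov.shape[0]
    eye = np.eye(dim)

    # Hard fallback: an unusable estimate is replaced by a scaled identity.
    if not np.all(np.isfinite(cov)):
        s = np.nanmean(np.diag(cov))
        s = 1.0 if not np.isfinite(s) or s <= 0 else s
        return np.sqrt(s) * eye, s * eye

    cov = 0.5 * (cov + cov.T)      # remove round-off asymmetry
    s = np.mean(np.diag(cov))      # scale for the initial jitter
    s = 1.0 if not np.isfinite(s) or s <= 0 else s

    # Escalate the jitter until the Cholesky factorization succeeds.
    for attempt in range(n_attempts):
        jitter = s * 10.0 ** (attempt - n_attempts)
        cov_reg = cov + jitter * eye
        try:
            return np.linalg.cholesky(cov_reg), cov_reg
        except np.linalg.LinAlgError:
            continue

    # Last resort: a scaled identity is always factorizable.
    return np.sqrt(s) * eye, s * eye
```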
```python
diffusion = np.sqrt(self.step_size) * precond_noise

x_new: np.ndarray = (
    x + self._dimension_of_target_weight * drift + diffusion
)
```
Where does this formula with dimension_of_target_weight come from?
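For comparison, the textbook preconditioned MALA proposal (Roberts & Tweedie, 1996) weights the drift by half the step size rather than by a dimension-dependent factor, so a comment citing the source of this variant would help. A sketch of the standard form, with assumed variable names:

```python
import numpy as np

def mala_propose(x, grad_log_post, step_size, L):
    """Standard preconditioned MALA proposal:

    x' = x + (step_size / 2) * M @ grad + sqrt(step_size) * L @ z,
    with preconditioner M = L @ L.T and z ~ N(0, I).
    """
    z = np.random.standard_normal(x.shape)
    drift = 0.5 * step_size * (L @ L.T) @ grad_log_post
    diffusion = np.sqrt(step_size) * (L @ z)
    return x + drift + diffusion
```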
```python
def _compute_transition_log_prob(
    self, x_from: np.ndarray, x_to: np.ndarray, x_from_grad: np.ndarray
):
    """Compute the log probability of transitioning from x_from to x_to with preconditioning."""
```
Which equation in which reference provides the math in this method?
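For reference, under the standard parameterization above the MALA transition kernel is Gaussian, q(x_to | x_from) = N(x_to; x_from + (step_size/2) * M * grad, step_size * M); whether this matches the PR's math is exactly what a docstring reference would settle. A sketch of that log density:

```python
import numpy as np

def transition_log_prob_sketch(x_from, x_to, grad_from, step_size, L):
    """log q(x_to | x_from) for the preconditioned MALA kernel, up to an
    additive constant that cancels in the acceptance ratio."""
    M = L @ L.T
    mean = x_from + 0.5 * step_size * M @ grad_from
    diff = x_to - mean
    # Solve M y = diff via the Cholesky factor instead of forming M^-1.
    y = np.linalg.solve(L.T, np.linalg.solve(L, diff))
    return -0.5 * diff @ y / step_size
```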
```python
# log acceptance probability (temper log posterior)
log_p_acc = min(beta * (lpost_new - lpost), 0)

# This accounts for an asymmetric proposal distribution
```
Reference?
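The correction itself is the standard Metropolis-Hastings ratio for asymmetric proposals (Hastings, 1970); applied to MALA it is discussed in Roberts & Tweedie (1996). A sketch of the tempered acceptance step (how the temperature beta should interact with the proposal terms is an implementation choice, so this is only illustrative):

```python
def tempered_mh_log_acceptance(
    lpost_new: float,
    lpost: float,
    log_q_forward: float,   # log q(x_new | x): proposing the new point
    log_q_backward: float,  # log q(x | x_new): proposing the reverse move
    beta: float,
) -> float:
    """log alpha for a tempered Metropolis-Hastings step:
    alpha = min{1, [pi(x_new)^beta q(x | x_new)] / [pi(x)^beta q(x_new | x)]}.
    """
    return min(beta * (lpost_new - lpost) + log_q_backward - log_q_forward, 0.0)
```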
```python
if problem.objective.has_grad:
    pytest.skip(
        "EMCEE fails to converge when started in optimal point with gradient."
    )
```
This is strange -- does it also fail to converge if it ever visits the optimal point?
Is there a test for MalaSampler to assert that the correct posterior/density is computed?
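One way such a check could look, as a generic sketch not tied to the actual MalaSampler API: run a MALA kernel on a known target (here a standard normal) and assert that the chain reproduces its moments and passes a normality test:

```python
import numpy as np
from scipy import stats

def mala_step(x, step_size, rng):
    """One MALA step targeting a standard normal, log pi(x) = -x**2 / 2."""
    grad = -x
    x_new = x + 0.5 * step_size * grad + np.sqrt(step_size) * rng.standard_normal()

    def log_q(a, b):
        # log q(a | b): Gaussian around the drifted point b, variance step_size.
        mean = b + 0.5 * step_size * (-b)
        return -0.5 * (a - mean) ** 2 / step_size

    log_alpha = (-0.5 * x_new**2) - (-0.5 * x**2) + log_q(x, x_new) - log_q(x_new, x)
    return x_new if np.log(rng.uniform()) < log_alpha else x

def test_mala_recovers_standard_normal():
    rng = np.random.default_rng(42)
    x, samples = 0.0, []
    for i in range(20_000):
        x = mala_step(x, step_size=0.5, rng=rng)
        if i >= 5_000:  # discard burn-in
            samples.append(x)
    samples = np.asarray(samples)
    # The chain should reproduce the N(0, 1) target's moments and shape.
    assert abs(samples.mean()) < 0.1
    assert abs(samples.std() - 1.0) < 0.1
    assert stats.kstest(samples[::20], "norm").pvalue > 0.01
```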
This pull request introduces a new gradient-based sampler, the Metropolis-Adjusted Langevin Algorithm (MALA), to the `pypesto.sample` module, and refactors the covariance regularization in the adaptive Metropolis sampler. The changes also extend the test suite and documentation to demonstrate and validate the new functionality.

I refactored the covariance regularization to return both the Cholesky decomposition and the regularized covariance matrix, improving numerical stability. The regularization function now supports multiple attempts and safe fallback strategies; previously, adaptive MCMC could simply fail at some point due to an ill-conditioned proposal covariance.
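For readers unfamiliar with the module, usage of the new sampler would presumably look roughly like this toy example (the `MalaSampler` constructor arguments are an assumption, not the PR's actual signature):

```python
import numpy as np
import pypesto
import pypesto.sample

# Toy problem: negative log posterior of a standard normal, with the gradient
# MALA needs.
objective = pypesto.Objective(fun=lambda x: 0.5 * np.sum(x**2), grad=lambda x: x)
problem = pypesto.Problem(objective, lb=[-10], ub=[10])

# Hypothetical constructor argument; the actual options are defined in the PR.
sampler = pypesto.sample.MalaSampler(step_size=1e-2)
result = pypesto.sample.sample(
    problem, n_samples=10_000, sampler=sampler, x0=np.array([0.0])
)
```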