Add documentation for Stochastic Gradient Samplers #629

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Closed

AoifeHughes wants to merge 22 commits into main from add-stoc-docs

Contributor

AoifeHughes commented Aug 4, 2025

Docs on SGHMC / SGLD? Turing.jl#2270 adds docs to support this


          Add documentation for Stochastic Gradient Samplers

AoifeHughes self-assigned this


          Implement code changes to enhance functionality and improve performance

4b3f7d1

Contributor

github-actions bot commented Aug 4, 2025

Preview the changes: https://turinglang.org/docs/pr-previews/629
Please avoid using the search feature and navigation bar in PR previews!

AoifeHughes requested a review from mhauru

August 7, 2025 07:43


          Merge branch 'main' into add-stoc-docs

18ad4b0

mhauru requested changes

View reviewed changes

Member

mhauru left a comment

I can comment on the clarity of explanations, but I can't comment on some of the content, most importantly the Summary section, because I know nothing about these samplers. E.g. the recommendations for hyperparameters, I have no idea about them. @yebai, who would be a good reviewer for that?

usage/stochastic-gradient-samplers/index.qmd Outdated

+              # Define a simple Gaussian model
+              @model function gaussian_model(x)
+                  μ ~ Normal(0, 10)
+                  σ ~ truncated(Normal(0, 5), 0, Inf)

Member

mhauru Aug 7, 2025

Suggested change

      
                σ ~ truncated(Normal(0, 5), 0, Inf)
          
                σ ~ truncated(Normal(0, 5); lower=0)

The Inf version causes trouble with AD, see JuliaStats/Distributions.jl#1910. We are trying to guide users towards the kwargs lower and upper.

usage/stochastic-gradient-samplers/index.qmd Outdated

+              ```{julia}
+              #| output: false
+              setprogress!(false)

Member

mhauru Aug 7, 2025

This needs to be moved up, or replaced with progress=false in the sample call. Currently the above cell still produces loads of lines of progress output that don't render nicely: https://turinglang.org/docs/pr-previews/629/usage/stochastic-gradient-samplers/

usage/stochastic-gradient-samplers/index.qmd Outdated

+              ```
+              ```{julia}
+              plot(chain_sgld)

Member

mhauru Aug 7, 2025

The results on https://turinglang.org/docs/pr-previews/629/usage/stochastic-gradient-samplers/ don't look convincing to me, it looks like sampling hasn't converged. Can we increase sample counts without it taking too long? Or it could be a problem with some hyperparameters, I wouldn't know.

usage/stochastic-gradient-samplers/index.qmd Outdated

+              ```
+              ```{julia}
+              plot(chain_sghmc)

Member

mhauru Aug 7, 2025

Same thing for these results.

usage/stochastic-gradient-samplers/index.qmd Outdated

+              summarystats(chain_hmc)
+              ```
+              Compare the trace plots:

Member

mhauru Aug 7, 2025

Could we comment on the conclusions from this, what do we learn from this comparison? Also, the first trace plot looks weird.

usage/stochastic-gradient-samplers/index.qmd Outdated


		### When to Use Stochastic Gradient Samplers

		- Large datasets: When full gradient computation is prohibitively expensive

Member

mhauru Aug 7, 2025

Isn't this in contradiction with the statement below that with Turing full gradients are computed anyway, and noise is added?

usage/stochastic-gradient-samplers/index.qmd Outdated

+              Pkg.instantiate();
+              ```
+              Turing.jl provides stochastic gradient-based MCMC samplers that are designed for large-scale datasets where computing full gradients is computationally expensive. The two main stochastic gradient samplers are **Stochastic Gradient Langevin Dynamics (SGLD)** and **Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)**.

Member

mhauru Aug 7, 2025

The first sentence seems to be immediately undermined by the next paragraph that says that you can't actually use them for this purpose. Maybe better to lead with what they are currently useful for and then comment on possible future uses on if we ever get to implementing these better, rather than the other way around.

usage/stochastic-gradient-samplers/index.qmd

		@@ -0,0 +1,219 @@
		---

Member

mhauru Aug 7, 2025

This is a general comment, not related to the line it's attached to: The navigation bar on the left needs a new link to this page, I think currently there's no way to navigate to it without knowing the URL.

usage/stochastic-gradient-samplers/index.qmd Outdated

+              model = gaussian_model(data)
+              ```
+              SGLD requires very small step sizes to ensure stability. We use a `PolynomialStepsize` that decreases over time:

Member

mhauru Aug 7, 2025

Do we have other options for stepsize in Turing, other than PolynomialStepsize?

usage/stochastic-gradient-samplers/index.qmd Outdated


		## Automatic Differentiation Backends

		Both samplers support different AD backends:

Member

mhauru Aug 7, 2025

This could link to the AD page in our docs for more information.

AoifeHughes and others added 4 commits

August 12, 2025 11:18


          Add Stochastic Gradient Samplers documentation and enhance existing c…

64c3870

…ontent


          bump versions

b8a2b6a


          mani regen

869f9d0


          Merge branch 'main' into add-stoc-docs

3ae881b

AoifeHughes requested a review from mhauru

August 18, 2025 09:18

penelopeysm and others added 11 commits

August 19, 2025 16:27


          fix typo

b32f6d5


          Add Stochastic Gradient Samplers documentation and enhance existing c…

c0a6f03

…ontent


          bump versions

855a012


          mani regen

3968c13


          Color-Theme update to match main site for consistency (#613)

4cded52

* Updated theme-colors to match main site for consistency

* fixed search results color in dark mode

* fix copy button css in dark mode

* search bar background udpdate

* removed current default footer and added custom one

* Add custom footer and update styles to match TuringLang/turinglang.github.io#119

* Update styles to match original site

* cleanup code

* Added SCSS styles to match main site

* Add all icons in navbar + match few tweaks with main PR

* Enable Open Graph and Twitter Cards for SEO

* fix corrupted png

* remove old styles

---------

Co-authored-by: Penelope Yong <[email protected]>


          fix urls for open graph (#630)

f562afa


          Remove MicroCanonicalHMC.jl and update external sampler docs (#628)

dbf2a97

* Fix external sampler docs

* Remove MCHMC as a dep

* update

* Explain docs in more detail

* Bump to 0.39.9


          Added redirects (#632)

5a0f806


          Update README.md

8793cfb


          Merge branch 'add-stoc-docs' of https://github.com/TuringLang/docs in…

9dce424

…to add-stoc-docs


          Merge branch 'main' into add-stoc-docs

3e3aa73

yebai self-requested a review

August 20, 2025 10:29

AoifeHughes and others added 2 commits

August 21, 2025 14:31


          updated messages and notices

c64f29f


          Merge branch 'main' into add-stoc-docs

9b269a7

Contributor Author

AoifeHughes commented Sep 1, 2025

https://turinglang.org/docs/pr-previews/629/usage/stochastic-gradient-samplers/ - renders okay at least. Looking into the convergence things atm

Aoife added 2 commits

September 1, 2025 10:40


          bumped chain length

e083a4b


          tried to tweak sampling and updated explainations

339da9d

Contributor Author

AoifeHughes commented Sep 15, 2025

At the extent of my knowledge on these, some visual things I dont understand in the final figure, not sure why it's not converging properly. Happy to make changes if someone can direct what is needed

Member

mhauru commented Sep 15, 2025

I don't know why HMC is having such trouble with this quite simple model, the numerical integration errors just blow up quite often. Seems that you can fix that with a decent initial value for the chain though. Try adding the keyword argument initial_params=[0.0, 1.0] to the sample call for HMC, that seems to help.

We should set the same initial_params for the two other samplers as well. Fixing the starting point of the MCMC chain also makes the comparison between the different samplers a bit fairer, so not a bad thing in general. However, it doesn't solve all the convergence issues with at least SGLD (I didn't look at the other one). However, those seem to be fixable by tuning the parameters of PolynomialStepsize. I don't really understand PolynomialStepsize, but at least for SGLD PolynomialStepsize(0.01, 100) seems to work decently. Try adding in that, and if necessary tune the parameters for the other sampler as well.

Member

yebai commented Sep 16, 2025 •

edited

Loading

Thanks @AoifeHughes and @mhauru.

Quick comments:

The example models in this tutorial do not use stochastic gradients. They are using noiseless gradients for SGLD, SGHMC and standard HMC algorithms. This leads to incorrect SGLD and SGHMC results in specific cases since their validity depends on suitable gradient noise (intuitively, lack of gradient noise means momentum never gets refreshed).
SGLD and SGHMC are examples of HMC-family algorithms, so it is sufficient to mention these algorithms in available MCMC algorithms (eg, here).
Turing / DynamicPPL doesn't yet support minibatch yet.

In conclusion, it might be better if we close this PR and revisit in the future.

Member

mhauru commented Sep 16, 2025

I wish this had come up earlier so @AoifeHughes could have avoided putting in effort trying to make the examples in this PR work.

I would still advocate for merging a version of this, even if it only explains that we have implementations of SGLD and SGHMC in principle included in the codebase, but that they don't actually do anything useful at the moment because we can't compute stochastic gradients / don't have minibatching. Hence this is only useful for research needs until it gets improved in the future. Some poor user might otherwise go through the same exercise that Aoife and I have been through here, of slowly understanding that there is little point to our current SGLD/SGHMC implementations, and waste time.

Member

yebai commented Sep 16, 2025 •

edited

Loading

Sorry for not being able to provide feedback earlier.

~~@AoifeHughes could have asked questions in my office hour with her~~. However, hopefully, this is still a useful learning journey, as it involves many features of Turing.jl.

EDIT: I could have misremembered a conversation with Aoife, in which she asked me a question about this issue.

Member

penelopeysm commented Sep 17, 2025

Maybe we could put some of these conclusions in the Turing docstrings?

AoifeHughes closed this

github-actions bot added a commit that referenced this pull request


          Remove preview for merged PR #629

c61abc8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet