```{julia}
using Pkg;
Pkg.instantiate();
```
Turing.jl provides stochastic gradient-based MCMC samplers: **Stochastic Gradient Langevin Dynamics (SGLD)** and **Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)**.

## Current Capabilities

The current implementation in Turing.jl is primarily useful for research purposes and for working with streaming data.

**Important**: These samplers currently compute full gradients with added stochastic noise rather than true mini-batch stochastic gradients, so they do not yet provide the computational benefits usually associated with stochastic gradient methods on large datasets. They require very careful hyperparameter tuning and are often slower than standard samplers such as HMC or NUTS for most practical applications.

## Setup
```{julia}
using Turing
using Distributions
using Random
using LinearAlgebra

Random.seed!(123)

# Disable progress bars for cleaner output
Turing.setprogress!(false)
```

## SGLD (Stochastic Gradient Langevin Dynamics)
```{julia}
# true_μ, true_σ, and N are assumed to be defined earlier in the example
data = rand(Normal(true_μ, true_σ), N)

# Define a simple Gaussian model
@model function gaussian_model(x)
    μ ~ Normal(0, 10)
    σ ~ truncated(Normal(0, 5); lower=0)

    for i in 1:length(x)
        x[i] ~ Normal(μ, σ)
    end
end

model = gaussian_model(data)
```

SGLD requires very small step sizes to ensure stability. We use a `PolynomialStepsize` that decreases over time; at present, `PolynomialStepsize` is the primary stepsize schedule available in Turing for SGLD.
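
A minimal sketch of how this might look, assuming the `SGLD(; stepsize=...)` keyword and the `Turing.PolynomialStepsize(a, b, γ)` constructor (step size `a / (b + t)^γ` at iteration `t`):

```{julia}
# Small, decaying step-size schedule (assumed signature: a, b, γ)
stepsize = Turing.PolynomialStepsize(0.0001, 10000, 0.55)

# SGLD typically needs long chains to mix well
chain_sgld = sample(model, SGLD(; stepsize=stepsize), 5000)

summarystats(chain_sgld)
```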
Compare the results to evaluate the performance of stochastic gradient samplers on a more complex model:
```{julia}
println("True β values: ", true_β)
println("SGLD estimates:")
summarystats(chain_lr_sgld)
```

The linear regression example demonstrates that stochastic gradient samplers can recover the true parameters, but:

- They require significantly longer chains (5000 iterations versus 1000 for HMC)
- The estimates may have higher variance
- Convergence diagnostics should be carefully examined before trusting the results (see the sketch below)

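As a rough illustration of that last point, assuming the `chain_lr_sgld` object from above and the diagnostics functions re-exported from MCMCChains:

```{julia}
# Effective sample size and R-hat for each parameter; a low ESS or an
# R-hat far from 1 indicates the chain has not mixed well
ess_rhat(chain_lr_sgld)

# Trace plots are also worth a visual check, e.g. with StatsPlots:
# using StatsPlots; plot(chain_lr_sgld)
```
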
## Automatic Differentiation Backends
Both samplers support different AD backends. For more information about automatic differentiation in Turing, see the [Automatic Differentiation](../automatic-differentiation/) documentation.
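
As a sketch only (the `adtype` keyword on these constructors, the `AutoForwardDiff`/`AutoReverseDiff` types re-exported by Turing, and the SGHMC hyperparameters shown are assumptions; reverse-mode backends also need the corresponding package, e.g. ReverseDiff, installed and loaded):

```{julia}
# Forward-mode AD (the usual default) suits models with few parameters
sgld_fd = SGLD(; stepsize=Turing.PolynomialStepsize(0.0001), adtype=AutoForwardDiff())

# Reverse-mode AD can scale better when there are many parameters
sghmc_rd = SGHMC(; learning_rate=1e-5, momentum_decay=0.1, adtype=AutoReverseDiff())

chain_ad = sample(model, sgld_fd, 1000)
```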