3 changes: 2 additions & 1 deletion .gitignore
@@ -28,4 +28,5 @@ venv
site_libs
.DS_Store
index_files
digest.txt
digest.txt
*.bak
9 changes: 4 additions & 5 deletions README.md
@@ -72,7 +72,7 @@ If you wish to speed up local rendering, there are two options available:
```
quarto render path/to/index.qmd
```

(Note that `quarto preview` does not support this single-file rendering.)

2. Download the most recent `_freeze` folder from the [GitHub releases of this repo](https://github.com/turinglang/docs/releases), and place it in the root of the project.
@@ -82,8 +82,7 @@ If you wish to speed up local rendering, there are two options available:
Note that the validity of a `_freeze` folder depends on the Julia environment that it was created with, because different package versions may lead to different outputs.
In the GitHub release, the `Manifest.toml` is also provided, and you should also download this and place it in the root directory of the docs.

If there isn't a suitably up-to-date `_freeze` folder in the releases, you can generate a new one by [triggering a run for the `create_release.yml` workflow](https://github.com/TuringLang/docs/actions/workflows/create_release.yml).
(You will need to have the appropriate permissions; please create an issue if you need help with this.)
If there isn't a suitably up-to-date `_freeze` folder in the releases, you can generate a new one by [triggering a run for the `create_release.yml` workflow](https://github.com/TuringLang/docs/actions/workflows/create_release.yml) (You will need to have the appropriate permissions; please create an issue if you need help with this).
Copilot AI Oct 3, 2025

The formatting change removes appropriate line breaks and creates an overly long line that reduces readability.

Suggested change
If there isn't a suitably up-to-date `_freeze` folder in the releases, you can generate a new one by [triggering a run for the `create_release.yml` workflow](https://github.com/TuringLang/docs/actions/workflows/create_release.yml) (You will need to have the appropriate permissions; please create an issue if you need help with this).
If there isn't a suitably up-to-date `_freeze` folder in the releases, you can generate a new one
by [triggering a run for the `create_release.yml` workflow](https://github.com/TuringLang/docs/actions/workflows/create_release.yml)
(You will need to have the appropriate permissions; please create an issue if you need help with this).

## Troubleshooting build issues

@@ -101,6 +100,6 @@ And also, kill any stray Quarto processes that are still running (sometimes it k
pkill -9 -f quarto
```

## License
## Licence

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
This project is licensed under the MIT Licence - see the [Licence](Licence) file for details.
Copilot AI Oct 3, 2025

The link reference 'Licence' appears to be incorrect. It should likely reference 'LICENSE' (the actual filename) rather than 'Licence'.

Suggested change
This project is licensed under the MIT Licence - see the [Licence](Licence) file for details.
This project is licensed under the MIT License - see the [License](LICENSE) file for details.


12 changes: 6 additions & 6 deletions core-functionality/index.qmd
@@ -266,7 +266,7 @@ using Turing
# Add four processes to use for sampling.
addprocs(4; exeflags="--project=$(Base.active_project())")

# Initialize everything on all the processes.
# Initialise everything on all the processes.
# Note: Make sure to do this after you've already loaded Turing,
# so each process does not have to precompile.
# Parallel sampling may fail silently if you do not do this.
@@ -329,7 +329,7 @@ Inputs to the model that have a value `missing` are treated as parameters, aka r
```{julia}
@model function gdemo(x, ::Type{T}=Float64) where {T}
if x === missing
# Initialize `x` if missing
# Initialise `x` if missing
x = Vector{T}(undef, 2)
end
s² ~ InverseGamma(2, 3)
@@ -344,7 +344,7 @@ model = gdemo(missing)
c = sample(model, HMC(0.05, 20), 500)
```

Note the need to initialize `x` when missing since we are iterating over its elements later in the model.
Note the need to initialise `x` when missing since we are iterating over its elements later in the model.
The generated values for `x` can be extracted from the `Chains` object using `c[:x]`.

Turing also supports mixed `missing` and non-`missing` values in `x`, where the missing ones will be treated as random variables to be sampled while the others get treated as observations.
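
For example, a minimal usage sketch of this mixed case, assuming the two-element `gdemo` model and `HMC` settings shown above: the `missing` entry is sampled as a random variable, while `2.4` is treated as an observation.

```julia
# Sketch: x[1] is sampled as a parameter, x[2] = 2.4 is observed.
model = gdemo([missing, 2.4])
c = sample(model, HMC(0.05, 20), 500)
```
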
@@ -376,7 +376,7 @@ using Turing

@model function generative(x=missing, ::Type{T}=Float64) where {T<:Real}
if x === missing
# Initialize x when missing
# Initialise x when missing
x = Vector{T}(undef, 10)
end
s² ~ InverseGamma(2, 3)
@@ -597,10 +597,10 @@ logging is enabled as default but might slow down inference. It can be turned on
or off by setting the keyword argument `progress` of `sample` to `true` or `false`.
Moreover, you can enable or disable progress logging globally by calling `setprogress!(true)` or `setprogress!(false)`, respectively.

Turing uses heuristics to select an appropriate visualization backend. If you
Turing uses heuristics to select an appropriate visualisation backend. If you
use Jupyter notebooks, the default backend is
[ConsoleProgressMonitor.jl](https://github.com/tkf/ConsoleProgressMonitor.jl).
In all other cases, progress logs are displayed with
[TerminalLoggers.jl](https://github.com/c42f/TerminalLoggers.jl). Alternatively,
if you provide a custom visualization backend, Turing uses it instead of the
if you provide a custom visualisation backend, Turing uses it instead of the
default backend.
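
As a small usage sketch (assuming the `gdemo` model and `HMC` settings from earlier in this file):

```julia
setprogress!(false)  # disable progress logging globally

# The `progress` keyword argument overrides the global setting for a single call.
chain = sample(gdemo(missing), HMC(0.05, 20), 500; progress=true)
```
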
14 changes: 7 additions & 7 deletions developers/inference/abstractmcmc-interface/index.qmd
@@ -59,7 +59,7 @@ using Random

An interface extension (like the one we're writing right now) typically requires that you overload or implement several functions. Specifically, you should `import` the functions you intend to overload. This next code block accomplishes that.

From `Distributions`, we need `Sampleable`, `VariateForm`, and `ValueSupport`, three abstract types that define a distribution. Models in the interface are assumed to be subtypes of `Sampleable{VariateForm, ValueSupport}`. In this section our model is going be be extremely simple, so we will not end up using these except to make sure that the inference functions are dispatching correctly.
From `Distributions`, we need `Sampleable`, `VariateForm`, and `ValueSupport`, three abstract types that define a distribution. Models in the interface are assumed to be subtypes of `Sampleable{VariateForm, ValueSupport}`. In this section our model is going to be extremely simple, so we will not end up using these except to make sure that the inference functions are dispatching correctly.

### Sampler

@@ -79,7 +79,7 @@ function MetropolisHastings(init_θ::Vector{<:Real})
end
```

Above, we have defined a sampler that stores the initial parameterization of the prior, and a distribution object from which proposals are drawn. You can have a struct that has no fields, and simply use it for dispatching onto the relevant functions, or you can store a large amount of state information in your sampler.
Above, we have defined a sampler that stores the initial parameterisation of the prior, and a distribution object from which proposals are drawn. You can have a struct that has no fields, and simply use it for dispatching onto the relevant functions, or you can store a large amount of state information in your sampler.

The general intuition for what to store in your sampler struct is that anything you may need to perform inference between samples but you don't want to store in a transition should go into the sampler struct. It's the only way you can carry non-sample related state information between `step!` calls.
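
For instance, a hypothetical sketch (not the sampler defined in this tutorial) of a struct that carries an adapted proposal scale between `step!` calls alongside the fields used here:

```julia
using AbstractMCMC

struct AdaptiveMetropolisHastings{T<:Real,D} <: AbstractMCMC.AbstractSampler
    init_θ::Vector{T}        # initial parameterisation
    proposal::D              # proposal distribution
    scale::Base.RefValue{T}  # mutable, non-sample state shared across step! calls
end
```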

@@ -124,7 +124,7 @@ As a refresher, Metropolis-Hastings implements a very basic algorithm:

2. For ``t`` in ``[1,N],`` do

+ Generate a proposal parameterization ``\theta^\prime_t \sim q(\theta^\prime_t \mid \theta_{t-1}).``
+ Generate a proposal parameterisation ``\theta^\prime_t \sim q(\theta^\prime_t \mid \theta_{t-1}).``

+ Calculate the acceptance probability, ``\alpha = \text{min}\left[1,\frac{\pi(\theta'_t)}{\pi(\theta_{t-1})} \frac{q(\theta_{t-1} \mid \theta'_t)}{q(\theta'_t \mid \theta_{t-1})} \right].``

@@ -163,19 +163,19 @@ function AbstractMCMC.step!(
end
```

The first `step!` function just packages up the initial parameterization inside the sampler, and returns it. We implicitly accept the very first parameterization.
The first `step!` function just packages up the initial parameterisation inside the sampler, and returns it. We implicitly accept the very first parameterisation.

The other `step!` function performs the usual steps from Metropolis-Hastings. Included are several helper functions, `proposal` and `q`, which are designed to replicate the functions in the pseudocode above.

- `proposal` generates a new proposal in the form of a `Transition`, which can be univariate if the value passed in is univariate, or it can be multivariate if the `Transition` given is multivariate. Proposals use a basic `Normal` or `MvNormal` proposal distribution.
- `q` returns the log density of one parameterization conditional on another, according to the proposal distribution.
- `q` returns the log density of one parameterisation conditional on another, according to the proposal distribution.
- `step!` generates a new proposal, checks the acceptance probability, and then returns either the previous transition or the proposed transition.


```{julia}
#| eval: false
# Define a function that makes a basic proposal depending on a univariate
# parameterization or a multivariate parameterization.
# parameterisation or a multivariate parameterisation.
function propose(spl::MetropolisHastings, model::DensityModel, θ::Real)
return Transition(model, θ + rand(spl.proposal))
end
@@ -193,7 +193,7 @@ function q(spl::MetropolisHastings, θ::Vector{<:Real}, θcond::Vector{<:Real})
end
q(spl::MetropolisHastings, t1::Transition, t2::Transition) = q(spl, t1.θ, t2.θ)

# Calculate the density of the model given some parameterization.
# Calculate the density of the model given some parameterisation.
ℓπ(model::DensityModel, θ) = model.ℓπ(θ)
ℓπ(model::DensityModel, t::Transition) = t.lp

6 changes: 3 additions & 3 deletions developers/inference/variational-inference/index.qmd
@@ -174,7 +174,7 @@ $$

for some function $g\_{\theta}$ differentiable wrt. $\theta$. So all $q_{\theta} \in \mathscr{Q}\_{\Theta}$ are using the *same* reparameterization-function $g$, but each $q\_{\theta}$ corresponds to different choices of $\theta$ for $f\_{\theta}$.

Under this assumption we can differentiate the sampling process by taking the derivative of $g\_{\theta}$ wrt. $\theta$, and thus we can differentiate the entire $\widehat{\mathrm{ELBO}}(q\_{\theta})$ wrt. $\theta$! With the gradient available we can either try to solve for optimality either by setting the gradient equal to zero or maximize $\widehat{\mathrm{ELBO}}(q\_{\theta})$ stepwise by traversing $\mathscr{Q}\_{\Theta}$ in the direction of steepest ascent. For the sake of generality, we're going to go with the stepwise approach.
Under this assumption we can differentiate the sampling process by taking the derivative of $g\_{\theta}$ wrt. $\theta$, and thus we can differentiate the entire $\widehat{\mathrm{ELBO}}(q\_{\theta})$ wrt. $\theta$! With the gradient available we can try to solve for optimality either by setting the gradient equal to zero or by maximising $\widehat{\mathrm{ELBO}}(q\_{\theta})$ stepwise by traversing $\mathscr{Q}\_{\Theta}$ in the direction of steepest ascent. For the sake of generality, we're going to go with the stepwise approach.
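
As a concrete sketch (hypothetical code, for the mean-field Gaussian case with $\theta = (\mu, \omega)$ and $\sigma = \exp(\omega)$ elementwise): sampling from $q\_{\theta}$ becomes a deterministic, differentiable function of $\theta$ applied to noise drawn from the base density.

```julia
# Hypothetical sketch of g_θ for a mean-field Gaussian q_θ with θ = (μ, ω), σ = exp.(ω).
# ε is drawn from the base density (a standard normal); everything downstream of ε is
# differentiable wrt. θ.
g(μ, ω, ε) = μ .+ exp.(ω) .* ε
sample_q(μ, ω) = g(μ, ω, randn(length(μ)))
```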

With all this nailed down, we eventually reach the section on **Automatic Differentiation Variational Inference (ADVI)**.

Expand All @@ -186,7 +186,7 @@ So let's revisit the assumptions we've made at this point:

2. $\mathscr{Q}\_{\Theta}$ is a space of _reparameterizable_ densities with $\bar{q}(z)$ as the base-density.

3. The parameterization function $g\_{\theta}$ is differentiable wrt. $\theta$.
3. The parameterisation function $g\_{\theta}$ is differentiable wrt. $\theta$.

4. Evaluation of the probability density $q\_{\theta}(z)$ is differentiable wrt. $\theta$.

@@ -335,7 +335,7 @@ $$

#### Back to VI

So why is this is useful? Well, we're looking to generalize our approach using a normal distribution to cases where the supports don't match up. How about defining $q(z)$ by
So why is this useful? Well, we're looking to generalise our approach using a normal distribution to cases where the supports don't match up. How about defining $q(z)$ by

::: {.column-page}
$$
2 changes: 1 addition & 1 deletion developers/transforms/bijectors/index.qmd
@@ -260,7 +260,7 @@ println("went out of bounds $n_oob_transformed/10000 times")
In the subsections above, we've seen two different methods of sampling from a constrained distribution:

1. Sample directly from the distribution and reject any samples outside of its support.
2. Transform the distribution to an unconstrained one and sample from that instead.
2. Transform the distribution to an unconstrained one and sample from that instead.

(Note that both of these methods are applicable to other samplers as well, such as Hamiltonian Monte Carlo.)
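
For reference, a minimal sketch of method 2, assuming the `bijector` / `transformed` / `inverse` functions from a recent Bijectors.jl:

```julia
using Distributions, Bijectors

d  = LogNormal()        # constrained support (0, ∞)
b  = bijector(d)        # maps the support of d to all of ℝ
td = transformed(d, b)  # corresponding unconstrained distribution
y  = rand(td)           # sample in the unconstrained space
x  = inverse(b)(y)      # map back to the original constrained support
```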

4 changes: 2 additions & 2 deletions faq/index.qmd
@@ -76,7 +76,7 @@ end
- **Assume statements** (sampling statements): Often crash unpredictably or produce incorrect results
- **AD backend compatibility**: Many AD backends don't support threading. Check the [multithreaded column in ADTests](https://turinglang.org/ADTests/) for compatibility

For safe parallelism within models, consider vectorized operations instead of explicit threading.
For safe parallelism within models, consider vectorised operations instead of explicit threading.
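
For example, a hedged sketch (hypothetical model, and assuming a Distributions.jl version that accepts `MvNormal(μ, σ² * I)`) of a vectorised observation statement that avoids threading over observe statements entirely:

```julia
using Turing
using LinearAlgebra: I

@model function demo(x)
    μ ~ Normal(0, 1)
    σ ~ truncated(Normal(0, 1); lower=0)
    x ~ MvNormal(fill(μ, length(x)), σ^2 * I)  # one vectorised observation
end
```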

## How do I check the type stability of my Turing model?

@@ -140,7 +140,7 @@ The choice of AD backend can significantly impact performance. See:
Small changes can have big performance impacts. Common culprits include:

- Type instability introduced by the change
- Switching from vectorized to scalar operations (or vice versa)
- Switching from vectorised to scalar operations (or vice versa)
- Inadvertently causing AD backend incompatibilities
- Breaking assumptions that allowed compiler optimizations

2 changes: 1 addition & 1 deletion tutorials/bayesian-differential-equations/index.qmd
@@ -334,7 +334,7 @@ To learn more about how to optimize solving performance for stiff problems you c
_Sensitivity analysis_ is provided by the [SciMLSensitivity.jl package](https://docs.sciml.ai/SciMLSensitivity/stable/), which forms part of SciML's differential equation suite.
The model sensitivities are the derivatives of the solution with respect to the parameters.
Specifically, the local sensitivity of the solution to a parameter is defined by how much the solution would change if the parameter were changed by a small amount.
Sensitivity analysis provides a cheap way to calculate the gradient of the solution which can be used in parameter estimation and other optimization tasks.
Sensitivity analysis provides a cheap way to calculate the gradient of the solution which can be used in parameter estimation and other optimisation tasks.
The sensitivity analysis methods in SciMLSensitivity.jl are based on automatic differentiation (AD), and are compatible with many of Julia's AD backends.
More details on the mathematical theory that underpins these methods can be found in [the SciMLSensitivity documentation](https://docs.sciml.ai/SciMLSensitivity/stable/sensitivity_math/).
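
In symbols, the local sensitivity described above is the partial derivative of the solution with respect to each parameter (standard definition, included here for reference):

$$
s_{ij}(t) = \frac{\partial u_i(t)}{\partial p_j},
$$

where $u(t)$ is the ODE solution and $p$ is the parameter vector.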

10 changes: 5 additions & 5 deletions tutorials/bayesian-linear-regression/index.qmd
@@ -26,7 +26,7 @@ using Turing
# Package for loading the data set.
using RDatasets

# Package for visualization.
# Package for visualisation.
using StatsPlots

# Functionality for splitting the data.
@@ -35,7 +35,7 @@ using MLUtils: splitobs
# Functionality for constructing arrays with identical elements efficiently.
using FillArrays

# Functionality for normalizing the data and evaluating the model predictions.
# Functionality for normalising the data and evaluating the model predictions.
using StatsBase

# Functionality for working with scaled identity matrices.
@@ -69,7 +69,7 @@ first(data, 6)
size(data)
```

The next step is to get our data ready for testing. We'll split the `mtcars` dataset into two subsets, one for training our model and one for evaluating our model. Then, we separate the targets we want to learn (`MPG`, in this case) and standardize the datasets by subtracting each column's means and dividing by the standard deviation of that column. The resulting data is not very familiar looking, but this standardization process helps the sampler converge far easier.
The next step is to get our data ready for testing. We'll split the `mtcars` dataset into two subsets, one for training our model and one for evaluating our model. Then, we separate the targets we want to learn (`MPG`, in this case) and standardise the datasets by subtracting each column's mean and dividing by the standard deviation of that column. The resulting data is not very familiar looking, but this standardisation process helps the sampler converge far more easily.

```{julia}
# Remove the model column.
@@ -85,12 +85,12 @@ test = Matrix(select(testset, Not(target)))
train_target = trainset[:, target]
test_target = testset[:, target]

# Standardize the features.
# Standardise the features.
dt_features = fit(ZScoreTransform, train; dims=1)
StatsBase.transform!(dt_features, train)
StatsBase.transform!(dt_features, test)

# Standardize the targets.
# Standardise the targets.
dt_targets = fit(ZScoreTransform, train_target)
StatsBase.transform!(dt_targets, train_target)
StatsBase.transform!(dt_targets, test_target);
2 changes: 1 addition & 1 deletion tutorials/bayesian-logistic-regression/index.qmd
@@ -33,7 +33,7 @@ using MCMCChains, Plots, StatsPlots
# We need a logistic function, which is provided by StatsFuns.
using StatsFuns: logistic

# Functionality for splitting and normalizing the data
# Functionality for splitting and normalising the data
using MLDataUtils: shuffleobs, stratifiedobs, rescale!

# Set a seed for reproducibility.
4 changes: 2 additions & 2 deletions tutorials/multinomial-logistic-regression/index.qmd
@@ -28,7 +28,7 @@ using RDatasets
# Load StatsPlots for visualizations and diagnostics.
using StatsPlots

# Functionality for splitting and normalizing the data.
# Functionality for splitting and normalising the data.
using MLDataUtils: shuffleobs, splitobs, rescale!

# We need a softmax function which is provided by NNlib.
@@ -84,7 +84,7 @@ test_features = Matrix(testset[!, features])
train_target = trainset[!, target]
test_target = testset[!, target]

# Standardize the features.
# Standardise the features.
μ, σ = rescale!(train_features; obsdim=1)
rescale!(test_features, μ, σ; obsdim=1);
```
6 changes: 3 additions & 3 deletions tutorials/probabilistic-pca/index.qmd
@@ -63,7 +63,7 @@ We can also express the above formula in matrix form: $\mathbf{X}_{D \times N} \
We are interested in inferring $\mathbf{W}$, $μ$ and $\sigma$.

Classical PCA is the specific case of probabilistic PCA when the covariance of the noise becomes infinitesimally small, i.e. $\sigma^2 \to 0$.
Probabilistic PCA generalizes classical PCA, this can be seen by marginalizing out the the latent variable.[^2]
Probabilistic PCA generalizes classical PCA; this can be seen by marginalizing out the latent variable.[^2]
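
Concretely, marginalising out the latent variable yields the standard closed form

$$
\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I}),
$$

so letting $\sigma^2 \to 0$ leaves only the variance captured by $\mathbf{W}\mathbf{W}^\top$, recovering the classical PCA subspace.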

## The gene expression example

@@ -93,7 +93,7 @@ using Turing
using Mooncake
using LinearAlgebra, FillArrays

# Packages for visualization
# Packages for visualisation
using DataFrames, StatsPlots, Measures

# Set a seed for reproducibility.
@@ -134,7 +134,7 @@ mat_exp[(2 * (n_genes ÷ 3) + 1):end, (n_cells ÷ 2 + 1):end] .+= 10
mat_exp[(2 * (n_genes ÷ 3) + 1):end, (n_cells ÷ 2 + 1):end] .+= 10
```

To visualize the $(D=9) \times (N=60)$ data matrix `mat_exp`, we use the `heatmap` plot.
To visualise the $(D=9) \times (N=60)$ data matrix `mat_exp`, we use the `heatmap` plot.

```{julia}
heatmap(