Commit dc214b3

Deploy preview for PR 599
1 parent 18c3eed commit dc214b3

File tree

3 files changed: +59 −59 lines changed


pr-previews/599/search.json

Lines changed: 3 additions & 3 deletions
@@ -52,7 +52,7 @@
"href": "usage/automatic-differentiation/index.html",
"title": "Automatic Differentiation",
"section": "",
- line 55 (removed):
"text": "Turing currently supports four automatic differentiation (AD) backends for sampling: ForwardDiff for forward-mode AD; and Mooncake and ReverseDiff for reverse-mode AD. ForwardDiff is automatically imported by Turing. To utilize Mooncake, or ReverseDiff for AD, users must explicitly import them with import Mooncake, alongside using Turing.\nAs of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of AdTypes.jl, allowing users to specify the AD backend for individual samplers independently. Users can pass the adtype keyword argument to the sampler constructor to select the desired AD backend, with the default being AutoForwardDiff(; chunksize=0).\nFor ForwardDiff, pass adtype=AutoForwardDiff(; chunksize) to the sampler constructor. A chunksize of nothing permits the chunk size to be automatically determined. For more information regarding the selection of chunksize, please refer to related section of ForwardDiff’s documentation.\nFor ReverseDiff, pass adtype=AutoReverseDiff() to the sampler constructor. An additional keyword argument called compile can be provided to AutoReverseDiff. It specifies whether to pre-record the tape only once and reuse it later (compile is set to false by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.\nPre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.\nThus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and if-statements should consistently execute the same branches. For instance, if-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data. However, if-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect. Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.\nAnd the previously used interface functions including ADBackend, setadbackend, setsafe, setchunksize, and setrdcache are deprecated and removed.",
+ line 55 (added):
"text": "Turing currently supports four automatic differentiation (AD) backends for sampling: ForwardDiff for forward-mode AD; and Mooncake and ReverseDiff for reverse-mode AD. ForwardDiff is automatically imported by Turing. To utilize Mooncake or ReverseDiff for AD, users must explicitly import them with import Mooncake or import ReverseDiff, alongside the usual using Turing.\nAs of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of AdTypes.jl, allowing users to specify the AD backend for individual samplers independently. Users can pass the adtype keyword argument to the sampler constructor to select the desired AD backend, with the default being AutoForwardDiff(; chunksize=0).\nFor ForwardDiff, pass adtype=AutoForwardDiff(; chunksize) to the sampler constructor. A chunksize of nothing permits the chunk size to be automatically determined. For more information regarding the selection of chunksize, please refer to related section of ForwardDiff’s documentation.\nFor ReverseDiff, pass adtype=AutoReverseDiff() to the sampler constructor. An additional keyword argument called compile can be provided to AutoReverseDiff. It specifies whether to pre-record the tape only once and reuse it later (compile is set to false by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.\nPre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.\nThus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and if-statements should consistently execute the same branches. For instance, if-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data. However, if-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect. Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.\nAnd the previously used interface functions including ADBackend, setadbackend, setsafe, setchunksize, and setrdcache are deprecated and removed.",
"crumbs": [
"Get Started",
"User Guide",
@@ -64,7 +64,7 @@
"href": "usage/automatic-differentiation/index.html#switching-ad-modes",
"title": "Automatic Differentiation",
"section": "",
- line 67 (removed):
"text": "Turing currently supports four automatic differentiation (AD) backends for sampling: ForwardDiff for forward-mode AD; and Mooncake and ReverseDiff for reverse-mode AD. ForwardDiff is automatically imported by Turing. To utilize Mooncake, or ReverseDiff for AD, users must explicitly import them with import Mooncake, alongside using Turing.\nAs of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of AdTypes.jl, allowing users to specify the AD backend for individual samplers independently. Users can pass the adtype keyword argument to the sampler constructor to select the desired AD backend, with the default being AutoForwardDiff(; chunksize=0).\nFor ForwardDiff, pass adtype=AutoForwardDiff(; chunksize) to the sampler constructor. A chunksize of nothing permits the chunk size to be automatically determined. For more information regarding the selection of chunksize, please refer to related section of ForwardDiff’s documentation.\nFor ReverseDiff, pass adtype=AutoReverseDiff() to the sampler constructor. An additional keyword argument called compile can be provided to AutoReverseDiff. It specifies whether to pre-record the tape only once and reuse it later (compile is set to false by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.\nPre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.\nThus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and if-statements should consistently execute the same branches. For instance, if-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data. However, if-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect. Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.\nAnd the previously used interface functions including ADBackend, setadbackend, setsafe, setchunksize, and setrdcache are deprecated and removed.",
+ line 67 (added):
"text": "Turing currently supports four automatic differentiation (AD) backends for sampling: ForwardDiff for forward-mode AD; and Mooncake and ReverseDiff for reverse-mode AD. ForwardDiff is automatically imported by Turing. To utilize Mooncake or ReverseDiff for AD, users must explicitly import them with import Mooncake or import ReverseDiff, alongside the usual using Turing.\nAs of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of AdTypes.jl, allowing users to specify the AD backend for individual samplers independently. Users can pass the adtype keyword argument to the sampler constructor to select the desired AD backend, with the default being AutoForwardDiff(; chunksize=0).\nFor ForwardDiff, pass adtype=AutoForwardDiff(; chunksize) to the sampler constructor. A chunksize of nothing permits the chunk size to be automatically determined. For more information regarding the selection of chunksize, please refer to related section of ForwardDiff’s documentation.\nFor ReverseDiff, pass adtype=AutoReverseDiff() to the sampler constructor. An additional keyword argument called compile can be provided to AutoReverseDiff. It specifies whether to pre-record the tape only once and reuse it later (compile is set to false by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.\nPre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.\nThus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and if-statements should consistently execute the same branches. For instance, if-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data. However, if-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect. Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter.\nAnd the previously used interface functions including ADBackend, setadbackend, setsafe, setchunksize, and setrdcache are deprecated and removed.",
"crumbs": [
"Get Started",
"User Guide",
@@ -76,7 +76,7 @@
"href": "usage/automatic-differentiation/index.html#compositional-sampling-with-differing-ad-modes",
"title": "Automatic Differentiation",
"section": "Compositional Sampling with Differing AD Modes",
- line 79 (removed):
"text": "Compositional Sampling with Differing AD Modes\nTuring supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using ForwardDiff to sample the mean (m) parameter, and using ReverseDiff for the variance (s) parameter:\n\nusing Turing\nusing ReverseDiff\n\n# Define a simple Normal model with unknown mean and variance.\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\n# Sample using Gibbs and varying autodiff backends.\nc = sample(\n gdemo(1.5, 2),\n Gibbs(\n :m => HMC(0.1, 5; adtype=AutoForwardDiff(; chunksize=0)),\n :s² => HMC(0.1, 5; adtype=AutoReverseDiff(false)),\n ),\n 1000,\n progress=false,\n)\n\n\nChains MCMC chain (1000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 17.11 seconds\nCompute duration = 17.11 seconds\nparameters = s², m\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 1.8990 1.1552 0.0741 249.3430 392.4103 1.0104 ⋯\n m 1.1080 0.7247 0.0677 114.5861 185.1771 1.0011 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5761 1.0734 1.5711 2.4327 4.8515\n m -0.2071 0.6275 1.0585 1.5525 2.6180\n\n\n\n\nGenerally, reverse-mode AD, for instance ReverseDiff, is faster when sampling from variables of high dimensionality (greater than 20), while forward-mode AD, for instance ForwardDiff, is more efficient for lower-dimension variables. This functionality allows those who are performance sensitive to fine tune their automatic differentiation for their specific models.\nIf the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is. Currently, this defaults to ForwardDiff.\nThe most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see the API documentation):\n\nusing DynamicPPL.TestUtils.AD: run_ad, ADResult\nusing ForwardDiff, ReverseDiff\n\nmodel = gdemo(1.5, 2)\n\nfor adtype in [AutoForwardDiff(), AutoReverseDiff()]\n result = run_ad(model, adtype; benchmark=true)\n @show result.time_vs_primal\nend\n\n\n[ Info: Running AD on gdemo with ADTypes.AutoForwardDiff()\n params : [-0.1732688205035601, 0.28254056143823814]\n actual : (-6.203334146710871, [2.7501839961955326, 3.154170469114926])\n expected : (-6.203334146710871, [2.7501839961955326, 3.154170469114926])\ngrad / primal : 1.3975705272430197\nresult.time_vs_primal = 1.3975705272430197\n[ Info: Running AD on gdemo with ADTypes.AutoReverseDiff()\n params : [-0.438530314411812, 0.5227586718827104]\n actual : (-6.319895979615482, [3.795161057779026, 2.994996834844374])\n expected : (-6.319895979615482, [3.795161057779026, 2.994996834844374])\ngrad / primal : 34.998809321313146\nresult.time_vs_primal = 34.998809321313146\n\n\n\n\nIn this specific instance, ForwardDiff is clearly faster (due to the small size of the model).\nWe also have a table of benchmarks for various models and AD backends in the ADTests website. These models aim to capture a variety of different Turing.jl features. If you have suggestions for things to include, please do let us know by creating an issue on GitHub!",
+ line 79 (added):
"text": "Compositional Sampling with Differing AD Modes\nTuring supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using ForwardDiff to sample the mean (m) parameter, and using ReverseDiff for the variance (s) parameter:\n\nusing Turing\nusing ReverseDiff\n\n# Define a simple Normal model with unknown mean and variance.\n@model function gdemo(x, y)\n s² ~ InverseGamma(2, 3)\n m ~ Normal(0, sqrt(s²))\n x ~ Normal(m, sqrt(s²))\n return y ~ Normal(m, sqrt(s²))\nend\n\n# Sample using Gibbs and varying autodiff backends.\nc = sample(\n gdemo(1.5, 2),\n Gibbs(\n :m => HMC(0.1, 5; adtype=AutoForwardDiff(; chunksize=0)),\n :s² => HMC(0.1, 5; adtype=AutoReverseDiff(false)),\n ),\n 1000,\n progress=false,\n)\n\n\nChains MCMC chain (1000×3×1 Array{Float64, 3}):\n\nIterations = 1:1:1000\nNumber of chains = 1\nSamples per chain = 1000\nWall duration = 16.79 seconds\nCompute duration = 16.79 seconds\nparameters = s², m\ninternals = lp\n\nSummary Statistics\n parameters mean std mcse ess_bulk ess_tail rhat e ⋯\n Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯\n\n s² 2.1795 1.9481 0.1886 138.7328 201.7629 1.0104 ⋯\n m 1.0902 0.8457 0.1052 72.3319 75.9211 1.0191 ⋯\n 1 column omitted\n\nQuantiles\n parameters 2.5% 25.0% 50.0% 75.0% 97.5%\n Symbol Float64 Float64 Float64 Float64 Float64\n\n s² 0.5523 1.0995 1.6323 2.6228 6.9458\n m -0.9230 0.6372 1.1493 1.6290 2.5831\n\n\n\n\nGenerally, reverse-mode AD, for instance ReverseDiff, is faster when sampling from variables of high dimensionality (greater than 20), while forward-mode AD, for instance ForwardDiff, is more efficient for lower-dimension variables. This functionality allows those who are performance sensitive to fine tune their automatic differentiation for their specific models.\nIf the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is. Currently, this defaults to ForwardDiff.\nThe most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see the API documentation):\n\nusing DynamicPPL.TestUtils.AD: run_ad, ADResult\nusing ForwardDiff, ReverseDiff\n\nmodel = gdemo(1.5, 2)\n\nfor adtype in [AutoForwardDiff(), AutoReverseDiff()]\n result = run_ad(model, adtype; benchmark=true)\n @show result.time_vs_primal\nend\n\n\n[ Info: Running AD on gdemo with ADTypes.AutoForwardDiff()\n params : [0.7995082135821117, 1.411830070363656]\n actual : (-5.234062605719208, [-1.6238071640959817, -0.33063961876490167])\n expected : (-5.234062605719208, [-1.6238071640959817, -0.33063961876490167])\ngrad / primal : 1.3278902364218197\nresult.time_vs_primal = 1.3278902364218197\n[ Info: Running AD on gdemo with ADTypes.AutoReverseDiff()\n params : [1.3082649210428334, 0.860431856053291]\n actual : (-6.280218188045741, [-2.3583000578819737, 0.2483153547863173])\n expected : (-6.280218188045741, [-2.358300057881974, 0.24831535478631733])\ngrad / primal : 35.2478323699422\nresult.time_vs_primal = 35.2478323699422\n\n\n\n\nIn this specific instance, ForwardDiff is clearly faster (due to the small size of the model).\nWe also have a table of benchmarks for various models and AD backends in the ADTests website. These models aim to capture a variety of different Turing.jl features. If you have suggestions for things to include, please do let us know by creating an issue on GitHub!",
"crumbs": [
"Get Started",
"User Guide",
