Conversation
Turing.jl documentation for PR #2733 is available at:
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2733      +/-   ##
==========================================
- Coverage   87.03%   86.32%   -0.72%
==========================================
  Files          20       22       +2
  Lines        1296     1433     +137
==========================================
+ Hits         1128     1237     +109
- Misses        168      196      +28
```
Closes #2232 Closes #2363 Closes #2745 Closes #2732 Closes #1775 Closes #2735 Closes #2734 Closes #2634

This PR redesigns Turing's optimisation interface as discussed in #2634.

# How to review?

The key things that are changed here are:

- The fields of the `ModeResult` struct are slightly different. We get rid of the old `NamedArray` that was stored in there. Instead, users who really, really need it can generate it themselves via `StatsBase.coef(result)`. The primary interface for getting parameters will be `result.params`, which is a `Dict{VarName}` (obviously can be changed to VNT later).
- Linking is now user-controlled and depends on the `link` parameter passed in.
- Constraints are now passed in as either a `NamedTuple` or a `Dict{VarName}` (again pending VNT). Values are always provided in unlinked space.
- Initial parameters are now supplied as an `AbstractInitStrategy`. The code in `src/optimisation/init.jl` will attempt to sample initial parameters using that strategy, but will also make sure that the sampled value obeys the constraints.

The most meaningful section of code is the definition of `estimate_mode`, plus `src/optimisation/init.jl`, which handles everything to do with initialisation and constraints. The rest is just tweaks to fit the new interface.
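As a rough sketch of what this implies for users (the model and the exact calls below are my illustrative guesses based on the description above, not verbatim code from the PR):

```julia
using Turing, StatsBase

# A hypothetical model, for illustration only.
@model function demo()
    σ ~ truncated(Normal(), 0, Inf)
    x ~ Normal(0, σ)
end

result = maximum_a_posteriori(demo())

# Primary interface: a Dict{VarName} mapping parameters to values.
result.params

# The old NamedArray-style output is no longer stored on the struct,
# but can still be generated on demand:
StatsBase.coef(result)
```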
The usual game of trying to see if DPPL breaks anything upstream. (20 minutes later) Okay, maybe '*if* DPPL breaks anything upstream' was a bit optimistic. More like seeing *what* DPPL broke upstream.
This isn't yet ready for release; DPPL isn't released yet, and the MH bit needs a review. But tests are passing.
sunxd3
left a comment
the logjac is tricky stuff, I think it's correct, but I guess only tests and time will tell
thanks for this, Penny!
```julia
function DynamicPPL.init(
    rng::Random.AbstractRNG, vn::VarName, prior::Distribution, strategy::InitFromProposals
)
    if haskey(strategy.proposals, vn)
```
haskey feels a bit dangerous, what if proposals have x but vn is x[1]?
also if this doesn't match, it might silently fall through to the else branch?
It would silently do the else branch, that's true. I need to sit down and work out these edge cases. (To be fair, the old one had even weirder behaviour, but that's not an excuse.)
I think it's very hard to do the correct thing though. In the else branch we could kind of do a 'stepping upwards' loop, like if vn is x[1].a we check whether x[1] exists or x exists and if so warn. But I think that would just create other edge cases, and it would also be bad for performance.
Maybe the best answer is before we start sampling (somewhere in the first AbstractMCMC.step) we should do one model evaluation where we iterate through the model and print out the proposal for each parameter? The user should be able to silence that with verbose=false, and if not then it's a good check for them.
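For what it's worth, the "proposals have x but vn is x[1]" mismatch can be detected with a subsumption scan (a sketch using `AbstractPPL.subsumes`, not the PR's code; a linear scan over keys like this is exactly the performance cost mentioned above):

```julia
using AbstractPPL: @varname, VarName, subsumes

# Hypothetical proposals dict keyed by the parent variable x.
proposals = Dict{VarName,Any}(@varname(x) => [1.0, 2.0])
vn = @varname(x[1])

haskey(proposals, vn)                       # false: exact-key lookup misses it
any(k -> subsumes(k, vn), keys(proposals))  # true: x subsumes x[1], so we could warn
```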
```julia
    vns_with_proposal::Set{VarName}
end

function (s::StoreUnspecifiedPriors)(val, tval, logjac, vn, dist::Distribution)
    return if vn in s.vns_with_proposal
```
similar to the question above, what to do in the case of subsumption?
```julia
# Convert all keys to VarNames.
vn, proposal = pair
vn = _to_varname(vn)
if !haskey(raw_vals, vn)
```
I think the behavior is correct -- if the user doesn't give init info, then we skip.
But it also makes me think: what if the user gives the wrong key for initialization? Then this is kind of silent.
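One cheap guard against this (my own sketch, reusing the `raw_vals` name from the snippet above; `user_init` is a hypothetical name for the user-supplied values) would be to warn about keys that never matched any model variable:

```julia
# Keys the user supplied but which matched nothing in the model.
unmatched = setdiff(keys(user_init), keys(raw_vals))
if !isempty(unmatched)
    @warn "Initial values supplied for variables not found in the model" unmatched
end
```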
```julia
)

# Calculate the log-acceptance probability.
log_a = (
```
maybe some NaN/Inf handling? (can lp get these values?)
Depends. Sometimes -Inf is fine in the sense that you just make it reject the sample (which can happen either by sampling a parameter value outside the support, or if the user just wants to signal some failure condition with @addlogprob! -Inf). The main problem I discovered with this was when the initial parameters already gave an lp of -Inf, which is why I inserted the special handler on line 286 onwards.
NaN on the other hand I think can only happen if the value itself is already a NaN or if there's a bug in the implementation of logpdf, which should probably not happen. But we could insert a warning here? I think the current behaviour is that it will never be accepted (because comparisons against NaN always return false), which is fairly sensible, but I could see a warning being nice.
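To spell out the behaviour being discussed (this is generic Metropolis-Hastings accept logic, not this PR's exact code):

```julia
# Accept the proposal iff log(u) < log_a for u drawn uniformly on (0, 1).
accept(log_a) = log(rand()) < log_a

# log_a == -Inf: the comparison is always false, so the proposal is rejected.
# log_a == NaN:  any comparison against NaN is false, so it is also never
#                accepted -- but silently, which is why a warning could be nice.
```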
src/mcmc/mh.jl (outdated)

```julia
proposals = NamedTuple{tuple(prop_syms...)}(tuple(props...))
```

> **Note that when using conditional proposals, the values obtained by indexing into the `VarNamedTuple` are always in unlinked space.** Sometimes, you may want to define a random-walk
side question: do we have a convention on the use of link vs transform vs constrained?
Unfortunately not. I like transform the best out of them, so I'll try to stick to that where possible in the docstrings, but a lot of the code still uses link.
My perspective on this is that transform is generic and can be anything, but link is the special transform that maps to unconstrained space. That might be a bit questionable because in theory there are many transforms that will map to unconstrained space, so in practice what it really means is whatever Bijectors.bijector returns.
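To make the distinction concrete: in practice "link" means applying whatever `Bijectors.bijector` returns for the distribution (a small sketch of Bijectors.jl's documented behaviour, not code from this PR):

```julia
using Bijectors, Distributions

dist = LogNormal()            # support is (0, ∞)
b = Bijectors.bijector(dist)  # the link transform for this distribution (log here)

x = 2.0                       # a value in unlinked (constrained) space
y = b(x)                      # the linked (unconstrained) value
inverse(b)(y) ≈ x             # the inverse transform unlinks it again
```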
This is pretty much ready to go once CI passes. Confusingly, there's a case where CI on 1.10 hangs midway through the MH tests. I have no idea why.

Oh, I thought I would finally go to bed, but actually I can reproduce it locally. Ctrl-C'ing at the place where it hangs gives the following stack trace:
mhauru
left a comment
Very excited to see this go live!
> However, please note that this only applies to **containers that contain random variables on the left-hand side of tilde-statements.**
> In general, there are no restrictions on containers of *observed* data, or containers that are not used in tilde-statements.
>
> - Likewise, arrays of random variables should ideally have a constant size from iteration to iteration. That means a model like this will fail sometimes (*but* see below):
Would it be worth pointing out either here or above at the beginning of these bullet points that this only applies when using indexing, and doing the multivariate distribution version of the below is entirely fine?
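To illustrate the distinction being asked about (my own minimal example, not from the changelog): the container rules only bite when random variables are assigned by indexing; a single multivariate statement treats `x` as one variable and is unaffected:

```julia
using Turing, LinearAlgebra

# Elementwise: x is a container of random variables, so the rules apply.
@model function indexed(N)
    x = Vector{Float64}(undef, N)
    for i in 1:N
        x[i] ~ Normal()
    end
end

# Multivariate: x is a single random variable; no container restrictions.
@model function multivariate(N)
    x ~ MvNormal(zeros(N), I)
end
```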
HISTORY.md (outdated)

> Specifically, the types of **containers that can include random variables** are now more limited:
> if `x[i] ~ dist` is a random variable, then `x` must obey the following criteria:
>
> - They must be arrays. Dicts and other containers are currently unsupported (we have [an issue to track this](https://github.com/TuringLang/DynamicPPL.jl/issues/1263)). If you really need this functionality, please open an issue and let us know; we can try to make it a priority.
Worth specifying what we mean by "arrays"? Any subtype of AbstractArray?
Yeah, changed to AbstractArray.
Technically, it's contingent on it implementing enough of an AbstractArray interface, plus it working with BangBang. But well.
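The BangBang point can be made concrete: writes to the container go through `setindex!!`-style updates, which mutate when the array supports it (a sketch of BangBang's documented behaviour, not this PR's code):

```julia
using BangBang

x = zeros(3)
y = setindex!!(x, 1.5, 2)  # the value fits the eltype, so x is mutated in place
y === x                    # true: the same array came back
# (for values that don't fit the eltype, setindex!! instead returns a widened copy)
```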
> These two convenience functions are now imported and re-exported from DynamicPPL, rather than DistributionsAD.jl.
> They are now just wrappers around `Distributions.product_distribution`, instead of the specialised implementations that were in DistributionsAD.jl.
> DistributionsAD.jl is for all intents and purposes deprecated: it is no longer a dependency in the Turing stack.
Sounds like at least Gibbs may have had a serious performance boost. Is it worth talking about performance improvements here?
Added a note. Turns out runtimes are indeed a lot better too, this is 0.11 seconds on this branch vs 0.18 seconds on main.
```julia
using Turing
@model function f()
    x ~ Normal(0, 1)
    y = zeros(10)
    for i in 1:10
        y[i] ~ Normal(x, 1)
    end
    z ~ Normal(sum(y), 1)
end
@time sample(f(), Gibbs(:x => HMC(0.1, 10), :y => HMC(0.1, 10), :z => HMC(0.1, 10)), 1000; chain_type=Any);
```

> These two convenience functions are now imported and re-exported from DynamicPPL, rather than DistributionsAD.jl.
> They are now just wrappers around `Distributions.product_distribution`, instead of the specialised implementations that were in DistributionsAD.jl.
> DistributionsAD.jl is for all intents and purposes deprecated: it is no longer a dependency in the Turing stack.
Any other advantages other than performance that would be worth raising here? Improvements to fix and conditional?
I added a paragraph about this. I'm still not entirely sure I like where we are. We still don't have well-defined semantics for this. It's probably much closer to what we think should be correct, but there isn't a formal statement of what is correct, and consequently it's hard to meaningfully judge how much closer we are to that. Of course, there are lots of individual cases where we can say the behaviour is more intuitive, but lots of individual cases don't together make a formal specification.
Still, that's something for another time.
Looks great! One minor comment: if we drop deps on DistributionsAD, would we lose any rules for ForwardDiff (default Turing AD) in TuringLang/DistributionsAD.jl#280?
@yebai I doubt it. At the very least, since CI is passing, we can be sure that there's nothing in Turing's CI suite that depended on that. The CI suite does contain this list of distributions that we check against:

Turing.jl/test/stdlib/distributions.jl, lines 53 to 156 in 184d592

I think this list is probably outdated and could be expanded. Bijectors now contains very thorough testing for differentiability of `with_logabsdet_jacobian` on a ton of distributions, and the remaining thing would be to test differentiability of `logpdf`. (That should arguably be in Distributions.jl, but well... we might have to stick it in DynamicPPL or something.) But I'm pretty sure that anything that used to work before should continue to work.
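For the logpdf-differentiability testing mentioned above, the core check is easy to sketch (my own example, not part of the existing CI list):

```julia
using Distributions, ForwardDiff

# Differentiate logpdf(Beta(2, 3), x) with respect to x at an interior point.
g = ForwardDiff.derivative(x -> logpdf(Beta(2, 3), x), 0.4)
isfinite(g)  # the kind of assertion such a test suite would make per distribution
```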
CI failure is due to a Julia GC segfault -- really?!

Edit: It is nondeterministic on CI and I couldn't reproduce locally, so will just ignore it going forwards.
This minor version does the following:

Things that NEED to be fixed before releasing:

- `TracedModel` -- get around this using a tactic based on `Vector`
- `ParamAccumulator` -- I benchmarked and it isn't a problem.
- `LKJCholesky(3, x)` where it generates 6 names but 9 values -- the Turing code for this has already been written, and generally works, but requires upstream changes
- `Bijectors.optic_vec` to be merged & released (Vector bijectors, Bijectors.jl#429)