Replace `Metadata.flags` with `Metadata.trans` #1060

mhauru · 2025-09-29T17:03:41Z

Now that the "del" flag is gone (#1058), the only flag that is ever used is "trans". Hence there is no need for Metadata.flags to be a Dict{String, BitVector}; a single BitVector, Metadata.trans, is enough. EDIT: Renamed to Metadata.is_transformed.

You may wonder, given that Metadata is presumably on its way out, why bother? Two reasons:

1. I tried running the benchmark suite locally with VectorVarInfo, and there were some horrendous performance regressions there compared to using Metadata. Hence, we might not be about to switch over to VarNamedVector imminently.
2. The above experience made me wonder why there was such a performance difference, and whether the Metadata.flags field might actually be a significant cost compared to a BitVector.
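To make the difference between the two layouts concrete, here is a minimal sketch. `FlagsMetadata` and `TransMetadata` are hypothetical, stripped-down stand-ins written for illustration, not DynamicPPL's actual `Metadata` type; the only point is that the old layout needs a string-keyed dictionary lookup before it can index the BitVector, while the new one indexes it directly.

```julia
# Hypothetical, simplified stand-ins for the two layouts (not the real Metadata).
struct FlagsMetadata
    flags::Dict{String,BitVector}
end

struct TransMetadata
    is_transformed::BitVector
end

# Old-style lookup: hash the flag name first, then index the BitVector.
is_transformed(md::FlagsMetadata, i::Int) = md.flags["trans"][i]

# New-style lookup: index the BitVector directly.
is_transformed(md::TransMetadata, i::Int) = md.is_transformed[i]

n = 1_000
old = FlagsMetadata(Dict("trans" => falses(n)))
new = TransMetadata(falses(n))

# Mark the third variable as transformed in both layouts.
old.flags["trans"][3] = true
new.is_transformed[3] = true

# Both layouts answer the same question; only the access path differs.
count_old = count(i -> is_transformed(old, i), 1:n)
count_new = count(i -> is_transformed(new, i), 1:n)
```

Whether the dictionary lookup is the whole explanation for the timings is not something this sketch can show; it only illustrates the extra indirection being removed.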

My local benchmarking suggests that indeed, this makes a difference:

Before

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │           16.0 │             1.7 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          790.6 │            46.1 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          382.0 │            84.3 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1431.7 │            36.0 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │        10511.1 │            21.6 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │         1495.9 │            42.4 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │         1637.4 │             3.4 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         8635.9 │             3.2 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │         1266.1 │             8.5 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        90116.3 │             3.2 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │        10364.2 │             9.7 │
│               Dynamic │    10 │    mooncake │             typed │   true │          235.0 │             5.7 │
│              Submodel │     1 │    mooncake │             typed │   true │           24.0 │             4.2 │
│                   LDA │    12 │ reversediff │             typed │   true │         1391.7 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

After

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │           10.8 │             2.5 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          695.1 │            53.0 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          319.1 │           104.9 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1114.3 │            45.0 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │        10323.5 │            22.3 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │         1190.0 │            52.4 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │         1263.0 │             3.8 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         5606.7 │             4.4 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │         1236.0 │             8.7 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        63260.7 │             4.2 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │        11029.4 │             9.4 │
│               Dynamic │    10 │    mooncake │             typed │   true │          216.4 │             6.4 │
│              Submodel │     1 │    mooncake │             typed │   true │           19.0 │             4.6 │
│                   LDA │    12 │ reversediff │             typed │   true │         1341.4 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

Curious to see whether GHA benchmarks come out looking similar.

github-actions · 2025-09-29T17:05:24Z

Benchmark Report for Commit `a011dd6`

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            7.4 │             1.6 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          598.5 │            49.3 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          423.2 │            57.6 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1063.4 │            32.1 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6740.1 │            29.0 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │          914.1 │            46.1 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          875.0 │             5.8 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         4455.6 │             5.6 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │         1020.9 │             9.3 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        51734.7 │             4.9 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         8677.5 │            10.3 │
│               Dynamic │    10 │    mooncake │             typed │   true │          132.2 │            10.9 │
│              Submodel │     1 │    mooncake │             typed │   true │           10.4 │             5.6 │
│                   LDA │    12 │ reversediff │             typed │   true │          992.7 │             2.1 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

codecov · 2025-09-29T17:21:01Z

Codecov Report

❌ Patch coverage is 93.02326% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.51%. Comparing base (08212a2) to head (4f85f2b).

Files with missing lines	Patch %	Lines
src/simple_varinfo.jl	80.00%	3 Missing ⚠️
src/varinfo.jl	95.23%	2 Missing ⚠️
ext/DynamicPPLEnzymeCoreExt.jl	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           breaking    #1060      +/-   ##
============================================
+ Coverage     82.39%   82.51%   +0.11%     
============================================
  Files            42       42              
  Lines          3818     3786      -32     
============================================
- Hits           3146     3124      -22     
+ Misses          672      662      -10


github-actions · 2025-09-30T08:56:45Z

DynamicPPL.jl documentation for PR #1060 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1060/

mhauru · 2025-09-30T08:58:52Z

CI benchmarks. Target branch:

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            8.5 │             1.6 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          635.2 │            43.6 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          411.8 │            52.7 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │         1163.6 │            29.7 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6444.2 │            28.6 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │         1022.9 │            40.9 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          980.1 │             4.5 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         5750.3 │             4.3 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │          964.6 │             9.1 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        64679.1 │             3.9 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         8179.8 │            10.3 │
│               Dynamic │    10 │    mooncake │             typed │   true │          129.7 │            11.3 │
│              Submodel │     1 │    mooncake │             typed │   true │           12.2 │             5.1 │
│                   LDA │    12 │ reversediff │             typed │   true │         1006.2 │             2.0 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

This branch:

┌───────────────────────┬───────┬─────────────┬───────────────────┬────────┬────────────────┬─────────────────┐
│                 Model │   Dim │  AD Backend │           VarInfo │ Linked │ t(eval)/t(ref) │ t(grad)/t(eval) │
├───────────────────────┼───────┼─────────────┼───────────────────┼────────┼────────────────┼─────────────────┤
│ Simple assume observe │     1 │ forwarddiff │             typed │  false │            7.4 │             1.7 │
│           Smorgasbord │   201 │ forwarddiff │             typed │  false │          597.3 │            49.0 │
│           Smorgasbord │   201 │ forwarddiff │ simple_namedtuple │   true │          422.1 │            57.4 │
│           Smorgasbord │   201 │ forwarddiff │           untyped │   true │          969.2 │            35.2 │
│           Smorgasbord │   201 │ forwarddiff │       simple_dict │   true │         6575.6 │            31.0 │
│           Smorgasbord │   201 │ reversediff │             typed │   true │          883.4 │            47.6 │
│           Smorgasbord │   201 │    mooncake │             typed │   true │          854.6 │             5.1 │
│    Loop univariate 1k │  1000 │    mooncake │             typed │   true │         4305.0 │             5.6 │
│       Multivariate 1k │  1000 │    mooncake │             typed │   true │          991.4 │             9.5 │
│   Loop univariate 10k │ 10000 │    mooncake │             typed │   true │        50138.4 │             5.1 │
│      Multivariate 10k │ 10000 │    mooncake │             typed │   true │         9003.3 │            10.1 │
│               Dynamic │    10 │    mooncake │             typed │   true │          128.2 │            11.4 │
│              Submodel │     1 │    mooncake │             typed │   true │            9.9 │             5.9 │
│                   LDA │    12 │ reversediff │             typed │   true │          989.8 │             2.1 │
└───────────────────────┴───────┴─────────────┴───────────────────┴────────┴────────────────┴─────────────────┘

Roughly in line with what I saw locally. Seems worth it to me, especially if you look at the Loop univariate 1k and 10k models.
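For reference, the relative change for the two Loop univariate rows can be computed directly from the CI tables above. This is only arithmetic on the reported `t(eval)/t(ref)` values (single CI runs, so the figures are noisy); the variable names are made up for this sketch.

```julia
# t(eval)/t(ref) values copied from the two CI tables above (typed VarInfo, mooncake).
target = Dict("Loop univariate 1k" => 5750.3, "Loop univariate 10k" => 64679.1)
branch = Dict("Loop univariate 1k" => 4305.0, "Loop univariate 10k" => 50138.4)

# Fractional reduction in relative evaluation time compared to the target branch.
reduction = Dict(name => 1 - branch[name] / target[name] for name in keys(target))

for (name, r) in reduction
    println(rpad(name, 22), round(100 * r; digits=1), "% lower t(eval)/t(ref)")
end
```

Both rows come out at a reduction of somewhere between 20% and 26%.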

yebai · 2025-09-30T11:50:53Z

I suggest we take this chance to rename Metadata.trans to a more readable term, e.g., Metadata.is_unconstrained / Metadata.is_transformed.

mhauru · 2025-09-30T14:17:01Z

Good idea, done.

penelopeysm · 2025-09-30T16:27:12Z

src/varnamedvector.jl

+# TODO(mhauru) Eventually I would like to rename the is_transformed function to
+# is_unconstrained, but that's significantly breaking.
 """
-    istrans(vnv::VarNamedVector, vn::VarName)
+    is_transformed(vnv::VarNamedVector, vn::VarName)


Are you still thinking of this? I personally prefer islinked over istransformed, but isunconstrained / isconstrained I don't like, because it doesn't accurately capture the full story.

For example, unlinked variables can still be unconstrained. So is_unconstrained doesn't mean it's unconstrained, it means it's 'guaranteed' to be unconstrained. Also, I suppose linking need not necessarily unconstrain it, it depends on the link function.

But I realise this comment might be a bit out of date

For this PR, I wonder if it is worth standardising. We have islinked(::VarInfo) but istransformed(::VarInfo, ::VarName). Should we change one to the other?

Good point. I kinda punted on the is_unconstrained thing in VarNamedVector because it's invisible to users, but islinked is a good point. Now would be as good a time as any to standardise.

With VarNamedVector, I went with is_unconstrained exactly because having a non-trivial transformation does not guarantee that the variable doesn't remain constrained, and because the flag exists to guarantee unconstrainedness (of user interest) not that some transformation has been applied (not of user interest). The docstring for VarNamedVector says this:

vector of booleans indicating whether a variable has been explicitly transformed to unconstrained Euclidean space, i.e. whether its domain is all of `ℝ^ⁿ`. If `is_unconstrained[varname_to_index[vn]]` is true, it guarantees that the variable `vn` is not constrained. However, the converse does not hold: if `is_unconstrained` is false, the variable `vn` may still happen to be unconstrained, e.g. if its original distribution is itself unconstrained (like a normal distribution).

I was quite pleased with that when I was writing that part of VarNamedVector, but then when I tried to use the same terminology in VarInfo yesterday I wasn't happy with it anymore. Unfortunately I can't now recall why I was unhappy with it... It seems fine to me when I think about it now.

islinked (or is_linked) feels a lot like is_transformed: It says that some link transformation has been applied, not that it's achieved the goal of making this variable unconstrained. Although maybe I misunderstand how people use the term "link" here.

At least right now, I think is_unconstrained is the best description of the flag, but especially if you dislike it, I would go with is_linked, just to match the link and unlink function names, which I wouldn't want to change (and calling them unconstrain and ununconstrain or constrain doesn't work).

I have a slight preference for is_transformed, which is more readable for people unfamiliar with generalised linear models. We could change link/unlink to transform/untransform.

See, e.g., https://www.tamaspapp.eu/TransformVariables.jl/stable/, which also adopts transform for its API.

because the flag exists to guarantee unconstrainedness (of user interest) not that some transformation has been applied (not of user interest)

If this were the case, then is_unconstrained would be correct, but my understanding is that the flag does not reflect that, it just reflects whether a transformation has been applied. The settrans!! function gets called when a transformation is applied, and that just sets the flag to true:

DynamicPPL.jl/src/varnamedvector.jl

Lines 335 to 343 in 3ff4149

"""
    settrans!(vnv::VarNamedVector, val::Bool, vn::VarName)

Set the value for whether `vn` is guaranteed to have been transformed so that all of
Euclidean space is its domain.
"""
function settrans!(vnv::VarNamedVector, val::Bool, vn::VarName)
    return vnv.is_unconstrained[vnv.varname_to_index[vn]] = val
end

Did I miss a check somewhere to make sure that the transformation is indeed to unconstrained space?

I see, that makes sense. I agree that from the user's perspective it's whether it's unconstrained that matters. With this in mind, I'm not very fussed with any name, as long as its accuracy or lack thereof is clearly documented. In particular, this part of the docstring:

If is_unconstrained[varname_to_index[vn]] is true, it guarantees that the variable vn is not constrained.

should probably be updated, because it is not currently true (even if we would like it to be).

I started writing a response to this, and as the response got longer I came to understand more and more how I don't really understand is_unconstrained/is_transformed and why we need it. (I'll just call it is_unconstrained here.)

Say we have a variable @varname(a), stored in a VarNamedVector as a real value x and a transformation function f, so that the user-facing value of @varname(a) is f(x).

One interpretation of is_unconstrained could just be "is the domain of f all of R^N?" If yes, then x is unconstrained.

But that's not really what we mean with is_unconstrained. What we mean is "is the domain of f all of R^N, and is the image of f equal to the domain of the prior distribution of @varname(a)". So really is_unconstrained is a statement about f, and the relationship between f and some particular model, which defines the prior for @varname(a). That's why link(vi) is not a thing, it has to be link(vi, model).
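A toy version of the setup described above may help. `StoredVar` is a hypothetical struct invented for this sketch (it is not how VarNamedVector stores things), holding a raw real value `x`, the function `f` that recovers the user-facing value, and the flag under discussion.

```julia
# Hypothetical toy container: raw stored value, the map back to the
# user-facing value, and the flag whose meaning is being debated.
struct StoredVar{F}
    x::Float64
    f::F
    is_transformed::Bool
end

# The user-facing value is always f(x).
user_value(v::StoredVar) = v.f(v.x)

# A variable with positive support (say, an Exponential prior), stored on the
# log scale: any real x is valid, and f = exp maps all of R onto (0, Inf).
linked = StoredVar(-0.5, exp, true)

# The same variable stored as-is: x itself must be positive.
unlinked = StoredVar(exp(-0.5), identity, false)
```

Here `exp` has all of the reals as its domain and its image matches the prior's support, which is the combined condition described above. Nothing in the container itself knows about the prior, so the flag only makes sense relative to a particular model.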

More confusingly, there's this:

@model function nasty_model()
    m ~ Exponential()
    a ~ truncated(Normal(); lower=m)
end

Whether is_unconstrained(vi, @varname(a)) should, morally speaking, be true or false, depends on the value of m.
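To see why the value of m matters, consider a link for a variable bounded below by m. The sketch assumes the common choice `y = log(a - m)`; `link_lower` and `invlink_lower` are names made up here, not functions from Bijectors.jl or DynamicPPL.

```julia
# Assumed link for a variable with support (m, Inf): shift by the bound, take the log.
link_lower(a, m) = log(a - m)

# Its inverse: map any real y back into (m, Inf).
invlink_lower(y, m) = m + exp(y)

# The same unconstrained value corresponds to different user-facing values of `a`,
# depending on what `m` currently is.
y = 0.0
a_when_m_is_1 = invlink_lower(y, 1.0)
a_when_m_is_3 = invlink_lower(y, 3.0)
```

So a stored value that was linked under one value of m is only correctly interpreted if the same m is used when inverting, which is the coupling between the stored state and the model described above.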

I don't like having such a tie between a VarInfo and a model when one doesn't explicitly reference the other. Nothing else in a VarNamedVector is specific to, or refers to, a particular model. This makes me question why we even need this flag. I'm not yet sure if we do. I've been going through different places where we read the is_unconstrained flag, and the following two I haven't fully understood yet:

When executing with InitContext, whether a variable is linked affects whether we apply a transformation, derived from the prior distribution. See here.

Gibbs wants to make sure that the link-status of a variable is respected. E.g. if one component sampler requires linking and the other doesn't, and both sample the same variable, then we need to link/invlink between executing those two component samplers. See here.

I'm not yet sure how the above two play out in the context of linking status depending on the model, and especially on values of other variables like in nasty_model. Once I hopefully, eventually, understand that, I should be able to understand whether we really need this flag, and if so, what its name should be.

I recognise that there is a larger discussion to be had about dynamic-support-models, but personally, I think getting correct behaviour for that case is a stretch goal.

My comments are solely focused on the difference between 'unconstrained' and 'transformed' for an ordinary, perfectly static, model. As far as I can see, right now, transformed does not imply unconstrained, and unconstrained does not imply transformed. The flag really keeps track of transformed. So we should either not call it unconstrained, or at least admit in the docstring that it is not necessarily true. The flexibility of DynamicPPL makes it very easy to get into a rabbit hole with all sorts of edge cases (we did the same with dynamic-sizes and particle MCMC), but I think we should get the semantics correct for the foundations first.

The reason why I like islinked is that we define (see Bijectors.jl) a "link" to be a 'special' kind of transformation: one which maps from the support of the prior distribution (which may itself be unconstrained) to unconstrained space. At least for static-support models, that immediately resolves any ambiguity, because if islinked = true, that immediately implies unconstrained. However, if islinked = false, it does not imply that it is constrained; unlike is_unconstrained = false which, on its face, suggests that the variable is constrained.

This does not handle the dynamic-support case, of course! But nothing so far does, and I would like to say that this is the best of the options that we actually have.

This further implies that we want to forbid transformations that do not transform to unconstrained space. In other words, StaticTransformation. I am not sure who uses StaticTransformation -- maybe worth a check? And if we don't forbid these, then the flag has to be istransformed, because that's exactly what it represents.

The flag really keeps track of transformed.

Welllll, not really. It keeps track of whether the transformation applied arises from a prior distribution associated with this variable. You can apply all sorts of transformations. For instance, any matrix-valued variable will have a flattening transformation applied to it, so that in my above example f is a call to reshape. Whether that warrants raising is_unconstrained/is_transformed to true depends on whether the prior distribution has as its domain all matrices of that shape, or some particular subset (e.g. symmetric ones).

I'm not arguing that we should call it is_unconstrained. I'm arguing that this is hairy, and I don't quite know what we should call it. There's a risk that thinking about dynamic models makes this overly complicated, but I think there's also a chance that it forces us to understand what this flag truly is about, by rapidly proving wrong hypotheses wrong.

I also still think there's a chance we don't even need this flag, which would be the Gordian knot solution.

EDIT: Wrote this without seeing your latest message above.

penelopeysm · 2025-09-30T16:29:34Z

src/simple_varinfo.jl

-islinked(vi::SimpleVarInfo) = istrans(vi)
+islinked(vi::SimpleVarInfo) = is_transformed(vi)


Like, this line is just the same function duplicated, so it feels to me like we could just pick one and roll with it!

Replace Medata.flags with Metadata.trans

756cc25

github-actions bot assigned mhauru Sep 29, 2025

Fix a bug

1091986

mhauru added 2 commits September 30, 2025 12:21

Fix a typo

ece8fb5

Fix two bugs

a011dd6

Rename trans to is_transformed

0f8c9b1

Merge remote-tracking branch 'origin/breaking' into mhauru/delete-flags

4f85f2b

mhauru requested a review from penelopeysm September 30, 2025 16:19

penelopeysm reviewed Sep 30, 2025


penelopeysm · Oct 7, 2025 (edited)

mhauru · Oct 7, 2025 (edited)
