Replies: 3 comments 1 reply
Here's a general function I like. As you see from my rule above, I would like the total contribution to the prior to be:

```julia
uniform_categorical(n) = Categorical(fill(1 / n, n))

@rule Categorical(:p, Marginalisation) (m_out::Categorical, meta::EntropyWeighted,) = begin
    vec_out = probvec(m_out)
    dim_out = length(vec_out)
    w = 1 - entropy(m_out) / entropy(uniform_categorical(dim_out))
    return Dirichlet(1 .+ w .* vec_out)
end
```

These are not the messages prescribed by the Bethe Free Energy minimization framework, but I wonder whether they can work as an approximation, in the same way a mean-field constraint around this node would. I would be interested in understanding whether adding this sort of rule can introduce problems I didn't anticipate, and whether it could be justified theoretically.
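A quick way to see what the weight `w` in this rule does, sketched in plain Julia without RxInfer (`entropy_cat` and `weight` are hypothetical helpers standing in for `Distributions.entropy` and the expression inside the rule):

```julia
# Plain-Julia sketch of the entropy weight used in the rule above.
# entropy_cat is the Shannon entropy of a categorical probability vector.
entropy_cat(p) = -sum(x -> x > 0 ? x * log(x) : 0.0, p)

# w = 1 - H(m) / H(uniform): 0 for a flat message, 1 for a deterministic one.
weight(p) = 1 - entropy_cat(p) / log(length(p))

weight([1/3, 1/3, 1/3])  # ≈ 0.0: a uniform message leaves the Dirichlet prior untouched
weight([1.0, 0.0, 0.0])  # ≈ 1.0: a deterministic message contributes a full pseudo-count
```

So the rule interpolates between "ignore the message" and "do the usual conjugate count update", driven by how informative the incoming categorical message is.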
I'm posting this as part of a conversation with @meditans, as we discussed this at RxInfer meetings: """ To which @meditans replied, and requested I post the discussion here to avoid context loss:
Thank you both @skoghoern and @apashea. Here's the model @skoghoern is suggesting:

```julia
@model function model_00(y)
    p ~ Dirichlet([1.0, 1.0])
    local content, obs_open_lid, lid_status
    for i in eachindex(y)
        content[i] ~ Categorical(p)
        obs_open_lid[i] ~ DiscreteTransition(content[i], [1.0 0; 0 1; 0 0])
        obs_open_lid[i] ~ Categorical([0.5, 0.5, 0])
        lid_status[i] ~ Categorical([0.5, 0.5])
        y[i] ~ Mixture(switch = lid_status[i],
                       inputs = [obs_open_lid[i], [0.0, 0.0, 1.0]])
    end
end

yep  = [1.0, 0, 0]
nope = [0.0, 1, 0]
lid  = [0.0, 0, 1]

result = infer(
    model  = model_00(),
    data   = (y = [lid, lid, lid],),
    addons = (AddonLogScale(),)
)
```

and this is the one I was gesturing at in our discussion, @apashea:

```julia
@model function model_01(y)
    p ~ Dirichlet([1.0, 1.0])
    local state_informative, state_uninformative, lid_status
    for i in eachindex(y)
        state_informative[i] ~ Categorical(p)
        state_uninformative[i] ~ Categorical([0.5, 0.5])
        lid_status[i] ~ Categorical([0.5, 0.5])
        y[i] ~ Mixture(switch = lid_status[i],
                       inputs = [state_informative[i], state_uninformative[i]])
    end
end

result = infer(
    model  = model_01(),
    data   = (y = [lid, lid, lid],),
    addons = (AddonLogScale(),)
)
```

In both models, which probably express the same underlying idea, I don't know how to solve the errors that arise from the missing
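For intuition, the generative story both models share can be sketched as a forward sampler in plain Julia (no RxInfer; `sample_cat` and `sample_box` are hypothetical helpers, and the integer encoding 1 = yep, 2 = nope, 3 = lid mirrors the one-hot vectors above):

```julia
# Plain-Julia forward sampler for the shared generative story (not RxInfer code).
function sample_cat(p)
    r, acc = rand(), 0.0
    for (i, x) in pairs(p)
        acc += x
        r < acc && return i
    end
    return length(p)  # guard against floating-point round-off
end

function sample_box(p)
    lid_open = sample_cat([0.5, 0.5]) == 1   # lid_status: open vs closed
    lid_open || return 3                     # closed lid: we observe only the lid
    return sample_cat(p)                     # open lid: we observe the content (1 or 2)
end

p  = [0.7, 0.3]                    # true (unknown) content distribution
ys = [sample_box(p) for _ in 1:10] # a run of observations, each in 1:3
```

A closed-lid draw produces the same observation regardless of `p`, which is exactly why such observations should not sharpen the posterior on `p`.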
Hi all,
I'm running into a curious modeling problem in which a posterior becomes more and more certain in the presence of uninformative observations, and I wanted to open a discussion to hear how you would solve it.
As in all my latest examples, I have in front of me a series of boxes. Each box can have one of two items inside, and can have a lid or not. I observe either the content inside or the lid (3 possible observations). I start by assuming that the boxes have a shared prior of Dirichlet([1, 1]) (I know that the boxes are all the same in distribution). Then I observe 100 closed lids, and the posterior on the boxes becomes Dirichlet([51, 51])! I have become more certain that the boxes are uniformly distributed even when I couldn't see anything!
The code:
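To make the arithmetic of the pathology explicit (a hand sketch, not RxInfer output): under the mean-field factorization each closed-lid observation still sends an approximately uniform message to `p`, and the conjugate Dirichlet update adds those probabilities as pseudo-counts:

```julia
# Sketch of the pathology: 100 uninformative messages still add pseudo-counts.
prior       = [1.0, 1.0]   # Dirichlet([1, 1])
uniform_msg = [0.5, 0.5]   # message from one closed-lid observation
posterior   = prior .+ 100 .* uniform_msg   # == [51.0, 51.0]
```

Each observation contributes half a pseudo-count to each category, so the concentration grows linearly in the number of observations even though none of them discriminate between the two contents.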
I understand now that this is a side effect of me using the q(p, a) = q(p)q(a) constraint. I determined that, in this particular case, since the messages that can arrive are one of:
I am justified in writing the rule:
This, plus dropping the now-useless mean-field assumption, updates the prior sensibly in my case. But what is a sensible general choice for this rule, one that accepts all values of the categorical distribution and not only the three values I have here?
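One candidate for such a general rule is the entropy-weighted update discussed in the replies; here is my reading of it as a plain function, not an actual `@rule` (a sketch, with `H` and `weighted_update` as hypothetical helpers):

```julia
# Entropy-weighted Dirichlet update for an arbitrary categorical message m.
H(p) = -sum(x -> x > 0 ? x * log(x) : 0.0, p)

# Scale the pseudo-count contribution by how informative m is:
# w = 0 for a uniform message, w = 1 for a deterministic one.
function weighted_update(alpha, m)
    w = 1 - H(m) / log(length(m))
    return alpha .+ w .* m
end

weighted_update([1.0, 1.0], [0.5, 0.5])  # ≈ [1.0, 1.0]: closed lids change nothing
weighted_update([1.0, 1.0], [1.0, 0.0])  # == [2.0, 1.0]: a certain observation counts fully
```

This accepts any probability vector, interpolating smoothly between the two extremes, so partially informative messages add a fraction of a pseudo-count.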