Add logprob support for leaky-ReLU switch transforms #7995
Conversation
The more general requirement, which we could try to see if we can cover, is a switch where the two branches don't overlap, so when you invert you know for sure which branch it came from.

That makes sense. Correct me if I'm wrong, but the key property we need is that the two branches map to non-overlapping regions in `value`? If so, we can extend the current pattern matcher to cover those cases.

@ricardoV94 if my understanding is right, would you prefer that I extend this PR with that generalization, or should I open a follow-up PR to keep the changes scoped?
Yeah exactly! But you don't need to implement the logic to invert the branches, or the jacobian. If both branches of the switch are measurable, it means PyMC figured out the logp and you can just evaluate it at the final value (that will take care of those details). The question posed by the switch is which of them you need. Logp for this switch might look something like (pseudocode):

```python
def conditional_switch_logp(value, true_branch_rv, false_branch_rv):
    value_implies_true_branch = f(value)
    # Note the logp is evaluated at the value; the switch just gates which one is selected
    logp = switch(value_implies_true_branch, logp(true_branch_rv, value), logp(false_branch_rv, value))
    return logp
```

I think (need to confirm) that the way things are set up, the rewrite that marks the switch as being measurable only has to worry about whether we meet the constraints you mentioned. Current code can already figure out the `logp(Normal, value)`, or `logp(Normal * a, value)`, for you. The strategy may look something like this sequence of checks. Examples: ... But also: ... (`y` is what becomes `value` in the logp function.) Restricting to the original monotonically increasing leaky-ReLU case is fine, but I would like to structure the code so it's ready to extend to more cases in the future. If, once you figure it out, you want to extend it, that's awesome and welcome, but not a blocker. How does that sound?
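To make the gating idea concrete, here is a hedged sketch (not from the thread) that builds the conditional logp by hand with the public `pm.logp`, relying on PyMC already knowing `logp(Normal, value)` and `logp(a * Normal, value)`:

```python
import pytensor.tensor as pt
import pymc as pm

a = pt.scalar("a")
value = pt.scalar("value")
x = pm.Normal.dist(mu=0, sigma=1)

# PyMC can already infer both branch logps; the switch logp only
# gates between them based on which branch the value implies.
logp_true = pm.logp(x, value)       # true branch: y = x, so value > 0
logp_false = pm.logp(a * x, value)  # false branch: y = a * x, so value <= 0
logp_y = pt.switch(pt.ge(value, 0), logp_true, logp_false)

print(logp_y.eval({a: 0.5, value: -2.0}))  # logp of the leaky-ReLU transform at -2
```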
@ricardoV94 thanks, that makes sense. I refactored the implementation to follow that approach.
This is currently scoped to the leaky-ReLU pattern, but the structure is such that extending to other non-overlapping switch patterns should be straightforward (separate predicate + constraints logic). If approved, I'll go ahead and try to implement the general "non-overlapping images" framework.
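For context, a minimal usage sketch of what this PR enables (assuming the rewrite applies as described; the negative-branch logp picks up the change-of-variables Jacobian `-log(a)`):

```python
import numpy as np
import pytensor.tensor as pt
import pymc as pm

a = 0.5  # strictly positive slope on the negative branch
x = pm.Normal.dist(mu=0, sigma=1)
y = pt.switch(x > 0, x, a * x)  # leaky-ReLU transform of x

# Positive values come from the identity branch...
np.testing.assert_allclose(pm.logp(y, 1.2).eval(), pm.logp(x, 1.2).eval())
# ...negative values from the scaled branch, inverted with its Jacobian.
np.testing.assert_allclose(
    pm.logp(y, -2.0).eval(),
    pm.logp(x, -2.0 / a).eval() - np.log(a),
)
```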
@ricardoV94 should I make any more changes in this PR?
ricardoV94
left a comment
Sorry for the delay.
This is looking great, but I do have some comments / questions.
Let me know if my comments are off.
How about this, @ricardoV94?
ricardoV94
left a comment
This is pretty great. I only have minor requests/questions left.
pymc/logprob/switch.py (Outdated)

```python
measurable_switch_non_overlapping = MeasurableSwitchNonOverlapping(scalar_switch)
```

```python
def _is_x_threshold_condition(cond: TensorVariable, x: TensorVariable) -> bool:
```

Better name for this helper to emphasize zero? Perhaps `_is_zero_x_threshold`.
```python
@node_rewriter([tensor_switch])
def find_measurable_switch_non_overlapping(fgraph, node):
    """Detect `switch(x > 0, x, a * x)` and replace it by a measurable op."""
```

Can you open an issue for follow-up extensions? From the top of my mind (see the sketch after this list):

- We should try and support the equivalent switch written the other way around, `switch(x <= 0, a * x, x)`. The user shouldn't have to guess the specific format that we support.
- Allow scaling factors on both branches, still with the constraint that they must have the same sign (or, if we want to remain more restrictive for now, that they are both non-negative). Just because it isn't too much harder than what we have now.
- The more general cases discussed above with monotonic functions of `x` that don't overlap on the two branches.
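For the first item, an illustrative pair of spellings (hypothetical, just to pin down the equivalence the rewrite would need to recognize):

```python
import pytensor.tensor as pt
import pymc as pm

x = pm.Normal.dist(mu=0, sigma=1)
a = 0.5

y1 = pt.switch(x > 0, x, a * x)   # spelling supported by this PR
y2 = pt.switch(x <= 0, a * x, x)  # same transform, condition and branches swapped
```

Both graphs describe the same random variable, so both should end up with the same inferred logp.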
Sure, that makes sense.
> We should try and support the equivalent switch written the other way around, `switch(x <= 0, a * x, x)`. The user shouldn't have to guess the specific format that we support.

I have raised #8049 for this, please check it and let me know if I should raise the rest of the issues as well, or if you'd prefer a different structure/scope.
I was thinking to raise issues for the following extensions (a sketch of the second one follows this list):

- support equivalent `switch` spellings (e.g. `switch(x <= 0, a*x, x)` / swapped comparisons) so users don't need to match a single canonical form.
- support scaling on both branches (`switch(x > 0, a_pos*x, a_neg*x)`) with constraints that guarantee non-overlapping images (start with both scales strictly positive, or same-sign + non-zero).
- track a broader "non-overlapping switch transform" framework for monotone branch functions of `x` where the observed value implies the branch, while still gating between existing branch logps.
- support non-zero thresholds (`x ? k`) once the above is stable, since determining the `value -> branch` predicate becomes more subtle.
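A hedged sketch of the second extension (both branches scaled), following the `conditional_switch_logp` pseudocode above; the helper name and structure are hypothetical:

```python
import pytensor.tensor as pt
import pymc as pm

def both_branches_scaled_logp(value, x, a_pos, a_neg):
    # For a_pos, a_neg > 0 the images stay disjoint:
    # true branch -> value > 0, false branch -> value <= 0,
    # so the value alone determines which branch logp to use.
    logp_true = pm.logp(a_pos * x, value)
    logp_false = pm.logp(a_neg * x, value)
    return pt.switch(pt.ge(value, 0), logp_true, logp_false)

x = pm.Normal.dist(mu=0, sigma=1)
print(both_branches_scaled_logp(-1.0, x, 2.0, 0.5).eval())
```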
Yeah that's a solid plan. Feel free to describe all that in the issue or open separate ones
pymc/logprob/switch.py (Outdated)

```python
# For `a > 0`, `switch(x > 0, x, a * x)` maps to disjoint regions in `value`:
# true branch -> value > 0, false branch -> value <= 0.
value_implies_true_branch = pt.gt(value, 0)
```

Mathematically it should be the same with a continuous function, but the logp may be more stable? If we want to be pedantic we could check if the original cond had the exact zero in the true or neg branch.

Suggested change:

```diff
-value_implies_true_branch = pt.gt(value, 0)
+value_implies_true_branch = pt.ge(value, 0)
```
tests/logprob/test_transforms.py (Outdated)

```python
pm.logp(y, v_pos, warn_rvs=False).eval(),
pm.logp(x, v_pos, warn_rvs=False).eval(),
```

Suggested change:

```diff
-pm.logp(y, v_pos, warn_rvs=False).eval(),
-pm.logp(x, v_pos, warn_rvs=False).eval(),
+pm.logp(y, v_pos).eval(),
+pm.logp(x, v_pos).eval(),
```
tests/logprob/test_transforms.py (Outdated)

```python
v_neg = -2.0
np.testing.assert_allclose(
    pm.logp(y, v_neg, warn_rvs=False).eval(),
```

If you're testing with two values, define the logp variable once (or compile a function with the logp as output once), and reuse it. That will avoid duplicated logp inference calls.
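A sketch of that suggestion (illustrative; assumes the leaky-ReLU `x`/`y` setup from the test and that the rewrite is in place):

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
import pymc as pm

x = pm.Normal.dist(mu=0, sigma=1)
y = pt.switch(x > 0, x, 0.5 * x)

# Infer and compile the logp graph once, then evaluate it at many values.
value = pt.scalar("value")
logp_fn = pytensor.function([value], pm.logp(y, value))

np.testing.assert_allclose(logp_fn(1.2), pm.logp(x, 1.2).eval())
np.testing.assert_allclose(logp_fn(-2.0), pm.logp(x, -2.0 / 0.5).eval() - np.log(0.5))
```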
tests/logprob/test_transforms.py (Outdated)

```python
assert np.isfinite(pm.logp(y, 0.0, warn_rvs=False).eval())


def test_leaky_relu_switch_logp_vectorized():
```

Use vectorized in the first test, and remove this. It's not conceptually that different.
tests/logprob/test_transforms.py (Outdated)

```python
np.testing.assert_allclose(pm.logp(y, v, warn_rvs=False).eval(), expected)


def test_leaky_relu_switch_logp_symbolic_slope_checks_positive():
```

Use symbolic `a` in the first test and test the error there. Still a pretty straightforward test.
tests/logprob/test_transforms.py (Outdated)

```python
)


def test_leaky_relu_switch_logp_scalar():
```

Tests should be moved to a `test_switch.py` file.

Also add a separate test that shows the failure if `x` is broadcast by `cond` or `a`, or if it's discrete.

> Tests should be moved to a `test_switch.py` file.

Oh right, that's my bad.

> Also add a separate test that shows the failure if `x` is broadcast by `cond` or `a`, or if it's discrete.

Sure (see the sketch below).
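An illustrative sketch of the broadcast failure case (test name hypothetical; the expected error message matches the one used later in this thread):

```python
import numpy as np
import pytest
import pytensor.tensor as pt
import pymc as pm

def test_switch_rejects_broadcast_x():
    # cond broadcasts the scalar x to shape (2, 3), so the value no longer
    # identifies a unique draw of x and the rewrite must not apply.
    x = pm.Normal.dist(mu=0, sigma=1)
    cond = pt.gt(x, pt.zeros((2, 3)))
    y = pt.switch(cond, x, 0.5 * x)
    with pytest.raises(NotImplementedError, match="Logprob method not implemented for Switch"):
        pm.logp(y, np.zeros((2, 3)))
```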
@ricardoV94 now?
ricardoV94
left a comment
Couple more nits, just about tests now.
I don't understand why you keep passing `warn_rvs=False`; there shouldn't be any (stray RVs to warn about), and the default is more strict.
```python
if a is None:
    raise NotImplementedError("Could not extract non-overlapping scale")

a_is_positive = pt.all(pt.gt(a, 0))
```

Does it need to be positive, or is non-negative enough?
It needs to be positive: if `a == 0`, the negative branch becomes `0 * x = 0`, so a whole half-line of `x` collapses to a single value, which is not invertible. I'll add a comment about this in the file.
Oh yeah, that makes sense.
OTOH, if it goes on the `0 * x` branch logp, that will already return nan (or inf if the value is zero):

```python
pm.logp(pm.Normal.dist() * pt.scalar("a"), 1).eval({"a": 0})  # np.nan
```

Probably better to leave as is though.
tests/logprob/test_switch.py (Outdated)

```python
def test_switch_non_overlapping_logp_matches_change_of_variables():
    scale = pt.scalar("scale")
    x = pm.Normal.dist(mu=0, sigma=1, size=(3,))
    y = cast(TensorVariable, pt.switch(x > 0, x, scale * x))
```

No need for casting in the tests, we don't run mypy here and it clutters the code.
Oh, sure.
tests/logprob/test_switch.py (Outdated)

```python
# No warning-based shortcuts: also match under default warn_rvs (scalar case)
x_s = pm.Normal.dist(mu=0, sigma=1)
y_s = cast(TensorVariable, pt.switch(x_s > 0, x_s, scale * x_s))

v_pos = 1.2
np.testing.assert_allclose(logp(y_s, v_pos).eval({scale: 0.5}), logp(x_s, v_pos).eval())

v_neg = -2.0
np.testing.assert_allclose(
    logp(y_s, v_neg).eval({scale: 0.5}),
    logp(x_s, v_neg / 0.5).eval() - np.log(0.5),
)

# boundary point (measure-zero for continuous RVs): should still produce a finite logp
assert np.isfinite(logp(y_s, 0.0, warn_rvs=False).eval({scale: 0.5}))
```

Why are these needed? Can't you test everything with the first functions?
Yeah, they aren't needed.
tests/logprob/test_switch.py (Outdated)

```python
y = cast(TensorVariable, pt.switch(cond, x, scale * x))

with pytest.raises(NotImplementedError, match="Logprob method not implemented for Switch"):
    logp(y, np.zeros((2, 3)), warn_rvs=False).eval({scale: 0.5})
```

No point in the eval, as it should already fail in the logp call.
tests/logprob/test_switch.py (Outdated)

```python
y = cast(TensorVariable, pt.switch(x > 0, x, scale * x))

with pytest.raises(NotImplementedError, match="Logprob method not implemented for Switch"):
    logp(y, np.zeros((3,)), warn_rvs=False).eval({scale: np.array([0.5, 0.5, 0.5])})
```

Same. Also I would merge with the previous test, conceptually similar.
```python
x = pm.Normal.dist(mu=0, sigma=1)
y = cast(TensorVariable, pt.switch(x > 0, x, scale * x))

with pytest.raises(ParameterValueError, match="switch non-overlapping scale > 0"):
```

You can use the function from the first test to check the negative-scale raising; no need for a separate test (see the sketch below).
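One hedged way to fold the negative-scale check into the first test's helper (structure hypothetical; the `ParameterValueError` import path is assumed to be `pymc.logprob.utils`):

```python
import numpy as np
import pytest
import pytensor.tensor as pt
import pymc as pm
from pymc.logprob.basic import logp
from pymc.logprob.utils import ParameterValueError

def eval_leaky_relu_logp(scale_value):
    # Same graph as the first test; only the runtime scale changes.
    scale = pt.scalar("scale")
    x = pm.Normal.dist(mu=0, sigma=1, size=(3,))
    y = pt.switch(x > 0, x, scale * x)
    return logp(y, np.full((3,), -2.0)).eval({scale: scale_value})

eval_leaky_relu_logp(0.5)  # fine: positive scale
with pytest.raises(ParameterValueError, match="switch non-overlapping scale > 0"):
    eval_leaky_relu_logp(-0.5)  # the a > 0 runtime check fires
```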
ricardoV94
left a comment
Awesome
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #7995      +/-   ##
==========================================
+ Coverage   90.22%   91.42%   +1.20%
==========================================
  Files         116      117       +1
  Lines       18972    19154     +182
==========================================
+ Hits        17117    17512     +395
+ Misses       1855     1642     -213
```

Description

Added log-probability support for leaky-ReLU graphs constructed as `switch(x > 0, x, a * x)`, where `x` is a single continuous measurable variable.

Notes:

- `a` must be non-measurable and strictly positive.
- `y == 0` follows the `y <= 0` branch (measure-zero set).

Related Issue

Checklist

Type of change