@DominiqueMakowski
Contributor

Following up on the issues with an exp link function (TuringLang/Turing.jl#2310): they reinforced the idea that a softplus link could be a good alternative. However, I feel that implementing its generalized version (#83) would be key (it is useful when modelling small parameters), so here is my shot at it.

image

Member

@devmotion devmotion left a comment

I'm not sure how widely used this variant is (and whether there are other commonly used alternatives; the issue also mentions Liu and Furber 2016?). If it's added, we should make sure to test it and also add support for it in the ChainRules, InverseFunctions, and ChangesOfVariables extensions.

src/basicfuns.jl Outdated
This is also called the ["softplus"](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
transformation, being a smooth approximation to `max(0,x)`. Its inverse is [`logexpm1`](@ref).
The generalized `softplus` function (Wiemann et al., 2024) takes an additional optional parameter `a` that control
Member

I assume there exist earlier references for this function?

Contributor Author

@DominiqueMakowski DominiqueMakowski Sep 4, 2024

I went through Liu and Furber to double-check.

From my understanding (ML is not my field), they validate "noisy softplus" as an improvement over other activation functions for neurons in NNs.

image

However, it seems they named the `a` parameter sigma. Their plot looks similar, but the values differ (?)

image

> I'm not sure how widely used this variant is

I share your concern here; I'm also wary of adding niche features to such a base package and increasing the maintenance burden.
I can't say how commonly the generalized version is already used; its development seems fairly recent.
However, I can see its usefulness in quite a lot of cases: the default softplus only becomes close to the identity for x > 2, and from experience we often model parameters smaller than that (typical sigmas in neuroscience/psychology lie between 0 and 1), so using adjusted softplus links would make sense in these contexts. I suppose it's a tradeoff between the complexity of the feature and its (potential) usage.
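
To make that comparison concrete, here is a minimal sketch, assuming the generalized form is `log(1 + exp(a * x)) / a` with `a > 0`. The names `log1pexp_sketch` and `softplus_sketch` are hypothetical Base-only stand-ins for illustration, not the package's functions.

```julia
# Hypothetical helper names for illustration only (not the package's implementation).
# Numerically stable log(1 + exp(x)) using only Base functions.
log1pexp_sketch(x::Real) = x > 0 ? x + log1p(exp(-x)) : log1p(exp(x))

# Assumed generalized softplus: log(1 + exp(a * x)) / a, with a > 0.
# With a = 1 this reduces to the ordinary softplus.
softplus_sketch(x::Real, a::Real=1) = log1pexp_sketch(a * x) / a

# Compare the default (a = 1) with a sharper variant (a = 5) on small inputs.
for x in (0.25, 0.5, 1.0, 2.0)
    println("x = ", x,
            "  a=1: ", round(softplus_sketch(x), digits=4),
            "  a=5: ", round(softplus_sketch(x, 5), digits=4))
end
```

A larger `a` pulls the curve toward `max(0, x)`, so inputs in the 0 to 1 range are mapped closer to themselves than with the default.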

Member

@devmotion devmotion left a comment

I think the remaining items here are:

  • Include the new docstrings in the documentation
  • Add tests for softplus and invsoftplus
  • Add support for InverseFunctions for softplus and invsoftplus and test it
  • Add support for ChangesOfVariables for softplus and invsoftplus and test it

I think ChainRules support should not be needed since log1pexp and log1mexp are already supported, and we can expect AD to differentiate through the remaining parts of the functions.

@DominiqueMakowski
Contributor Author

> Add support for InverseFunctions / ChangesOfVariables

Can you clarify?

@DominiqueMakowski
Contributor Author

Kind bump

@devmotion
Member

Sorry, I missed your previous comment.

> Add support for InverseFunctions / ChangesOfVariables

Since this PR adds new functions, we should also add definitions of InverseFunctions.inverse to https://github.com/JuliaStats/LogExpFunctions.jl/blob/289114f535827c612ce10c01b8dec9d3a55e4d15/ext/LogExpFunctionsInverseFunctionsExt.jl and definitions of ChangesOfVariables.with_logabsdet_jacobian to https://github.com/JuliaStats/LogExpFunctions.jl/blob/289114f535827c612ce10c01b8dec9d3a55e4d15/ext/LogExpFunctionsChangesOfVariablesExt.jl. Additionally, we could add definitions of ChainRulesCore.frule and ChainRulesCore.rrule - but in principle AD should "just work" since all other involved functions are known to ChainRules.
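
As a rough illustration of what those definitions need to compute, here is a Base-only sketch. It assumes the generalized form `log(1 + exp(a * x)) / a`, and all function names are hypothetical. It returns the same `(value, log|derivative|)` pair that `with_logabsdet_jacobian` is expected to return, but it does not show how to register the methods in the extension files.

```julia
# Hypothetical Base-only helpers; names are illustrative, not the package's API.
log1pexp_sketch(x::Real) = x > 0 ? x + log1p(exp(-x)) : log1p(exp(x))

# Assumed generalized softplus and its inverse (valid for y > 0).
softplus_sketch(x::Real, a::Real=1) = log1pexp_sketch(a * x) / a
invsoftplus_sketch(y::Real, a::Real=1) = log(expm1(a * y)) / a

# d/dx softplus(x, a) = 1 / (1 + exp(-a * x)),
# so log|f'(x)| = -log(1 + exp(-a * x)).
function softplus_with_logjac(x::Real, a::Real=1)
    return softplus_sketch(x, a), -log1pexp_sketch(-a * x)
end

# d/dy invsoftplus(y, a) = 1 / (1 - exp(-a * y)),
# so log|f'(y)| = -log(1 - exp(-a * y)).
function invsoftplus_with_logjac(y::Real, a::Real=1)
    return invsoftplus_sketch(y, a), -log1p(-exp(-a * y))
end

# Round trip: the inverse recovers x, and the two log-Jacobians cancel.
x, a = 0.3, 4.0
y, ladj = softplus_with_logjac(x, a)
x_back, ladj_inv = invsoftplus_with_logjac(y, a)
println((x_back ≈ x, ladj + ladj_inv))
```

The cancellation of the two log-Jacobian terms is a convenient consistency check to reuse in tests for the real extension methods.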

@DominiqueMakowski
Contributor Author

I am not sure how to specify the ChangesOfVariables one 🤔

@DominiqueMakowski
Contributor Author

Kind bump

@DominiqueMakowski
Contributor Author

Since there is no preexisting ChangesOfVariables.with_logabsdet_jacobian for softplus, I'm really not sure what to write there.

Member

@devmotion devmotion left a comment

I had re-reviewed the PR but had apparently forgotten to submit the review on GitHub.

Collaborator

@tpapp tpapp left a comment

LGTM

@tpapp tpapp merged commit 76a23a7 into JuliaStats:master Dec 11, 2024
4 checks passed