Convert activation functions to numpower #381
Conversation
Sam 10 selu and sigmoid
Sam 11 silu
Sam 12 softmax and softplus functions
Sam 13 softsign
andrewdalpino
left a comment
Very nice work @apphp and @SkibidiProduction ... I think this is exactly what we need for the first round of integration with NumPower. I had a few questions and comments that may change the outcome of the PR, so I'm just going to leave it at that for now until we get that sorted.
Overall, fantastic usage of unit tests and good code quality. I love to see it.
Andrew
| 1 | alpha | 1.0 | float | The value at which leakage will begin to saturate. Ex. alpha = 1.0 means that the output will never be less than -1.0 when inactivated. |

## Size and Performance

ELU is a simple function and is well-suited for deployment on resource-constrained devices or when working with large neural networks.
How did you come up with these size and performance details? I'm noticing that some differ from my understanding. For example, it is not necessarily true, when taken in the context of all activation functions, that ELU is a simple function or well-suited for resource-constrained devices.
Perhaps it would actually be more confusing to offer this somewhat subjective explanation. In addition, in practice, activation functions have very little impact on the total runtime of the network, so detailing their performance here is somewhat distracting.
How do you feel about dropping this "size and performance" section altogether, not being opinionated about individual activation functions, and instead letting the user discover the nuances of each activation function for themselves? However, if there is something truly outstanding about a particular activation function's performance characteristics, then let's make sure to include that in the description of the class. For example, ReLU is outstanding because it is the simplest activation function in the group. Maybe there's another activation function that has an associated kernel that is particularly optimized, etc.
Yes, remove the section, but if there is something unique about a particular function's performance characteristics, we can put that info in the description. What do you think?
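(Aside, for the docs discussion above: the saturation behavior that the alpha parameter describes follows directly from the ELU definition. Below is a minimal plain-PHP sketch, independent of NumPower, purely to illustrate the formula; it is not code from this PR.)

```php
<?php

// ELU: f(x) = x for x > 0, and f(x) = alpha * (exp(x) - 1) for x <= 0.
// As x goes to negative infinity, exp(x) goes to 0, so the output
// saturates at -alpha. With alpha = 1.0 the output never drops below -1.0.
function elu(float $x, float $alpha = 1.0) : float
{
    return $x > 0.0 ? $x : $alpha * (exp($x) - 1.0);
}

echo elu(2.0) . PHP_EOL;    // 2.0 (identity for positive inputs)
echo elu(-1.0) . PHP_EOL;   // about -0.632
echo elu(-10.0) . PHP_EOL;  // about -0.99995, approaching the -1.0 floor
```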
 */
public function activate(NDArray $input) : NDArray
{
    // Calculate |x|
I don't feel that these comments provide enough value to justify their existence. I can understand what is going on clearly given your great usage of variables and naming.
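(For anyone following along, the hunk above appears to compute a Softsign-style activation, f(x) = x / (1 + |x|). A rough sketch of how that body might read against NumPower is below. NDArray::abs(), NDArray::add(), and NDArray::divide(), along with scalar broadcasting, are assumptions about the NumPower API, not a copy of the code under review.)

```php
<?php

// NDArray is provided by the NumPower extension. The static methods and
// scalar broadcasting used here are assumptions about NumPower's API;
// adjust if the actual signatures differ.
function softsign(NDArray $input) : NDArray
{
    $absolute = NDArray::abs($input);             // |x|
    $denominator = NDArray::add($absolute, 1.0);  // 1 + |x|

    return NDArray::divide($input, $denominator); // x / (1 + |x|)
}
```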
andrewdalpino
left a comment
Looks good fellas, let's roll!
Activation implementations
Swapped out custom Tensor code for NumPower APIs across all functions: ReLU, LeakyReLU, ELU, GELU, HardSigmoid, SiLU, Tanh, Sigmoid, Softmax, Softplus, Softsign, ThresholdedReLU, etc. (see the sketch after this list).
Updated derivative methods to use NumPower's derivative helpers.
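To make the first bullet above concrete, here is a minimal sketch of what one activation rewritten on top of NumPower can look like, using the hyperbolic tangent as the simplest case. The class shape is simplified for illustration, and NDArray::tanh(), NDArray::multiply(), and NDArray::subtract() with scalar broadcasting are assumptions about the NumPower API rather than code lifted from the PR.

```php
<?php

// Illustrative only: the class/interface shape is simplified, and the
// NDArray static calls (tanh, multiply, subtract) plus scalar broadcasting
// are assumptions about NumPower's API.
class HyperbolicTangent
{
    public function activate(NDArray $input) : NDArray
    {
        return NDArray::tanh($input);
    }

    public function differentiate(NDArray $output) : NDArray
    {
        // d/dx tanh(x) = 1 - tanh(x)^2, computed from the activations.
        $squared = NDArray::multiply($output, $output);

        return NDArray::subtract(1.0, $squared);
    }
}
```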
Tests
Refactored unit tests to assert against NumPower outputs (an example follows below).
Adjusted tolerances and assertions to match NumPower's numeric behavior.
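As an example of the kind of tolerance adjustment described above, PHPUnit's assertEqualsWithDelta() can compare output against reference values within a small delta rather than asserting exact equality. The test class, input, and expected values below are made up for illustration, and NDArray::array()/toArray() are assumed NumPower conversion helpers.

```php
<?php

use PHPUnit\Framework\TestCase;
use Rubix\ML\NeuralNet\ActivationFunctions\Sigmoid;

class SigmoidTest extends TestCase
{
    public function testActivate() : void
    {
        // NDArray::array() and toArray() are assumed conversion helpers.
        $input = NDArray::array([-2.0, 0.0, 2.0]);

        $expected = [0.1192029, 0.5, 0.8807971];

        $output = (new Sigmoid())->activate($input)->toArray();

        // Compare within a loose delta to absorb single-precision rounding.
        $this->assertEqualsWithDelta($expected, $output, 1e-5);
    }
}
```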
Documentation
Added/updated images under docs/images/activation-functions/ to illustrate each activation curve and its derivative using the new implementations.
Cleaned up corresponding markdown to reference the updated diagrams.
Code cleanup
Aligned naming conventions and method signatures with NumPower's API.
Minor style fixes (whitespace, imports, visibility).