How does the hidden layer activation function ReLU effectively add non-linearity to a model? #569
-
Hi! I'm approximately halfway through the "PyTorch for Deep Learning & Machine Learning - Full Course". To be precise, at the 12:11:17 timestamp, the goal is to create a non-linear model capable of classifying whether a dot is red or blue based on its 2-dimensional position (x, y). We use the activation function ReLU between each layer to accomplish this. I've done some independent research on the most popular activation functions (sigmoid, softmax, threshold, etc.) and they all make sense. Furthermore, Mr. Bourke mentioned that ReLU is a non-linear function because its derivative changes (effectively 0 for negative values and 1 for positive values). However, I don't understand how ReLU actually gives a model the capability to learn far more complex, non-linear problems; from a practical perspective, isn't all it really does convert negative values to 0?
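For reference, here's a rough sketch of the kind of model I'm describing: linear layers with ReLU between them, taking a 2-dimensional (x, y) position and outputting one logit for the red/blue decision. The layer sizes here are just illustrative, not the exact ones from the video.

```python
import torch
from torch import nn

# Illustrative sketch: hidden sizes are placeholders, not the course's exact values.
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=10),   # (x, y) position in
    nn.ReLU(),                                    # non-linearity between layers
    nn.Linear(in_features=10, out_features=10),
    nn.ReLU(),
    nn.Linear(in_features=10, out_features=1),    # one logit: red vs. blue
)
```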
-
Backpropagation, also known as loss.backward():
I guess you already understand the basics: a linear function like y = x, and a non-linear one like ReLU, whose derivative is 1 if x > 0 else 0.
Linear: the derivative of a linear function is a constant, 1 for all values. This means gradients propagate without any change and no non-linearity is introduced in the backpropagation step. This limits the network: the model cannot capture non-linear relations in the data, so it is harder for it to learn.
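One way to see this (a small sketch, the layer sizes are just for illustration): stacking two linear layers with no activation between them collapses into a single linear layer, so the extra depth adds nothing without a non-linearity.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers stacked with NO activation in between...
f1 = nn.Linear(2, 8, bias=False)
f2 = nn.Linear(8, 1, bias=False)

# ...are equivalent to ONE linear layer whose weight is the product of the two:
# f2(f1(x)) == x @ (W2 @ W1).T
x = torch.randn(5, 2)
stacked = f2(f1(x))
collapsed = x @ (f2.weight @ f1.weight).T

print(torch.allclose(stacked, collapsed, atol=1e-6))  # True
```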
ReLU: the derivative is 1 for x > 0 and 0 otherwise. As you can see, that is the first difference. It means gradients propagate unchanged for positive values, while for negative values the gradient is zeroed out.
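You can check this directly in PyTorch (an illustrative snippet, values chosen arbitrarily): the gradient passes through unchanged where the input was positive and is zeroed where it was negative.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.5, 3.0], requires_grad=True)
y = torch.relu(x)

# Sum to get a scalar for .backward(); the upstream gradient is 1 for every element.
y.sum().backward()

print(y)       # tensor([0.0000, 0.0000, 0.5000, 3.0000], grad_fn=...)
print(x.grad)  # tensor([0., 0., 1., 1.]) -> blocked where x < 0, passed where x > 0
```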
So,…