How does the hidden layer activation function ReLU effectively add non-linearity to a model? #569
-
Hi! I'm approximately halfway through the "PyTorch for Deep Learning & Machine Learning - Full Course". To be precise, at the 12:11:17 timestamp, the goal is to create a non-linear model capable of classifying whether a dot is red or blue based on its 2-dimensional position (x, y). We use the activation function ReLU between each layer to accomplish this. I've done some independent research on the most popular activation functions (sigmoid, softmax, threshold, etc.) and they all make sense. Furthermore, Mr. Bourke mentioned that ReLU is a non-linear function because its derivative changes (effectively 0 for negative values and 1 for positive values). However, I don't understand how ReLU actually gives a model the capability to learn far more complex, non-linear problems; from a practical perspective, isn't all it really does convert negative values to 0?
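For reference, here's a rough sketch of the kind of model I'm describing: linear layers with ReLU between them, taking a 2-dimensional (x, y) position and outputting one logit for the red/blue decision. The layer sizes here are just illustrative, not the exact ones from the video.

```python
import torch
from torch import nn

# Illustrative sketch: hidden sizes are placeholders, not the course's exact values.
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=10),   # (x, y) position in
    nn.ReLU(),                                    # non-linearity between layers
    nn.Linear(in_features=10, out_features=10),
    nn.ReLU(),
    nn.Linear(in_features=10, out_features=1),    # one logit: red vs. blue
)
```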
-
Backpropagation, also known as loss.backward():
I guess you already understand the basics: a linear function like y = x, and a non-linear one like ReLU, whose derivative is 1 if x > 0 else 0.
Linear: the derivative of a linear function is a constant, 1 for all values. This means gradients propagate without any change and no non-linearity is introduced in the backpropagation step. This limits the network: the model cannot capture non-linear relations in the data, so it is harder for it to learn.
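One way to see this (a small sketch, the layer sizes are just for illustration): stacking two linear layers with no activation between them collapses into a single linear layer, so the extra depth adds nothing without a non-linearity.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers stacked with NO activation in between...
f1 = nn.Linear(2, 8, bias=False)
f2 = nn.Linear(8, 1, bias=False)

# ...are equivalent to ONE linear layer whose weight is the product of the two:
# f2(f1(x)) == x @ (W2 @ W1).T
x = torch.randn(5, 2)
stacked = f2(f1(x))
collapsed = x @ (f2.weight @ f1.weight).T

print(torch.allclose(stacked, collapsed, atol=1e-6))  # True
```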
ReLU: the derivative is 1 for x > 0 and 0 otherwise. As you can see, that is the first difference. It means gradients propagate unchanged for positive values, while for negative values the gradient is zeroed out.
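You can check this directly in PyTorch (an illustrative snippet, values chosen arbitrarily): the gradient passes through unchanged where the input was positive and is zeroed where it was negative.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.5, 3.0], requires_grad=True)
y = torch.relu(x)

# Sum to get a scalar for .backward(); the upstream gradient is 1 for every element.
y.sum().backward()

print(y)       # tensor([0.0000, 0.0000, 0.5000, 3.0000], grad_fn=...)
print(x.grad)  # tensor([0., 0., 1., 1.]) -> blocked where x < 0, passed where x > 0
```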
So,…