- Understand how gradient descent works when altering both the y-intercept and slope variables
- Understand what it means to take a partial derivative
- Understand the rule for taking partial derivatives
In the last section, we talked about how to think about moving along a 3-d cost curve.
We know that moving along the 3-d cost curve above, means changing the
In this lesson, we'll learn about gradient descent in three dimensions, but let's first remember how it worked in two dimensions, when changing just one variable of our regression line.
In two dimensions, when changing just one variable,
So that was gradient descent in two dimensions. What is gradient descent in three dimensions?
In three dimensions, we once again choose an initial regression line, which means that we are choosing a point on the graph below. Then we begin taking steps towards the minimum. But of course, we are now able to walk not just forwards and backwards but left and right as well -- as we now can alter two variables.
To get a sense of how this works, imagine our initial regression line places us at the back-left corner of the graph above, with a slope of 50, and y-intercept of negative 20. Now imagine that we cannot see the rest of the graph - yet we still want to approach the minimum. How do we do this?
Once again, we feel out the slope of the graph with our feet. Only this time, as we shift our feet, we are preparing to walk in two-dimensional space.
So this is our approach. We shift horizontally a little bit to determine the change in output in the left-right direction, and then shift forward and back to determine the change in output in that direction. From there we take the next step in the direction of steepest descent.
So now, perhaps, you can get a sense of why our technique of gradient descent is so powerful. Once we consider that in moving towards our best fit line, we have a choice of moving anywhere in a two-dimensional space, using the slope to guide us only becomes more important.
So how does this approach of shifting back and forth translate mathematically? It means we determine the slope in one dimension, then the other. Then, we move in the direction where that slope is steepest downwards. This moves us towards our minimum.
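To make the "feel out the slope in each direction, then step" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the lesson's own implementation: the three data points, the helper names (`rss`, `nudge_slopes`, `step`), the learning rate, and the number of steps are all hypothetical. Only the starting point (slope of 50, y-intercept of negative 20) comes from the text above.

```python
# Hypothetical data: three (x, y) points chosen for illustration only.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def rss(slope, intercept):
    """Cost: residual sum of squares for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

def nudge_slopes(slope, intercept, delta=1e-4):
    """Estimate the slope of the cost surface in each direction,
    nudging one variable at a time while holding the other fixed."""
    d_slope = (rss(slope + delta, intercept) - rss(slope - delta, intercept)) / (2 * delta)
    d_intercept = (rss(slope, intercept + delta) - rss(slope, intercept - delta)) / (2 * delta)
    return d_slope, d_intercept

def step(slope, intercept, learning_rate=0.01):
    """Take one step downhill, moving against the slope in both directions."""
    d_slope, d_intercept = nudge_slopes(slope, intercept)
    return slope - learning_rate * d_slope, intercept - learning_rate * d_intercept

# Starting point from the text: slope of 50, y-intercept of -20.
slope, intercept = 50.0, -20.0
start_cost = rss(slope, intercept)

for _ in range(1000):  # assumed number of steps
    slope, intercept = step(slope, intercept)

end_cost = rss(slope, intercept)
print(start_cost, end_cost)
```

Each call to `step` measures the change in cost in the slope direction and in the y-intercept direction separately, then moves both variables at once. Moving against both slopes together is what points us in the direction of steepest descent.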
To measure the slope in each dimension, one after the other, we'll take the derivative with respect to one variable, and then take the derivative with respect to another variable. Now let's be very explicit about what it means to take the partial derivative with respect to a variable.
Let's again talk about this procedure in general, and then we'll apply it to the cost curve. So let's revisit our multivariable function:
Remember that the function looks like the following:
To take a derivative with respect to
And to express the change in output with respect to
So what does a derivative
Well remember how we think of a standard derivative of a one variable function, for example
So in two dimensions, to take the derivative at a given point, we simply calculate the slope of the function at that x value.
Now the partial derivative of a multivariable function is fairly similar. But here it's equal to the slope of the tangent line at a specific
Let's take a close look. The top left graph shows
So with taking the partial derivative
As you can see,
This can be a little mind-bending so let's go through this again for
Now for
First let's understand our plots -- they may be surprising. Starting at the top left quadrant the graph of the function
So now, to think about taking the derivative, once again we move to a slice of graph for a value of
So that is our technique for a partial derivative. For $\frac{df}{dy} $ we move to a slice of the curve at a specific value of
For
Ok, so now that you understand the slide, slide, nudge, maybe you can understand this little shortcut that we can pull. For any multivariable function, the variables that you are not taking the derivative with respect to can simply be treated as constants.
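One way to convince yourself of this shortcut is to check it numerically. The sketch below uses a hypothetical function, `f(x, y) = y * x ** 2`, chosen purely for illustration, along with hypothetical helper names (`partial_x`, `partial_y`). It nudges one variable while holding the other fixed, then compares the result to what the treat-it-as-a-constant rule predicts.

```python
def f(x, y):
    # Hypothetical example function, chosen for illustration only.
    return y * x ** 2

def partial_x(func, x, y, delta=1e-6):
    """Nudge x while holding y fixed, and measure the change in output."""
    return (func(x + delta, y) - func(x - delta, y)) / (2 * delta)

def partial_y(func, x, y, delta=1e-6):
    """Nudge y while holding x fixed, and measure the change in output."""
    return (func(x, y + delta) - func(x, y - delta)) / (2 * delta)

x0, y0 = 3.0, 2.0  # an arbitrary point at which to compare

numeric_dx = partial_x(f, x0, y0)
numeric_dy = partial_y(f, x0, y0)

# Applying the rule by hand:
# treating y as a constant, df/dx = 2 * x * y
# treating x as a constant, df/dy = x ** 2
rule_dx = 2 * x0 * y0
rule_dy = x0 ** 2

print(numeric_dx, rule_dx)
print(numeric_dy, rule_dy)
```

The nudged estimates and the hand-derived values should agree to within a small rounding error, which is the slide, slide, nudge procedure and the shortcut arriving at the same slope.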
For example, with our function of
So that's all it means to take a partial derivative of something: look at the variable you are taking the derivative with respect to, differentiate only with respect to that variable, and treat every other variable as a constant. And guess what, this result lines up with what we saw earlier.
We calculated that
Now let's try our rule one more time, this time
So this time with
In this section, we have learned how to think about taking the partial derivative of a function. For the partial derivative, we say we are taking the derivative with respect to a variable. So for example, we can say for the function







