Questions About Tutorial 5 #43
Hello everyone,

I am learning the CoDiPack software by following the tutorials. While studying Tutorial 5, I ran into some confusion.

After executing `run<codi::RealReverse>();`, in Step 1, the output is:

Update global:

I can understand this result because `f(x) = x^3 (x + x^2)`, so when `x = 4.0`, `f(4.0) = 1280`.

However, in Step 2, something changes. First, for `y = func(x, false, global)`, the input parameter `false` means that the global variable is not updated. Additionally, between Step 1 and Step 2, there is no code resetting the global variable. In this case, the output is:

For `f(4.0) = 1280`, I understand this because the global variable is not reset, so the global value from Step 1 is reused, where `global = t * x`. Thus, it is natural that `f(4.0) = 1280`. However, I don't understand how `df/dx(4.0) = 756` is derived. I read the explanation in the tutorial, which states:

I cannot figure out how this value of 756 is obtained. Even if it is incorrect, what might the process be to arrive at this result?

In Step 3, there is a further change compared to Step 2. Before registering `x`, the command `tape.deactivateValue(global);` is executed. This command sets the identifier of the global variable to zero. The result is:

No update global with reset:

Similarly, I don't understand why `df/dx(4.0) = 576`. How is this 576 derived?
Replies: 2 comments 1 reply
Hi, thanks for looking at CoDiPack. I will focus on step 3 first:
Step 2: In this special case:
For step 2:
So, for the reverse run, the return statement looks to CoDiPack something like:
I hope this helps in understanding the problem.
Hi MaxSagebaum,

I'm very glad to receive your reply; it helped me a lot in understanding this issue. First of all, please forgive the delayed response (yesterday I got caught up with some things).

Based on my debugging work, I think I've figured out what Steps 1, 2, and 3 in Tutorial 5 are doing. CoDiPack should be using an operator overloading strategy to compute derivatives, so naturally, I started by drawing the computational graph, as shown in Figure 1.

The forward process is quite simple and can be solved sequentially from left to right. However, to perform the backward solve, certain information has to be recorded during the forward computation. (I'm not entirely sure how CoDiPack implements this, but it should be similar to my understanding.) The program likely saves something like this for each operator: `Tape = [inputs = [self.name], outputs = [x.name], propagate = propagate]`, where `self.name` and `x.name` should correspond to the variable IDs in CoDiPack, and `propagate` records the overloaded method for the operator.

Now, let's take a look at what Step 1 is doing. According to the chain rule:

Let's skip Step 2 for now and take a look at Step 3. Due to the statement `tape.deactivateValue(global);`, the identifier of `global` is set to zero in CoDiPack. From your previous reply, I understand that a variable whose identifier is zero is treated as a constant in derivative calculations. I'm not entirely sure what this means in the computational graph, but I assume it looks like Figure 2. Since the forward result is still 1280, the forward computation should be unaffected. However, in the backward propagation, since `global` (represented as v3 in the figure) is no longer involved, we assume that all the dashed lines passing through v3 are cut off. Let's proceed with the calculation using the chain rule:

Now let's look at Step 2. Since the variable `global` (v3) and the variable `t2` (v4) have the same identifier, during the backward propagation of v3 and v4, the overloaded operator could be either addition or multiplication. Let's assume that at this point, `global` no longer overloads the multiplication operator, but instead overloads the addition operator. Let's see what happens:

Since I'm not very familiar with CoDiPack, I can't be sure my thoughts are entirely correct. I can only say that they happen to match the observed outputs (by the way, I feel like I'm probably right ;) ).
Hi, thanks for looking at CoDiPack.

I will focus on step 3 first:

As you already said, `global` is deactivated beforehand and not updated. Therefore, `global` is a constant with respect to the differentiation. The value of `global` is `x * t = x*x*x = 64` with `x=4`. It got this value during step one. `f` is now `f = global * t2 = global * (x + t) = global * (x + x^2)`. Since `global` has the same value, we get the same result. `df/dx` is now different since we no longer see the dependency of `global` with respect to `x`: `df/dx = global + 2 * global * x`. With `x=4` and `global=64` we get `576`.

Step 2:

It is very tricky to properly answer step 2. The whole tutorial is about sensitizing the user to exactly this …
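Written out as a worked equation, the step 3 reasoning treats `global` as a constant, here abbreviated $g = 64$ (the symbol $g$ is introduced only for brevity):

$$
\begin{aligned}
f(x) &= g\,(x + x^2), \\
\frac{df}{dx} &= g\,(1 + 2x) = g + 2\,g\,x, \\
\left.\frac{df}{dx}\right|_{x=4} &= 64 + 2 \cdot 64 \cdot 4 = 576.
\end{aligned}
$$

For comparison, when the dependency of `global` on `x` is kept, $f(x) = x^3 (x + x^2) = x^4 + x^5$, so $df/dx = 4x^3 + 5x^4$, which evaluates to $256 + 1280 = 1536$ at $x = 4$.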