Coordinate descent optimizer (unregularized case) should find optimal weights for the given data
Features:
[10.0, 20.0, 30.0]
[40.0, 50.0, 60.0]
[70.0, 80.0, 90.0]
[20.0, 30.0, 10.0]

Labels:
[20.0, 30.0, 20.0, 40.0]
Put it together (feature row -> label):
[10.0, 20.0, 30.0] -> [20.0]
[40.0, 50.0, 60.0] -> [30.0]
[70.0, 80.0, 90.0] -> [20.0]
[20.0, 30.0, 10.0] -> [40.0]
Given lambda: 0.0 (unregularized case)
Formula for the coordinate descent update with respect to the j-th column:

w_j = sum over all rows i of: x_ij * (y_i - x_i(-j) * w(-j))

where

- x_j is the j-th column (e.g., if j = 0 then x_j = [10.0, 40.0, 70.0, 20.0])
- y_i is the i-th label (e.g., if i = 0 then y_i = 20.0)
- x_i(-j) is the i-th point with the j-th coordinate excluded (e.g., if i = 0 then x_i = [10.0, 20.0, 30.0], and if i = 0 and j = 0 then x_i(-j) = [0.0, 20.0, 30.0])
- w(-j) is the coefficient (weight) vector with the j-th term excluded
Initial weights: w = [0.0, 0.0, 0.0]
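To make the update rule concrete, here is a minimal Python sketch of the per-column term; the function name `coordinate_term` and the plain-list layout are illustrative, not the library's actual API:

```python
def coordinate_term(X, y, w, j):
    # Sum over all rows i of: x_ij * (y_i - x_i(-j) . w(-j)),
    # i.e. the j-th feature of each row times the residual computed
    # with the j-th coordinate excluded from both the row and the weights.
    total = 0.0
    for x_i, y_i in zip(X, y):
        partial = sum(x_ik * w_k
                      for k, (x_ik, w_k) in enumerate(zip(x_i, w))
                      if k != j)
        total += x_i[j] * (y_i - partial)
    return total
```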
Iteration 1:

j = 0:
10 * (20 - (20 * 0 + 30 * 0))
40 * (30 - (50 * 0 + 60 * 0))
70 * (20 - (80 * 0 + 90 * 0))
20 * (40 - (30 * 0 + 10 * 0))

j = 1:
20 * (20 - (10 * 0 + 30 * 0))
50 * (30 - (40 * 0 + 60 * 0))
80 * (20 - (70 * 0 + 90 * 0))
30 * (40 - (20 * 0 + 10 * 0))

j = 2:
30 * (20 - (10 * 0 + 20 * 0))
60 * (30 - (40 * 0 + 50 * 0))
90 * (20 - (70 * 0 + 80 * 0))
10 * (40 - (20 * 0 + 30 * 0))
Summing up all the above (column-wise): 3600, 4700, 4600

Weights after the first iteration: w = [3600, 4700, 4600]
Iteration 2:

j = 0:
10 * (20 - (20 * 4700 + 30 * 4600))
40 * (30 - (50 * 4700 + 60 * 4600))
70 * (20 - (80 * 4700 + 90 * 4600))
20 * (40 - (30 * 4700 + 10 * 4600))

j = 1:
20 * (20 - (10 * 3600 + 30 * 4600))
50 * (30 - (40 * 3600 + 60 * 4600))
80 * (20 - (70 * 3600 + 90 * 4600))
30 * (40 - (20 * 3600 + 10 * 4600))

j = 2:
30 * (20 - (10 * 3600 + 20 * 4700))
60 * (30 - (40 * 3600 + 50 * 4700))
90 * (20 - (70 * 3600 + 80 * 4700))
10 * (40 - (20 * 3600 + 30 * 4700))
Summing up all the above (column-wise): -81796400, -81295300, -85285400

Weights after the second iteration: w = [-81796400, -81295300, -85285400]
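Reusing `coordinate_term` from the sketch above, both hand-computed sweeps can be reproduced like this (note that all three terms of a sweep use the weights of the previous iteration, exactly as in the walkthrough):

```python
X = [[10.0, 20.0, 30.0],
     [40.0, 50.0, 60.0],
     [70.0, 80.0, 90.0],
     [20.0, 30.0, 10.0]]
y = [20.0, 30.0, 20.0, 40.0]

w = [0.0, 0.0, 0.0]
for iteration in (1, 2):
    # Every coordinate is updated from the previous iteration's weights,
    # not from the partially updated ones.
    w = [coordinate_term(X, y, w, j) for j in range(3)]
    print(iteration, w)
# 1 [3600.0, 4700.0, 4600.0]
# 2 [-81796400.0, -81295300.0, -85285400.0]
```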
However, a test cannot expect exactly this vector, because of the imprecision of floating-point arithmetic. In our case we will never get exactly -81295300 (the second element of w), since a 32-bit floating-point number has only 24 bits of mantissa precision. 81295300 in binary is 100110110000111011111000100, which requires 25 bits of mantissa precision to be stored exactly, so the trailing bits 100 (4 in decimal) are cut off. Thus we should allow some delta for the comparison.
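The rounding can be demonstrated directly, for instance with Python's `struct` module; this is an illustration of IEEE 754 single precision, not the library's code:

```python
import struct

exact = -81295300.0
# Round-trip the value through a 32-bit float ('f' format) to see
# what single precision actually stores.
stored = struct.unpack('<f', struct.pack('<f', exact))[0]

print(bin(81295300))        # 0b100110110000111011111000100 (25 significant bits)
print(stored)               # -81295296.0: the trailing 4 is rounded away
print(abs(stored - exact))  # 4.0, so the comparison delta must be at least this
```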