Coordinate descent optimizer (unregularized case) should find optimal weights for the given data
Given the following features (left) and labels (right):

[10.0, 20.0, 30.0] [20.0]
[40.0, 50.0, 60.0] [30.0]
[70.0, 80.0, 90.0] [20.0]
[20.0, 30.0, 10.0] [40.0]

The formula for the coordinate descent update with respect to the j-th column is:
x_ij * (y_i - x_i,-j · w_-j),

where

- x_ij - the j-th component of the i-th row (e.g., if i = 0 and j = 0, then x_ij is 10.0)
- y_i - the i-th label (e.g., if i = 0, then y_i = 20.0)
- x_i,-j - the i-th row with the j-th coordinate excluded (e.g., if i = 0, then x_i = [10.0, 20.0, 30.0], and if additionally j = 0, then x_i,-j = [20.0, 30.0])
- w_-j - the coefficients (weights) vector with the j-th term excluded

The new value of w_j is the sum of this expression over all rows i.
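The library itself is written in Dart, so the following NumPy sketch is only an illustration of the arithmetic, not the actual implementation (the name coordinate_update is ours, purely for this page):

```python
import numpy as np

def coordinate_update(X, y, w, j):
    """New value for w_j: sum over all rows i of x_ij * (y_i - x_i,-j . w_-j)."""
    X_minus_j = np.delete(X, j, axis=1)    # x_i,-j: the matrix without the j-th column
    w_minus_j = np.delete(w, j)            # w_-j: the weights without the j-th term
    residuals = y - X_minus_j @ w_minus_j  # y_i - x_i,-j . w_-j, computed for every row i
    return float(X[:, j] @ residuals)      # sum_i x_ij * residual_i

X = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0],
              [70.0, 80.0, 90.0],
              [20.0, 30.0, 10.0]])
y = np.array([20.0, 30.0, 20.0, 40.0])
```

Note that in the walkthrough below all three new weights are computed from the same previous w, rather than each coordinate immediately reusing the freshly updated w_j as classic cyclic coordinate descent would.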
Starting from w = [0.0, 0.0, 0.0]:

j = 0:                           j = 1:                           j = 2:
10 * (20 - (20 * 0 + 30 * 0))    20 * (20 - (10 * 0 + 30 * 0))    30 * (20 - (10 * 0 + 20 * 0))
40 * (30 - (50 * 0 + 60 * 0))    50 * (30 - (40 * 0 + 60 * 0))    60 * (30 - (40 * 0 + 50 * 0))
70 * (20 - (80 * 0 + 90 * 0))    80 * (20 - (70 * 0 + 90 * 0))    90 * (20 - (70 * 0 + 80 * 0))
20 * (40 - (30 * 0 + 10 * 0))    30 * (40 - (20 * 0 + 10 * 0))    10 * (40 - (20 * 0 + 30 * 0))

Summing up all of the above column-wise, we get

3600, 4700, 4600

so the weights after the first iteration are

w = [3600, 4700, 4600]
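Using the sketch above (with X, y, and coordinate_update as defined in the earlier snippet), the first sweep reproduces these sums:

```python
w = np.zeros(3)
w1 = np.array([coordinate_update(X, y, w, j) for j in range(3)])
print(w1)  # [3600. 4700. 4600.]
```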
At the second iteration, with w = [3600, 4700, 4600]:

j = 0:                                 j = 1:                                 j = 2:
10 * (20 - (20 * 4700 + 30 * 4600))    20 * (20 - (10 * 3600 + 30 * 4600))    30 * (20 - (10 * 3600 + 20 * 4700))
40 * (30 - (50 * 4700 + 60 * 4600))    50 * (30 - (40 * 3600 + 60 * 4600))    60 * (30 - (40 * 3600 + 50 * 4700))
70 * (20 - (80 * 4700 + 90 * 4600))    80 * (20 - (70 * 3600 + 90 * 4600))    90 * (20 - (70 * 3600 + 80 * 4700))
20 * (40 - (30 * 4700 + 10 * 4600))    30 * (40 - (20 * 3600 + 10 * 4600))    10 * (40 - (20 * 3600 + 30 * 4700))

Summing up all of the above column-wise:

-81796400, -81295300, -85285400

so the weights after the second iteration are

w = [-81796400, -81295300, -85285400]
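And the second sweep of the same sketch, fed the weights from the first one, confirms these values:

```python
w2 = np.array([coordinate_update(X, y, w1, j) for j in range(3)])
print(w2)  # [-81796400. -81295300. -85285400.]
```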
However, a test cannot expect exactly this vector, because floating-point arithmetic is inexact. In our case we will never get exactly -81295300 (the second element of the vector w), since a 32-bit floating-point number has only 24 bits of mantissa precision. 81295300 in binary is 100110110000111011111000100, which requires 25 bits of mantissa precision to be stored exactly (27 significant bits minus the two trailing zeros, which the exponent absorbs), so the trailing binary 100 (4 in decimal) is rounded away. Thus we should allow some delta when comparing the expected and actual vectors.
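This is easy to check directly, using NumPy's float32 as a stand-in for the library's 32-bit floats:

```python
import numpy as np

x = np.float32(81295300.0)
print(int(x))                  # 81295296: the trailing 4 is rounded away
print(float(x) == 81295300.0)  # False, hence the tolerance in the test
```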