Coordinate descent optimizer (unregularized case) should find optimal weights for the given data
Features:
[10.0, 20.0, 30.0]
[40.0, 50.0, 60.0]
[70.0, 80.0, 90.0]
[20.0, 30.0, 10.0]

Labels:
[20.0, 30.0, 20.0, 40.0]
Put it together (feature row -> label):
[10.0, 20.0, 30.0] -> [20.0]
[40.0, 50.0, 60.0] -> [30.0]
[70.0, 80.0, 90.0] -> [20.0]
[20.0, 30.0, 10.0] -> [40.0]
Given lambda: 0.0 (unregularized case)
Formula for the coordinate descent update with respect to the j-th column:

w_j = sum over all rows i of: x_ij * (y_i - x_i(-j) * w(-j))

where

- x_j is the j-th column (e.g., if j = 0 then x_j = [10.0, 40.0, 70.0, 20.0])
- y_i is the i-th label (e.g., if i = 0 then y_i = 20.0)
- x_i(-j) is the i-th point with the j-th coordinate excluded (e.g., if i = 0 then x_i = [10.0, 20.0, 30.0], and if i = 0 and j = 0 then x_i(-j) = [0.0, 20.0, 30.0])
- w(-j) is the coefficient (weight) vector with the j-th term excluded
Initial weights: w = [0.0, 0.0, 0.0]
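To make the update rule concrete, here is a minimal Python sketch of the per-column term; the function name `coordinate_term` and the plain-list layout are illustrative, not the library's actual API:

```python
def coordinate_term(X, y, w, j):
    # Sum over all rows i of: x_ij * (y_i - x_i(-j) . w(-j)),
    # i.e. the j-th feature of each row times the residual computed
    # with the j-th coordinate excluded from both the row and the weights.
    total = 0.0
    for x_i, y_i in zip(X, y):
        partial = sum(x_ik * w_k
                      for k, (x_ik, w_k) in enumerate(zip(x_i, w))
                      if k != j)
        total += x_i[j] * (y_i - partial)
    return total
```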
Iteration 1:

j = 0:
10 * (20 - (20 * 0 + 30 * 0))
40 * (30 - (50 * 0 + 60 * 0))
70 * (20 - (80 * 0 + 90 * 0))
20 * (40 - (30 * 0 + 10 * 0))

j = 1:
20 * (20 - (10 * 0 + 30 * 0))
50 * (30 - (40 * 0 + 60 * 0))
80 * (20 - (70 * 0 + 90 * 0))
30 * (40 - (20 * 0 + 10 * 0))

j = 2:
30 * (20 - (10 * 0 + 20 * 0))
60 * (30 - (40 * 0 + 50 * 0))
90 * (20 - (70 * 0 + 80 * 0))
10 * (40 - (20 * 0 + 30 * 0))
Summing up all the above (column-wise): 3600, 4700, 4600

Weights after the first iteration: w = [3600, 4700, 4600]
Iteration 2:

j = 0:
10 * (20 - (20 * 4700 + 30 * 4600))
40 * (30 - (50 * 4700 + 60 * 4600))
70 * (20 - (80 * 4700 + 90 * 4600))
20 * (40 - (30 * 4700 + 10 * 4600))

j = 1:
20 * (20 - (10 * 3600 + 30 * 4600))
50 * (30 - (40 * 3600 + 60 * 4600))
80 * (20 - (70 * 3600 + 90 * 4600))
30 * (40 - (20 * 3600 + 10 * 4600))

j = 2:
30 * (20 - (10 * 3600 + 20 * 4700))
60 * (30 - (40 * 3600 + 50 * 4700))
90 * (20 - (70 * 3600 + 80 * 4700))
10 * (40 - (20 * 3600 + 30 * 4700))
Summing up all the above (column-wise): -81796400, -81295300, -85285400

Weights after the second iteration: w = [-81796400, -81295300, -85285400]
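Reusing `coordinate_term` from the sketch above, both hand-computed sweeps can be reproduced like this (note that all three terms of a sweep use the weights of the previous iteration, exactly as in the walkthrough):

```python
X = [[10.0, 20.0, 30.0],
     [40.0, 50.0, 60.0],
     [70.0, 80.0, 90.0],
     [20.0, 30.0, 10.0]]
y = [20.0, 30.0, 20.0, 40.0]

w = [0.0, 0.0, 0.0]
for iteration in (1, 2):
    # Every coordinate is updated from the previous iteration's weights,
    # not from the partially updated ones.
    w = [coordinate_term(X, y, w, j) for j in range(3)]
    print(iteration, w)
# 1 [3600.0, 4700.0, 4600.0]
# 2 [-81796400.0, -81295300.0, -85285400.0]
```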
However, a test cannot expect exactly this vector, because of the imprecision of floating-point arithmetic. In our case we will never get exactly -81295300 (the second element of w), since a 32-bit floating-point number has only 24 bits of mantissa precision. 81295300 in binary is 100110110000111011111000100, which requires 25 bits of mantissa precision to be stored exactly, so the trailing bits 100 (4 in decimal) are cut off. Thus we should allow some delta for the comparison.
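The rounding can be demonstrated directly, for instance with Python's `struct` module; this is an illustration of IEEE 754 single precision, not the library's code:

```python
import struct

exact = -81295300.0
# Round-trip the value through a 32-bit float ('f' format) to see
# what single precision actually stores.
stored = struct.unpack('<f', struct.pack('<f', exact))[0]

print(bin(81295300))        # 0b100110110000111011111000100 (25 significant bits)
print(stored)               # -81295296.0: the trailing 4 is rounded away
print(abs(stored - exact))  # 4.0, so the comparison delta must be at least this
```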