Description
Dear authors,
I am using SymForce to generate cost functions and use them in an optimization problem that is solved with the Ceres Solver and with our custom solver.
The development environment is Ubuntu 22 + ROS2 Humble.
I have encountered quite a few strange things.
If you have experience with any of them, I would appreciate hints on how to fix them.
I'd be happy to share my codebase with you if you are interested in taking a look.
Meanwhile, I will try to pinpoint the cause with unit tests as I find time.
The first warning sign is that neither the Ceres Solver nor our solver converges on the problem built with the SymForce Jacobians. With the sparse normal Cholesky solver in Ceres, the optimized parameters barely change from their initial values. Our solver moves the parameters towards the reference values, but only partially.
However, the same problem built with my analytic Jacobians and my custom parameter types is solved by both the Ceres Solver and our custom solver with good convergence.
So I checked the Jacobians of all cost functions against numeric Jacobians.
A. I found that the numeric Jacobians sometimes do not agree with the symbolic Jacobians. For a given parameter, the numeric Jacobian in some cost functions differs greatly from the value given by SymForce.
This problem occurs mostly in the rotation part of a pose.
However, for other instances of the same type of cost function, the two Jacobians agree well.
B. In the same terminal, I got very different results when I ran the same program via an absolute path and via a relative path.
The run with the relative path produces many wrong residuals containing inf values.
The run with the absolute path looks fine in terms of residuals.
But neither run converges.
I have not observed this erratic behavior with my other program that solves the same optimization problem.
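The check described in A can be sketched as follows. This is a minimal illustration, not the code in calib_cam_imu.cpp: `numericJacobian` and `jacobiansAgree` are hypothetical helper names, and the sketch perturbs the raw parameter entries one at a time. For a pose, that is only a fair comparison if the symbolic Jacobian was generated with respect to the same parameterization (storage entries versus tangent space), which is an assumption worth verifying for the rotation columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Mat[row][col]
using ResidualFn = std::function<Vec(const Vec&)>;

// Central-difference Jacobian of f at x (hypothetical helper).
// Each raw entry of x is perturbed by +/- h in turn.
Mat numericJacobian(const ResidualFn& f, const Vec& x, double h = 1e-6) {
  const Vec r0 = f(x);
  Mat J(r0.size(), Vec(x.size(), 0.0));
  for (std::size_t j = 0; j < x.size(); ++j) {
    Vec xp = x, xm = x;
    xp[j] += h;
    xm[j] -= h;
    const Vec rp = f(xp), rm = f(xm);
    for (std::size_t i = 0; i < r0.size(); ++i) {
      J[i][j] = (rp[i] - rm[i]) / (2.0 * h);
    }
  }
  return J;
}

// Returns true only if both matrices have finite entries that agree within
// a tolerance relative to the larger magnitude (absolute below magnitude 1).
bool jacobiansAgree(const Mat& a, const Mat& b, double relTol = 1e-4) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(a[i].size() == b[i].size());
    for (std::size_t j = 0; j < a[i].size(); ++j) {
      if (!std::isfinite(a[i][j]) || !std::isfinite(b[i][j])) return false;
      const double scale =
          std::max({1.0, std::fabs(a[i][j]), std::fabs(b[i][j])});
      if (std::fabs(a[i][j] - b[i][j]) > relTol * scale) return false;
    }
  }
  return true;
}
```

Scaling the tolerance by the entry magnitude keeps the check meaningful when entries span many orders of magnitude, and rejecting non-finite entries flags inf residuals like those in B early.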
Now I have doubts about several points.
C. Do we need EIGEN_MAKE_ALIGNED_OPERATOR_NEW in C++20 programs?
D. Since SymForce may not generate code properly for if/else branching, does it work with for loops that iterate over an integer range?
E. Here is an example of a bad residual and its bad Jacobians.
All logged values look normal except for the residual and the Jacobians.
I20250310 11:23:52.678215 3228500 calib_cam_imu.cpp:294] Bad IMU res -364.623 -32.5488 -160.616 -3782.2 3.26517e+132 -0 with interval 0.02
bgba 0 0 0 0 0 0 g_dir 0.0515762 0.844481 -0.533096 gMag 9.8387
I20250310 11:23:52.678242 3228500 calib_cam_imu.cpp:298] 0 pt 0.0211226 -0.681036 -0.731859 0.011227 0.279714 0.354241 0.485652
I20250310 11:23:52.678256 3228500 calib_cam_imu.cpp:298] 1 pt 0.0110132 -0.659783 -0.751188 0.016748 0.312391 0.461406 0.49024
I20250310 11:23:52.678267 3228500 calib_cam_imu.cpp:298] 2 pt 0.00372726 -0.649199 -0.76046 -0.0150726 0.283718 0.443901 0.397507
I20250310 11:23:52.678279 3228500 calib_cam_imu.cpp:298] 3 pt -0.00157705 -0.664078 -0.747651 -0.0041484 0.288184 0.446251 0.491123
I20250310 11:23:52.678293 3228500 calib_cam_imu.cpp:298] 4 pt 0.0144139 -0.666337 -0.745112 0.0244002 0.298574 0.468022 0.647042
I20250310 11:23:52.678305 3228500 calib_cam_imu.cpp:300] lambdas 0.994833 0.0210913 -0.0645651
0.760286 0.314348 -0.158968
0.153628 0.243853 0.290085
3.65713e-09 7.31427e-07 0.000109714
sqrtinfo 441.942 441.942 441.942 25.2538 25.2538 25.2538
meas -0.0464044 -0.16487 -0.0653767 0.6894 6.34896 -7.71682
I20250310 11:23:52.678337 3228500 calib_cam_imu.cpp:176] Checking ImuFactorType<5> Jacs
I20250310 11:23:52.678377 3228500 calib_cam_imu.cpp:194] Mismatch at x0
Numeric: -377.282 8022.24 -7186.1 227.241 0 0 0
-461.355 -14.1113 -4.43716 -407.789 0 0 0
407.396 17.1896 -16.7583 -461.319 0 0 0
-76313.8 6272.25 12840.7 62633 18276 1178.38 123.189
-1.31736e+131 -5.86641e+132 -4.93862e+132 4.0429e+131 2.20721e+129 9.9442e+130 -1.27868e+132
0 0 0 0 0 0 0
Symbolic: -372.854 7879.45 -7339.5 229.594 0 0 0
-461.324 -15.1094 -5.50979 -407.773 0 0 0
407.312 19.9029 -13.8425 -461.364 0 0 0
-76005.9 -3655.86 2171.65 62796.7 18276 1178.38 123.189
-2.92515e+131 -6.82801e+131 6.31833e+131 3.18826e+131 2.20721e+129 9.9442e+130 -1.27868e+132
-0 0 0 0 0 -0 -0
I20250310 11:23:52.678509 3228500 calib_cam_imu.cpp:194] Mismatch at x1
Numeric: 324.125 -7952.77 7261.59 -327.51 0 0 0
449.458 9.62375 9.47219 420.907 0 0 0
-420.578 -20.5008 16.5516 449.2 0 0 0
-13799 -622.437 438.142 11399.1 -18269.1 -1177.94 -123.142
-5.30722e+130 -1.24864e+131 1.13727e+131 5.78905e+130 -2.20637e+129 -9.94044e+130 1.2782e+132
0 0 0 0 0 0 0
Symbolic: 326.433 -8091.06 7104.1 -323.999 0 0 0
449.474 8.65671 8.37117 420.932 0 0 0
-420.622 -17.8722 19.5443 449.133 0 0 0
-13800.3 -542.94 528.649 11397.1 -18269.1 -1177.94 -123.142
-5.30418e+130 -1.2662e+131 1.11727e+131 5.7936e+130 -2.20637e+129 -9.94044e+130 1.2782e+132
-0 0 0 0 -0 0 0
I20250310 11:23:52.678639 3228500 calib_cam_imu.cpp:194] Mismatch at x2
Numeric: 41.587 -2145.6 1881.01 -20.0914 0 0 0
122.185 0.227742 -1.28729 107.986 0 0 0
-107.968 -2.47536 0.876618 122.181 0 0 0
1865.15 38.2883 -16.4881 -1602.36 619854 39966.2 4178.09
7.5049e+129 1.72582e+130 -1.50375e+130 -8.57357e+129 7.486e+130 3.37269e+132 -4.33681e+133
0 0 0 0 0 0 0
Symbolic: 41.7251 -2169.65 1852.83 -20.6499 0 0 0
122.186 0.00627178 -1.54671 107.981 0 0 0
-107.963 -3.32168 -0.114752 122.161 0 0 0
1865.08 50.4837 -2.2024 -1602.08 619854 39966.2 4178.09
7.50347e+129 1.75105e+130 -1.4742e+130 -8.56769e+129 7.486e+130 3.37269e+132 -4.33681e+133
0 0 0 -0 0 -0 -0
I20250310 11:23:52.678768 3228500 calib_cam_imu.cpp:194] Mismatch at x3
Numeric: 174.637 -3083 2810.17 -13.3411 0 0 0
183.74 10.554 0.786954 152.243 0 0 0
-152.122 -2.67192 7.61443 183.864 0 0 0
25993.1 935.219 -1001.37 -22181.7 -619861 -39966.7 -4178.14
9.78762e+130 2.39228e+131 -2.15266e+131 -1.15199e+131 -7.48608e+130 -3.37273e+132 4.33686e+133
0 0 0 0 0 0 0
Symbolic: 174.552 -3118.78 2769.87 -13.5649 0 0 0
183.727 4.89713 -5.5818 152.208 0 0 0
-152.129 -5.62133 4.29378 183.846 0 0 0
25993.3 1053.85 -867.806 -22180.9 -619861 -39966.7 -4178.14
9.78836e+130 2.40823e+131 -2.13471e+131 -1.15186e+131 -7.48608e+130 -3.37273e+132 4.33686e+133
0 -0 0 0 -0 0 0
I20250310 11:23:52.678897 3228500 calib_cam_imu.cpp:194] Mismatch at x4
Numeric: 16.0175 5248.18 -4672.49 -62.9201 0 0 0
-309.033 -3.01554 -0.714374 -256.741 0 0 0
256.721 -2.93402 1.68065 -309.052 0 0 0
-27768.6 -145.714 162.152 23927.6 0 0 0
-1.14804e+131 -2.53403e+131 2.25935e+131 1.29644e+131 0 0 0
0 0 0 0 0 0 0
Symbolic: 16.2595 5236.95 -4685.02 -62.5099 0 0 0
-308.915 -8.46436 -6.80736 -256.542 0 0 0
256.766 -5.02484 -0.657359 -308.975 0 0 0
-27770.9 -39.1951 281.262 23923.7 -0 -0 -0
-1.14828e+131 -2.5206e+131 2.27434e+131 1.29599e+131 -0 -0 0
0 0 -0 0 -0 0 0
I20250310 11:23:52.679016 3228500 calib_cam_imu.cpp:200] Mismatch at bgba
Numeric: 441.942 0 0 0 0 0
0 25.2538 0 0 0 0
0 0 25.2538 0 0 0
0 0 0 25.2538 0 0
0 0 0 0 1.76851e+129 0
0 0 0 0 0 0
Symbolic: 441.942 0 0 0 0 0
0 25.2538 0 0 0 0
0 0 25.2538 0 0 0
0 0 0 25.2538 0 0
0 0 0 0 1.76851e+129 0
0 0 0 0 0 2.42092e-322
I20250310 11:23:52.679114 3228500 calib_cam_imu.cpp:204] Mismatch at g_dir
Numeric: 0 0 0 0
0 0 0 0
0 0 0 0
0.0453093 -1.12838 2.03588 -0.256618
2.92155e+128 -1.76059e+127 -4.32413e+125 -1.11364e+127
0 0 0 0
Symbolic: 0 0 0 0
0 0 0 0
0 0 0 0
-0.150279 -1.11644 2.03588 -0.14848
6.35134e+127 -3.63999e+126 -4.32377e+125 1.15278e+128
0 0 0 0
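Regarding question C: as far as I know, C++17 added over-aligned `operator new`, and the Eigen 3.4 documentation says the macro is no longer needed when compiling in C++17 mode or later with a sufficiently recent compiler; older Eigen versions or older language standards may still need it. The sketch below illustrates only the language feature, without Eigen. `Block32` and `Holder` are hypothetical stand-ins for a class with a fixed-size vectorizable member, and the block assumes compilation with C++17 or later.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a 32-byte aligned fixed-size member
// (for example four doubles when AVX is enabled).
struct alignas(32) Block32 {
  double v[4];
};

// A class that embeds the aligned member and therefore inherits its alignment.
struct Holder {
  int id;
  Block32 block;
};

static_assert(alignof(Holder) == 32, "Holder inherits the member alignment");

// Heap-allocates a Holder with plain new (via make_unique) and reports
// whether the embedded member landed on a 32-byte boundary.
// With C++17 or later, the aligned allocation function is selected automatically.
bool heapMemberIsAligned() {
  auto h = std::make_unique<Holder>();
  const auto addr = reinterpret_cast<std::uintptr_t>(&h->block);
  return addr % alignof(Block32) == 0;
}
```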
Updates:
I think B and E are caused by a dependency that is compiled with -march=native while the target program is compiled without this flag. After removing the flag from the dependent library, E disappears.
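A plausible mechanism, stated as an assumption rather than a confirmed diagnosis: -march=native can enable AVX, which changes Eigen's alignment and packet sizes, so types with fixed-size Eigen members may be laid out differently in the library and in the program. A startup check along the following lines could catch such a mismatch. `simdSignature` and `sameSimdConfig` are hypothetical names; the macros are the ones GCC and Clang predefine when the corresponding instruction sets are enabled.

```cpp
#include <string>

// Describes the SIMD-related macros seen by THIS translation unit.
// To be useful, a copy must be compiled inside each library's own .cpp file
// (under a distinct exported name), not in a shared header, because a header
// would be compiled with the flags of whoever includes it.
std::string simdSignature() {
  std::string s;
#ifdef __SSE2__
  s += "sse2 ";
#endif
#ifdef __AVX__
  s += "avx ";
#endif
#ifdef __AVX2__
  s += "avx2 ";
#endif
#ifdef __FMA__
  s += "fma ";
#endif
#ifdef __AVX512F__
  s += "avx512f ";
#endif
  return s.empty() ? std::string("none") : s;
}

// Compares the signature reported by a dependency with the application's own.
bool sameSimdConfig(const std::string& lib, const std::string& app) {
  return lib == app;
}
```

At startup, the application would compare the value exported by the dependency with its own `simdSignature()` and log a warning when `sameSimdConfig` returns false.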