Calibration algorithm
Sibo Wang-Chen edited this page May 6, 2025
Let $(x_t, y_t)$ denote the position of the motion stages (in mm), $(x_p, y_p)$ a physical position on the arena (in mm), and $(r_b, c_b)$ and $(r_m, c_m)$ pixel positions in row-column coordinates in the behavior and muscle camera images, respectively. In the calibration step, we want to establish the following mappings:
- Mapping from stage position and pixel position to physical position...
  - ... for the behavior camera: $f_b:\ (x_t, y_t, r_b, c_b) \rightarrow (x_p, y_p)$
  - ... for the muscle camera: $f_m:\ (x_t, y_t, r_m, c_m) \rightarrow (x_p, y_p)$
- Mapping from stage position and physical position to pixel position...
  - ... for the behavior camera: $g_b:\ (x_t, y_t, x_p, y_p) \rightarrow (r_b, c_b)$
  - ... for the muscle camera: $g_m:\ (x_t, y_t, x_p, y_p) \rightarrow (r_m, c_m)$
Our strategy for fitting these mappings is:
- Print a board with small ArUco markers such that (i) the board has the same size as the arena, and (ii) the corner positions of each marker are known in physical coordinates ($x_p, y_p$, in mm).
- Place the calibration board over the arena. Move the motion stages on a grid covering the entire arena and take a picture with both cameras at each grid point. In other words, the stages should travel to each of the points `(x_t, y_t) for x_t in range(x_min, x_max + stride, stride) for y_t in range(y_min, y_max + stride, stride)` and take a brief pause there. `x_min`, `x_max`, `y_min`, and `y_max` are in mm and are defined conservatively (e.g. `x_min` should be slightly smaller than the x value at the actual arena boundary), and `stride` can be a small value such as 2 mm. Let the number of points thus generated be $k$. When the stages stop at one of these points, each camera takes a picture and the current $(x_t, y_t)$ position is recorded.
- For each stage position defined above and for each camera, we detect all ArUco markers in the image. This gives us a set of detected ArUco codes, each containing four corner positions in row-column coordinates $(r_b, c_b)$ or $(r_m, c_m)$. Because we know which marker each one is (by identifying the encoded marker ID), we can query the physical coordinates $(x_p, y_p)$ of each of its four corners (see step 1). As we also know the current stage position $(x_t, y_t)$, we can generate a set of 6-tuples $S_b = \{(x_p, y_p, x_t, y_t, r_b, c_b)\}$ for the behavior camera and a set of 6-tuples $S_m = \{(x_p, y_p, x_t, y_t, r_m, c_m)\}$ for the muscle camera. Because the numbers of markers detected in the behavior and muscle camera images are not necessarily equal, the sizes of $S_b$ and $S_m$ generally differ: if $m$ markers are detected in the behavior camera image and $n$ in the muscle camera image, then $|S_b| = 4m$ and $|S_m| = 4n$. Repeating this procedure for each of the $k$ stage positions, we obtain $k$ sets $S_b^{(1)}, S_b^{(2)}, \dots, S_b^{(k)}$ and $k$ sets $S_m^{(1)}, S_m^{(2)}, \dots, S_m^{(k)}$, which we concatenate into $S_b^{\star}$ and $S_m^{\star}$, respectively. These are the data from which we fit the mappings.
- We assume each mapping is affine. Taking $f_b:\ (x_t, y_t, r_b, c_b) \rightarrow (x_p, y_p)$ as an example, we fit $A_{f_b}\in\mathbb{R}^{2\times5}$ such that $(x_p, y_p)^T = A_{f_b}\,(1, x_t, y_t, r_b, c_b)^T$.
- In practice, we fit two linear regression models, $x_p = u_0 + u_1 x_t + u_2 y_t + u_3 r_b + u_4 c_b$ and $y_p = v_0 + v_1 x_t + v_2 y_t + v_3 r_b + v_4 c_b$, using $S_b^{\star}$ and $S_m^{\star}$ as training data (for the behavior and muscle cameras, respectively). Simple least-squares linear regression alone is not robust against outliers (e.g. from inaccuracies in ArUco marker detection), so we use RANSAC to remove outliers and refit the linear models. We then perform sanity tests to ensure that the fit is reasonable (i.e. $r^2$ should be close to 1 and the RMSE on the order of 0.03 mm) and that RANSAC behaved correctly (i.e. not too many points were excluded as outliers).
- As the models are linear, it is only necessary to fit $f_b$ and $f_m$; $g_b$ and $g_m$ can be derived by partially inverting the weight-and-bias matrices $A$ (see implementation here).
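The grid of stage positions described in step 2 can be sketched in a few lines of Python (a minimal illustration; the bounds and stride are placeholder values, and the actual stage/camera control code is omitted):

```python
import numpy as np

def grid_points(x_min, x_max, y_min, y_max, stride):
    """Generate the stage positions (in mm) visited during calibration,
    mirroring: (x_t, y_t) for x_t in range(x_min, x_max + stride, stride)
                          for y_t in range(y_min, y_max + stride, stride)."""
    xs = np.arange(x_min, x_max + stride, stride)
    ys = np.arange(y_min, y_max + stride, stride)
    return [(float(x), float(y)) for x in xs for y in ys]

# Illustrative bounds: a 10 mm x 6 mm scan area with a 2 mm stride.
points = grid_points(0, 10, 0, 6, 2)
k = len(points)  # number of stage positions, k
```

At each of the $k$ points, the controller would move the stages, pause briefly, trigger both cameras, and log the current $(x_t, y_t)$.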
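Step 3 turns per-image marker detections into 6-tuples. Assuming marker detection has already been done (e.g. with OpenCV's ArUco module) and that the physical corner positions from step 1 are stored in a lookup table keyed by marker ID, the assembly might look like this (names such as `assemble_tuples` and `board_corners` are illustrative, not taken from the actual implementation):

```python
import numpy as np

def assemble_tuples(stage_xy, detected, board_corners):
    """Build the set S = {(x_p, y_p, x_t, y_t, r, c)} for one image.

    stage_xy      : (x_t, y_t) stage position in mm.
    detected      : dict mapping marker ID -> (4, 2) array of detected
                    corner positions in (row, col) pixel coordinates.
    board_corners : dict mapping marker ID -> (4, 2) array of the known
                    physical corner positions (x_p, y_p) in mm (step 1).
                    Corners are assumed to be in the same order as in
                    `detected`.
    """
    x_t, y_t = stage_xy
    rows = []
    for marker_id, pix in detected.items():
        phys = board_corners[marker_id]  # look up by encoded marker ID
        for (x_p, y_p), (r, c) in zip(phys, pix):
            rows.append((x_p, y_p, x_t, y_t, r, c))
    return np.array(rows)  # shape (4 * n_markers, 6)
```

Concatenating the arrays returned for all $k$ stage positions (per camera) yields $S_b^{\star}$ and $S_m^{\star}$.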
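The RANSAC-plus-least-squares fit can be sketched with plain NumPy (an illustrative re-implementation, not the project's actual code; in practice one might instead use scikit-learn's `RANSACRegressor`, and the inlier threshold here is an assumption):

```python
import numpy as np

def fit_affine_ransac(S, n_iter=200, thresh=0.1, seed=0):
    """Fit (x_p, y_p) = A @ (1, x_t, y_t, r, c) by least squares inside a
    basic RANSAC loop. Rows of S are (x_p, y_p, x_t, y_t, r, c); `thresh`
    is the inlier residual threshold in mm (illustrative value).
    Returns the 2x5 weight-and-bias matrix A and the inlier mask."""
    rng = np.random.default_rng(seed)
    Y = S[:, :2]                                      # targets (x_p, y_p)
    X = np.column_stack([np.ones(len(S)), S[:, 2:]])  # (1, x_t, y_t, r, c)
    best = np.zeros(len(S), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(S), size=5, replace=False)   # minimal sample
        A, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        inliers = np.linalg.norm(X @ A - Y, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    A, *_ = np.linalg.lstsq(X[best], Y[best], rcond=None)  # refit on inliers
    return A.T, best
```

The returned inlier mask supports the sanity checks mentioned above: a fit that discards a large fraction of points as outliers should be treated with suspicion.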
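Partially inverting the weight-and-bias matrix to obtain $g$ from $f$ amounts to solving the affine relation for the pixel coordinates: writing $(x_p, y_p)^T = t + B\,(x_t, y_t)^T + C\,(r, c)^T$, where $t$, $B$, and $C$ are the bias column, the stage-position block, and the pixel-position block of $A$, we get $(r, c)^T = C^{-1}\big((x_p, y_p)^T - t - B\,(x_t, y_t)^T\big)$. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def invert_pixel_block(A):
    """Given the 2x5 matrix A of f: (x_t, y_t, r, c) -> (x_p, y_p), return
    a function g: (x_t, y_t, x_p, y_p) -> (r, c) by partially inverting A."""
    t = A[:, 0]        # bias column
    B = A[:, 1:3]      # stage-position block
    C = A[:, 3:5]      # pixel-position block (must be invertible)
    C_inv = np.linalg.inv(C)

    def g(x_t, y_t, x_p, y_p):
        stage = np.array([x_t, y_t])
        phys = np.array([x_p, y_p])
        r, c = C_inv @ (phys - t - B @ stage)
        return float(r), float(c)

    return g
```

By construction, applying $g$ to the output of $f$ at the same stage position recovers the original pixel coordinates.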