Calibration algorithm
Sibo Wang-Chen edited this page May 6, 2025
Let $(x_t, y_t)$ denote the position of the motion stages (in mm), $(x_p, y_p)$ a physical position on the arena (in mm), and $(r_b, c_b)$ and $(r_m, c_m)$ pixel positions in row-column coordinates in the behavior and muscle camera images, respectively. In the calibration step, we want to establish the following mappings:
- Mapping from stage position and pixel position to physical position...
  - ... for the behavior camera: $f_b:\ (x_t, y_t, r_b, c_b) \rightarrow (x_p, y_p)$
  - ... for the muscle camera: $f_m:\ (x_t, y_t, r_m, c_m) \rightarrow (x_p, y_p)$
- Mapping from stage position and physical position to pixel position...
  - ... for the behavior camera: $g_b:\ (x_t, y_t, x_p, y_p) \rightarrow (r_b, c_b)$
  - ... for the muscle camera: $g_m:\ (x_t, y_t, x_p, y_p) \rightarrow (r_m, c_m)$
Our strategy for fitting these mappings is:
- Print a board with small ArUco markers such that (i) the board has the same size as the arena, and (ii) the corner positions of each marker are known in physical coordinates ($x_p, y_p$, in mm).
- Place the calibration board over the arena. Move the motion stages on a grid covering the entire arena and take a picture with both cameras at each grid point. In other words, the stages should travel to each of the points `(x_t, y_t) for x_t in range(x_min, x_max + stride, stride) for y_t in range(y_min, y_max + stride, stride)` and take a brief pause there. `x_min`, `x_max`, `y_min`, and `y_max` are in mm and are defined conservatively (e.g. `x_min` should be slightly smaller than the x value at the actual arena boundary), and `stride` can be a small value such as 2 mm. Let the number of points thus generated be $k$. When the stages stop at one of these points, each camera takes a picture and the current $(x_t, y_t)$ position is recorded.
- For each stage position defined above and for each camera, we detect all ArUco markers in the image. This gives us a set of detected ArUco codes, each containing four corner positions in row-column coordinates $(r_b, c_b)$ or $(r_m, c_m)$. Because we know which marker each one is (by identifying the encoded marker ID), we can query the physical coordinates $(x_p, y_p)$ of each of its four corners (see step 1). As we also know the current stage position $(x_t, y_t)$, we can generate a set of 6-tuples $S_b = \{(x_p, y_p, x_t, y_t, r_b, c_b)\}$ for the behavior camera and a set of 6-tuples $S_m = \{(x_p, y_p, x_t, y_t, r_m, c_m)\}$ for the muscle camera. Because the numbers of markers detected in the behavior and muscle camera images are not necessarily equal, the sizes of $S_b$ and $S_m$ generally differ: if $m$ markers are detected in the behavior camera image and $n$ in the muscle camera image, then $|S_b| = 4m$ and $|S_m| = 4n$. Repeating this procedure for each of the $k$ stage positions, we obtain $k$ sets $S_b^{(1)}, S_b^{(2)}, \dots, S_b^{(k)}$ and $k$ sets $S_m^{(1)}, S_m^{(2)}, \dots, S_m^{(k)}$, which we concatenate into $S_b^{\star}$ and $S_m^{\star}$, respectively. These are the data from which we fit the mappings.
- We assume each mapping is affine. Taking $f_b:\ (x_t, y_t, r_b, c_b) \rightarrow (x_p, y_p)$ as an example, we fit $A_{f_b}\in\mathbb{R}^{2\times5}$ such that $(x_p, y_p)^T = A_{f_b}\,(1, x_t, y_t, r_b, c_b)^T$.
- In practice, we fit two linear regression models, $x_p = u_0 + u_1 x_t + u_2 y_t + u_3 r_b + u_4 c_b$ and $y_p = v_0 + v_1 x_t + v_2 y_t + v_3 r_b + v_4 c_b$, using $S_b^{\star}$ and $S_m^{\star}$ as training data (for the behavior and muscle cameras, respectively). Simple least-squares linear regression alone is not robust against outliers (e.g. from inaccuracies in ArUco marker detection), so we use RANSAC to remove outliers and refit the linear models. We then perform sanity tests to ensure that the fit is reasonable (i.e. $r^2$ should be close to 1 and the RMSE on the order of 0.03 mm) and that RANSAC behaved correctly (i.e. not too many points were excluded as outliers).
- As the models are linear, it is only necessary to fit $f_b$ and $f_m$; $g_b$ and $g_m$ can be derived by partially inverting the weight-and-bias matrices $A$ (see implementation here).
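The grid of stage positions described in step 2 can be sketched in a few lines of Python (a minimal illustration; the bounds and stride are placeholder values, and the actual stage/camera control code is omitted):

```python
import numpy as np

def grid_points(x_min, x_max, y_min, y_max, stride):
    """Generate the stage positions (in mm) visited during calibration,
    mirroring: (x_t, y_t) for x_t in range(x_min, x_max + stride, stride)
                          for y_t in range(y_min, y_max + stride, stride)."""
    xs = np.arange(x_min, x_max + stride, stride)
    ys = np.arange(y_min, y_max + stride, stride)
    return [(float(x), float(y)) for x in xs for y in ys]

# Illustrative bounds: a 10 mm x 6 mm scan area with a 2 mm stride.
points = grid_points(0, 10, 0, 6, 2)
k = len(points)  # number of stage positions, k
```

At each of the $k$ points, the controller would move the stages, pause briefly, trigger both cameras, and log the current $(x_t, y_t)$.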
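Step 3 turns per-image marker detections into 6-tuples. Assuming marker detection has already been done (e.g. with OpenCV's ArUco module) and that the physical corner positions from step 1 are stored in a lookup table keyed by marker ID, the assembly might look like this (names such as `assemble_tuples` and `board_corners` are illustrative, not taken from the actual implementation):

```python
import numpy as np

def assemble_tuples(stage_xy, detected, board_corners):
    """Build the set S = {(x_p, y_p, x_t, y_t, r, c)} for one image.

    stage_xy      : (x_t, y_t) stage position in mm.
    detected      : dict mapping marker ID -> (4, 2) array of detected
                    corner positions in (row, col) pixel coordinates.
    board_corners : dict mapping marker ID -> (4, 2) array of the known
                    physical corner positions (x_p, y_p) in mm (step 1).
                    Corners are assumed to be in the same order as in
                    `detected`.
    """
    x_t, y_t = stage_xy
    rows = []
    for marker_id, pix in detected.items():
        phys = board_corners[marker_id]  # look up by encoded marker ID
        for (x_p, y_p), (r, c) in zip(phys, pix):
            rows.append((x_p, y_p, x_t, y_t, r, c))
    return np.array(rows)  # shape (4 * n_markers, 6)
```

Concatenating the arrays returned for all $k$ stage positions (per camera) yields $S_b^{\star}$ and $S_m^{\star}$.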
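The RANSAC-plus-least-squares fit can be sketched with plain NumPy (an illustrative re-implementation, not the project's actual code; in practice one might instead use scikit-learn's `RANSACRegressor`, and the inlier threshold here is an assumption):

```python
import numpy as np

def fit_affine_ransac(S, n_iter=200, thresh=0.1, seed=0):
    """Fit (x_p, y_p) = A @ (1, x_t, y_t, r, c) by least squares inside a
    basic RANSAC loop. Rows of S are (x_p, y_p, x_t, y_t, r, c); `thresh`
    is the inlier residual threshold in mm (illustrative value).
    Returns the 2x5 weight-and-bias matrix A and the inlier mask."""
    rng = np.random.default_rng(seed)
    Y = S[:, :2]                                      # targets (x_p, y_p)
    X = np.column_stack([np.ones(len(S)), S[:, 2:]])  # (1, x_t, y_t, r, c)
    best = np.zeros(len(S), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(S), size=5, replace=False)   # minimal sample
        A, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        inliers = np.linalg.norm(X @ A - Y, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    A, *_ = np.linalg.lstsq(X[best], Y[best], rcond=None)  # refit on inliers
    return A.T, best
```

The returned inlier mask supports the sanity checks mentioned above: a fit that discards a large fraction of points as outliers should be treated with suspicion.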
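Partially inverting the weight-and-bias matrix to obtain $g$ from $f$ amounts to solving the affine relation for the pixel coordinates: writing $(x_p, y_p)^T = t + B\,(x_t, y_t)^T + C\,(r, c)^T$, where $t$, $B$, and $C$ are the bias column, the stage-position block, and the pixel-position block of $A$, we get $(r, c)^T = C^{-1}\big((x_p, y_p)^T - t - B\,(x_t, y_t)^T\big)$. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def invert_pixel_block(A):
    """Given the 2x5 matrix A of f: (x_t, y_t, r, c) -> (x_p, y_p), return
    a function g: (x_t, y_t, x_p, y_p) -> (r, c) by partially inverting A."""
    t = A[:, 0]        # bias column
    B = A[:, 1:3]      # stage-position block
    C = A[:, 3:5]      # pixel-position block (must be invertible)
    C_inv = np.linalg.inv(C)

    def g(x_t, y_t, x_p, y_p):
        stage = np.array([x_t, y_t])
        phys = np.array([x_p, y_p])
        r, c = C_inv @ (phys - t - B @ stage)
        return float(r), float(c)

    return g
```

By construction, applying $g$ to the output of $f$ at the same stage position recovers the original pixel coordinates.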