Estimating Plane Pose Without Knowledge of Intrinsic Camera Parameters

Question

Assume a camera has no skew, and has square pixels, that is, its camera-calibration matrix, $K$, looks like: $$ K = \left[\begin{matrix} \alpha&0&u_x\\0&\alpha&u_y\\0&0&1 \end{matrix}\right] $$

Intuitively, it seems that it should be possible to recover the pose of a rectangular planar object, $P$ (for example, a flat piece of paper), knowing only the projected image coordinates $\vec{p_i}$, $i = 1, 2, 3, 4$, of the object's four corners, without knowledge of $K$. To be more precise, I only care about the orientation of the unit normal vector, $\vec{n}$, of $P$, not its location.

Following the answer given in response to the question "Step by Step Camera Pose Estimation for Visual Tracking and Planar Markers", we choose world coordinates such that $P$ is the plane $Z=0$. We find the homography, $H$, mapping the four corners of $P$ in world-frame coordinates to their projected coordinates $\vec{p_i}$ for $i=1,2,3,4$.
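As a rough sketch of this step (not taken from the linked answer), the homography can be estimated from exactly four correspondences by fixing $H_{33} = 1$ and solving the resulting 8x8 linear system. The helper names `solve_linear`, `homography_from_4_points` and `apply_h` are made up for illustration, as are the sample pixel coordinates. The normalization $H_{33} = 1$ assumes the world origin does not project to a point at infinity.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4_points(world_xy, image_uv):
    """Return a 3x3 H with H[2][2] = 1 mapping (X, Y, 1) to (u, v, 1) up to scale."""
    A, b = [], []
    for (X, Y), (u, v) in zip(world_xy, image_uv):
        # u * (h7 X + h8 Y + 1) = h1 X + h2 Y + h3, and likewise for v
        A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y])
        b.append(u)
        A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H, X, Y):
    """Project the plane point (X, Y, 0) through H and dehomogenize."""
    x, y, w = (H[i][0] * X + H[i][1] * Y + H[i][2] for i in range(3))
    return x / w, y / w


# Hypothetical data: unit-square corners on Z = 0 and made-up pixel coordinates.
world = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [(100, 100), (300, 120), (280, 310), (90, 290)]
H = homography_from_4_points(world, image)
```

With more than four points, or noisy ones, a least-squares DLT with coordinate normalization would be the more usual choice. The direct solve above is only meant to make the four-corner case concrete.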

Then $H = K[\vec{R_1}|\vec{R_2}|\vec{t}]$ (up to scale), where $\vec{R_1}, \vec{R_2}$ are the first and second columns of the camera's rotation $R \in SO(3)$ and $\vec{t} \in \mathbb{R}^3$ is the camera's translation. If we assume that $K$ is the identity (I'll return to this later), then $H = [\vec{R_1}|\vec{R_2}|\vec{t}]$. The third column of $R$, $\vec{R_3}$, is equal to $\vec{R_1}\times\vec{R_2}$. Finally, $\vec{n} = R\left[\begin{matrix}0\\0\\1\end{matrix}\right] = \vec{R_1}\times\vec{R_2}$ and we are done.
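To make the $K = I$ case concrete, here is a minimal sketch that takes the cross product of the first two columns of $H$ and normalizes it. The function name `normal_from_homography` and the test pose (a 30 degree tilt about the x-axis, with an arbitrary scale of -2.5 on $H$) are assumptions for illustration only. Because $H$ is known only up to scale, the result is normalized. The unknown scale enters the cross product squared, so its sign does not matter.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normal_from_homography(H):
    """Unit plane normal from H = s [R1 | R2 | t], assuming K is the identity."""
    h1 = [row[0] for row in H]
    h2 = [row[1] for row in H]
    n = cross(h1, h2)  # equals s^2 (R1 x R2), so the sign of s cancels
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]


# Hypothetical pose: rotation by 30 degrees about the x-axis.
c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
R1, R2, t = [1.0, 0.0, 0.0], [0.0, c, s], [0.1, -0.2, 5.0]
scale = -2.5  # arbitrary, since a homography is only defined up to scale
H = [[scale * R1[i], scale * R2[i], scale * t[i]] for i in range(3)]
n = normal_from_homography(H)  # should match R3 = R1 x R2 = (0, -sin 30, cos 30)
```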

Now let's examine the effect of $K$ on our answer $\vec{n}$. My intuition tells me that $K$ shouldn't matter because varying $\alpha$ will ultimately act as a scalar multiple on our answer $\vec{n}$ which is normalized out since I am only concerned with the direction of $\vec{n}$, not its magnitude. In addition, it seems that varying $u_x, u_y$ should have the effect of translating the locations of the projected corners $\vec{p_i}$, which should affect $\vec{t}$, but not $\vec{n}$.

Let's see if this is true. If $K \neq I$, then $H = K[\vec{R_1}|\vec{R_2}|\vec{t}]$ and $$\vec{n} = K \vec{R_1} \times K \vec{R_2} = \det(K)K^{-T}(\vec{R_1}\times\vec{R_2})$$
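The identity $(K\vec{a}) \times (K\vec{b}) = \det(K)K^{-T}(\vec{a} \times \vec{b})$ can be sanity-checked numerically. The sketch below uses the fact that $\det(K)K^{-T}$ is the cofactor matrix of $K$, whose rows are cross products of the rows of $K$. The values $\alpha = 800$, $u_x = 320$, $u_y = 240$ and the vectors `a`, `b` are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def cofactor_matrix(K):
    """Cofactor matrix of a 3x3 K, i.e. det(K) * inverse-transpose of K."""
    r1, r2, r3 = K
    return [cross(r2, r3), cross(r3, r1), cross(r1, r2)]


# Illustrative intrinsics: alpha = 800, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
a = [0.6, 0.0, -0.8]
b = [0.0, 1.0, 0.0]

lhs = cross(matvec(K, a), matvec(K, b))         # (K a) x (K b)
rhs = matvec(cofactor_matrix(K), cross(a, b))   # det(K) K^{-T} (a x b)
agree = all(math.isclose(l, r, rel_tol=1e-9, abs_tol=1e-6)
            for l, r in zip(lhs, rhs))
```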

In particular, if $\alpha \neq 1$, $$\vec{n} = \alpha(\vec{R_1} \times \vec{R_2})$$ which is our answer from above times a scalar as expected.

On the other hand, if $u_x, u_y \neq 0$ then $$det(K)K^{-T} = \left[\begin{matrix} 1&0&0\\0&1&0\\-u_x&-u_y&1 \end{matrix}\right]$$

and $$\vec{n} = \left[\begin{matrix} 1&0&0\\0&1&0\\-u_x&-u_y&1 \end{matrix}\right](\vec{R_1} \times \vec{R_2})$$

In other words, our choice of camera center has a large effect on our calculations. This is not intuitive to me. Isn't our choice of camera center, $u_x, u_y$ completely arbitrary (for example $[0, 0]$ vs. $[w/2, h/2]$) or is there something I am missing?

Besides tbirdal's answer:
"Intuitively, it seems that it should be possible to recover the pose of a rectangular planar object, P, (for example a flat piece of paper) by only knowing the projected image coordinates."

Do you mean from a single shot? How can you do this if a point in image plane is the locus of infinite 3D points, and you have no other kind of info? — David, Feb 20 '15 at 13:26
@David, I think I managed to capture the intent of your comment while converting it--sorry if I munged something. — datageist, Feb 22 '15 at 18:53

score 3 · Answer 1 · answered Dec 04 '20 at 08:53

I think there is a mistake in this part:

\begin{equation} \vec{n} = K \vec{R_1} \times K \vec{R_2} = det(K)K^{-T}(\vec{R_1}\times\vec{R_2})\end{equation} In particular, if $\alpha \neq 1$ \begin{equation} \vec{n} = \alpha(\vec{R_1} \times \vec{R_2}) \end{equation} which is our answer from above times a scalar as expected.

If you assume:

\begin{equation} K = \left[\begin{matrix} \alpha&0&0\\0&\alpha&0\\0&0&1 \end{matrix}\right] \end{equation} then

\begin{equation} \det(K)K^{-T} = \alpha^2 \left[\begin{matrix} 1/\alpha&0&0\\0&1/\alpha&0\\0&0&1 \end{matrix}\right] \neq \alpha I \end{equation}

\begin{equation} \vec{n} = \left[\begin{matrix} \alpha&0&0\\0&\alpha&0\\0&0&\alpha^2 \end{matrix}\right] (\vec{R_1} \times \vec{R_2}) \end{equation}

So, the focal length $\alpha$ has an impact on the direction of the normal $\vec{n}$.
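To get a feel for the size of this effect, the sketch below measures the angle between $\vec{R_1}\times\vec{R_2}$ and the same vector after multiplication by $\mathrm{diag}(\alpha, \alpha, \alpha^2)$. The 30 degree tilt and the sample values of $\alpha$ are arbitrary assumptions, and `angle_deg` and `scaled_normal` are hypothetical helper names.

```python
import math


def angle_deg(a, b):
    """Angle in degrees between two 3-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos_angle = max(-1.0, min(1.0, dot / (na * nb)))  # clamp against rounding
    return math.degrees(math.acos(cos_angle))


def scaled_normal(m, alpha):
    """Apply det(K) K^{-T} = diag(alpha, alpha, alpha^2) to m = R1 x R2."""
    return [alpha * m[0], alpha * m[1], alpha * alpha * m[2]]


# Illustrative true normal: tilted 30 degrees away from the optical axis.
tilt = math.radians(30.0)
m = [0.0, -math.sin(tilt), math.cos(tilt)]

errors = {alpha: angle_deg(m, scaled_normal(m, alpha))
          for alpha in (1.0, 2.0, 800.0)}
for alpha, err in errors.items():
    print(f"alpha = {alpha}: direction changes by {err:.3f} degrees")
```

The direction is unchanged only for $\alpha = 1$, or when the normal lies along the optical axis or entirely in the image plane. As $\alpha$ grows, the scaled vector is pulled toward the optical axis.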

Tolga Birdal · Answer 2 · 2015-06-20T20:45:47.870

2

I think there is a mistake in those calculations.

1. How can you assume that $K$ has only a scaling effect? The principal point is part of $K$; it's not just a matrix of focal lengths.
2. How is this true?

$$\det(K)K^{-T} = \left[\begin{matrix} 1&0&0\\0&1&0\\-u_x&-u_y&1 \end{matrix}\right]$$

You don't end up with such equal numbers on the diagonal if you compute $\det(K)K^{-T}$.
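For reference, here is the computation for the full $K$ from the question, written out as a sketch that is worth re-checking by hand. With $\det(K) = \alpha^2$:

$$K^{-1} = \left[\begin{matrix} 1/\alpha&0&-u_x/\alpha\\0&1/\alpha&-u_y/\alpha\\0&0&1 \end{matrix}\right], \qquad \det(K)K^{-T} = \left[\begin{matrix} \alpha&0&0\\0&\alpha&0\\-\alpha u_x&-\alpha u_y&\alpha^2 \end{matrix}\right]$$

The diagonal is $(\alpha, \alpha, \alpha^2)$, so the matrix with all ones on the diagonal is recovered only in the special case $\alpha = 1$.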

To me, your question is a bit unclear and I guess some assumptions are fishy.

edited Jun 20 '15 at 20:45

answered May 26 '14 at 08:41

Tolga Birdal


I don't assume $K$ only has a scaling effect. I broke my analysis into two cases: (a) $K$ only has a scaling effect; (b) $K$ has a scaling effect and a non-zero principal point. I show that in case (a), given only the 2D coordinates of the four corners of a plane, you can recover the plane's normal vector, i.e. you don't need to know the camera's calibration matrix. In case (b) this is not true, which is unintuitive to me and is the purpose of the question. – Joey Aug 06 '14 at 15:25

With regards to 2. You're right, I have a mistake in my math somewhere. Not sure what went wrong there. I'll have to look at what went wrong when I have more time. Thanks. – Joey Aug 06 '14 at 15:32
