
This post is a follow-up to the post "Step by Step Camera Pose Estimation for Visual Tracking and Planar Markers" by Jav_Rock (I could not add a comment there, and I don't know why).

  • It can be seen that the 3x1 translation vector and the 3x3 rotation matrix can be derived from the homography matrix (given the camera's intrinsic matrix).
  • However, the question that follows is: where are the camera coordinate system and the object coordinate system, and how are they attached to the camera (or the object)?
  • The homography (or transformation matrix) gives the relative transformation between the two, but computing it says nothing about where these coordinate systems are located or how they are oriented.
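For the planar-marker case, the first bullet can be made concrete. The sketch below assumes Zhang's convention: the marker lies in the plane $Z = 0$ of the object frame, the intrinsic matrix `K` is known, and `H` maps object-plane points $(X, Y, 1)$ to homogeneous pixels, so that $K^{-1}H \propto [r_1\ r_2\ t]$. Under that assumption, `R` and `t` map object coordinates (origin and axes fixed on the marker, $Z$ normal to it) into camera coordinates (origin at the optical centre, $z$ along the optical axis), which is one way to pin down where each frame sits. The name `pose_from_homography` is hypothetical, and this is a noise-free illustration rather than a robust estimator.

```python
import numpy as np

def pose_from_homography(H, K):
    """Hypothetical helper: recover (R, t) from a plane-to-image homography.

    Assumes the object points lie on Z = 0 of the object frame and that
    H maps (X, Y, 1) to homogeneous pixel coordinates.
    """
    A = np.linalg.inv(K) @ H                # proportional to [r1 r2 t]
    lam = 1.0 / np.linalg.norm(A[:, 0])     # r1 must have unit length
    r1, r2, t = lam * A[:, 0], lam * A[:, 1], lam * A[:, 2]
    if t[2] < 0:                            # H is only defined up to sign:
        r1, r2, t = -r1, -r2, -t            # keep the marker in front of the camera
    r3 = np.cross(r1, r2)                   # third axis is the plane normal
    U, _, Vt = np.linalg.svd(np.column_stack([r1, r2, r3]))
    R = U @ Vt                              # nearest rotation (absorbs noise)
    return R, t

# Synthetic check: build H from a known pose (with an arbitrary scale), then recover it.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 5.0])
H = -2.5 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R, t = pose_from_homography(H, K)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

The negative scale in the example is deliberate: a homography estimated from point matches can come out with either sign, which is why the sketch flips the solution so that the marker has positive depth.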

How, then, can the pose estimation problem be solved?

Bilal
Shawn Le

1 Answer


The coordinate system you choose is completely arbitrary, as no information about absolute real-world coordinates can be inferred from the images alone. From an image of a table there is no way to know that one leg is located at any particular $(X, Y, Z)$, or that the table is any particular size (you can't tell if it's a doll's table or a giant's table).
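This arbitrariness can be checked numerically. The sketch below (helper names `project` and `change_world_frame` are hypothetical) applies a similarity transform $X' = sQX + d$ to the scene, with $s > 0$ and $Q$ a rotation, and compensates the camera with $R' = RQ^\top$, $t' = st - RQ^\top d$. Every camera-frame point is then simply multiplied by $s$, so every pixel stays where it was: a scene ten times larger, moved and rotated, produces exactly the same image.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of world points X (one per row) with pose [R|t]."""
    Xc = X @ R.T + t                  # world -> camera coordinates
    uv = Xc @ K.T                     # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]     # divide by depth

def change_world_frame(R, t, X, s, Q, d):
    """Re-express scene and pose in a new frame X' = s*Q*X + d (s > 0)."""
    X_new = s * X @ Q.T + d
    R_new = R @ Q.T
    t_new = s * t - R_new @ d
    return R_new, t_new, X_new

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
X = np.random.default_rng(0).uniform(-1.0, 1.0, size=(10, 3))  # depths 4..6

a = 0.7  # rotate the world frame about its x axis
Q = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(a), -np.sin(a)], [0.0, np.sin(a), np.cos(a)]])
R2, t2, X2 = change_world_frame(R, t, X, s=10.0, Q=Q, d=np.array([3.0, -1.0, 2.0]))
print(np.allclose(project(K, R, t, X), project(K, R2, t2, X2)))  # True
```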

Normally you would place one of your cameras at the origin, looking down the $z$ axis, which corresponds to the matrix:

$$[R|t] = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \end{bmatrix}$$
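If two cameras were first estimated in some other frame, with poses $[R_1|t_1]$ and $[R_2|t_2]$, moving the origin to camera 1 gives it the canonical matrix above and gives camera 2 the relative pose $R = R_2R_1^\top$, $t = t_2 - R_2R_1^\top t_1$; the points are re-expressed as $Y = R_1X + t_1$. The sketch below checks that both images are unchanged. The helpers `project`, `rot_y` and `to_first_camera_frame` are hypothetical names chosen for illustration.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of world points X (one per row) with pose [R|t]."""
    uv = (X @ R.T + t) @ K.T
    return uv[:, :2] / uv[:, 2:3]

def rot_y(a):
    """Rotation by angle a (radians) about the y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def to_first_camera_frame(R1, t1, R2, t2, X):
    """Make camera 1 the world frame, so that its matrix becomes [I|0]."""
    R_rel = R2 @ R1.T
    t_rel = t2 - R_rel @ t1
    Y = X @ R1.T + t1                 # points expressed in camera 1's frame
    return R_rel, t_rel, Y

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R1, t1 = rot_y(0.2), np.array([0.0, 0.0, 5.0])
R2, t2 = rot_y(-0.3), np.array([-1.0, 0.0, 5.0])
X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(10, 3))

R_rel, t_rel, Y = to_first_camera_frame(R1, t1, R2, t2, X)
I, zero = np.eye(3), np.zeros(3)
print(np.allclose(project(K, R1, t1, X), project(K, I, zero, Y)))       # True
print(np.allclose(project(K, R2, t2, X), project(K, R_rel, t_rel, Y)))  # True
```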

Then, because of the scale ambiguity, you have to choose an arbitrary scale. For example, if you are using a stereo camera, you could set the distance between the two cameras (the baseline) to one unit.
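With camera 1 at $[I|0]$ and camera 2 at $[R|t]$, camera 2's centre is $-R^\top t$, so the baseline length is $\lVert t\rVert$. A minimal sketch of fixing the scale, assuming the reconstructed points are stored one per row (`normalize_baseline` is a hypothetical name): divide $t$ and every point by $\lVert t\rVert$ and leave $R$ unchanged. Both images stay the same, because every camera-frame point is scaled by the same factor $1/\lVert t\rVert$.

```python
import numpy as np

def normalize_baseline(t, X):
    """Rescale a two-view reconstruction so the camera baseline is 1 unit.

    t is camera 2's translation (camera 1 is [I|0]); X holds 3D points, one per row.
    """
    b = np.linalg.norm(t)   # ||-R.T @ t|| == ||t|| because R is a rotation
    return t / b, X / b

# Example: a reconstruction whose baseline came out as 5 arbitrary units.
t = np.array([3.0, 0.0, 4.0])
X = np.array([[0.5, 1.0, 10.0], [-1.5, 0.5, 17.5]])
t_unit, X_unit = normalize_baseline(t, X)
print(t_unit, X_unit)
```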

Chris
  • I don't agree with the idea that pose cannot be derived from a single image. According to Zhang's method, as I mentioned, the unknown pose and depth problem can be solved. The catch is that his computation includes the camera matrix as additional information. – Shawn Le Aug 28 '12 at 10:50