.. _example_mlab_3D_to_2D:
Mlab 3D to 2D example
--------------------------------------------
A script to calculate the projection of 3D world coordinates to
2D display coordinates (pixel coordinates) for a given scene.
The 2D pixel locations of objects in the image plane are related to their
3D world coordinates by a series of linear transformations. The specific
transformations fall under the group known as projective transformations.
This set includes pure projectivities, affine transformations,
perspective transformations, and euclidean transformations. In the case
of mlab (and most other computer visualization software), we deal with
only the perspective and euclidean cases. An overview of Projective space
can be found here: http://en.wikipedia.org/wiki/Projective_space and a
thorough treatment of projective geometry can be had in the book
"Multiple View Geometry in Computer Vision" by Richard Hartley.
The essential thing to know for this example is that points in 3-space
are related to points in 2-space through a series of multiplications of
4x4 matrices which are the perspective and euclidean transformations. The
4x4 matrices predicate the use of length 4 vectors to represent points.
This representation is known as homogeneous coordinates, and while they
appear foriegn at first, they truly simplify all the mathematics
involved. In short, homogeneous coordinates are your friend, and you
should read about them here:
http://en.wikipedia.org/wiki/Homogeneous_coordinates
In the normal pinhole camera model (the ideal real world model), 3D world
points are related to 2D image points by the matrix termed the
'essential' matrix which is a combination of a perspective transformation
and a euclidean transformation. The perspective transformation is defined
by the camera intrinsics (focal length, imaging sensor offset, etc...)
and the euclidean transformation is defined by the cameras position and
orientation. In computer graphics, things are not so simple. This is
because computer graphics have the benefit of being able to do things
which are not possible in the real world: adding clipping planes, offset
projection centers, arbitrary distortions, etc... Thus, a slightly
different model is used.
What follows is the camera/view model for OpenGL and thus, VTK. I can not
guarantee that other packages follow this model.
There are 4 different transformations that are applied 3D world
coordinates to map them to 2D pixel coordinates. They are: the model
transform, the view transform, the perspective transform, and the
viewport or display transform.
In OpenGL the first two transformations are concatenated to yield the
modelview transform (called simply the view transform in VTK). The
modelview transformation applies arbitrary scaling and distortions to the
model (if they are specified) and transforms them so that the orientation
is the equivalent of looking down the negative Z axis. Imagine its as if
you relocate your camera to look down the negative Z axis, and then move
everything in the world so that you see it now as you did before you
moved the camera. The resulting coordinates are termed "eye" coordinates
in OpenGL (I don't know that they have a name in VTK).
The perspective transformation applies the camera perspective to the eye
coordinates. This transform is what makes objects in the foreground look
bigger than equivalent objects in the background. In the pinhole camera
model, this transform is determined uniquely by the focal length of the
camera and its position in 3-space. In Vtk/OpenGL it is determined by the
frustum. A frustum is simply a pyramid with the top lopped off. The top
of the pyramid (a point) is the camera location, the base of the pyramid
is a plane (the far clipping plane) defined as normal to principle camera
ray at distance termed the far clipping distance, the top of the frustum
(where it's lopped off) is the near clipping plane, with a definition
similar to that of the far clipping plane. The sides of the frustum are
determined by the aspect ratio of the camera (width/height) and its
field-of-view. Any points not lying within the frustum are not mapped to
the screen (as they would lie outside the viewable area). The
perspective transformation has the effect of scaling everything within
the frustum to fit within a cube defined in the range (-1,1)(-1,1)(-1,1)
as represented by homogeneous coordinates. The last phrase there is
important, the first 3 coordinates will not, in general, be within the
unity range until we divide through by the last coordinate (See the
wikipedia on homogeneous coordinates if this is confusing). The resulting
coordinates are termed (appropriately enough) normalized view
coordinates.
The last transformation (the viewport transformation) takes us from
normalized view coordinates to display coordinates. At this point, you
may be asking yourself 'why not just go directly to display coordinates,
why need normalized view coordinates at all?', the answer is that we may
want to embed more than one view in a particular window, there will
therefore be different transformations to take each view to an
appropriate position an size in the window. The normalized view
coordinates provide a nice common ground so-to-speak. At any rate, the
viewport transformation simply scales and translates the X and Y
coordinates of the normalized view coordinates to the appropriate pixel
coordinates. We don't use the Z value in our example because we don't
care about it. It is used for other various things however.
That's all there is to it, pretty simple right? Right. Here is an overview:
Given a set of 3D world coordinates:
- Apply the modelview transformation (view transform in VTK) to get eye
coordinates
- Apply the perspective transformation to get normalized view coordinates
- Apply the viewport transformation to get display coordinates
VTK provides a nice method to retrieve a 4x4 matrix that combines the
first two operations. As far as I can tell, VTK does not export a method
to retrieve the 4x4 matrix representing the viewport transformation, so
we are on our there to create one (no worries though, its not hard, as
you will see).
Now that the prelimenaries are out of the way, lets get started.
**Python source code:** :download:`mlab_3D_to_2D.py`
.. literalinclude:: mlab_3D_to_2D.py
:lines: 109-