Introduction
Recently, I have been working on a volume renderer (raycasting). The task is to display volume data, such as that recorded by computed tomography, on a 2D screen. The data is structured in a 3D grid and fills a cuboid, in which each voxel is associated with a data value.
Now this volume is projected into 2D, and in order for the observer to get a "feeling" for the data, it is crucial to provide a means of rotating the volume at will. Rotating a three-dimensional volume in an intuitive way with a 2D input device such as a mouse is not trivial.
At the beginning, although my program worked satisfactorily, I was unhappy because, in order to make the rotation and everything else work as expected, I had to swap coordinate signs and transformation matrices in a trial-and-error fashion.
The following text describes the theory behind the coordinate transformations necessary for this task. Contrary to my first assumption that there are two coordinate systems (one for the volume and one for the camera) which are rotated relative to each other, I came to the conclusion that my program actually employs no fewer than six coordinate systems. Furthermore, these coordinate systems are not only rotated and translated relative to each other; they also differ in their handedness (right-handed vs. left-handed), in the direction of their axes (for example, the mathematical y-axis points up, whereas in screen coordinates it usually points down) and in their resolution (i.e. the necessary transformations include scaling).
How Many Coordinate Systems?
The first coordinate system is already given by the volume data, which is stored in a three-dimensional array and indexed with three indices x, y and z. I call this the volume system.
The volume will be rotated in order to give a nice view. However, the rotation center should not be the origin of the volume system, since that would be a corner of the cuboid (the voxel with the lowest memory address). That's why we translate the object so that its midpoint is in the origin, which gives us the object system. (Instead of this fixed rotation center, we can allow the user to move it to a different point of interest.)
Furthermore, we can account for anisotropic data at this point (for example, CT data usually has a higher resolution within a slice than between slices). Also, we can fix the direction of the axes of the object system, and align the data correctly if we have information about the original 3D axes of the volume (which is contained in some special volume image formats).
The next important transformation would be that into the world system, where we take the orientation of the volume object into account. One could also translate the volume here, but I see no reason to do so for volume rendering (we want to keep the rotation center in the view center). Actually, at the moment I do not even rotate the volume object within the world system, so both systems are identical.
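The step from the volume system to the object system can be illustrated with a minimal Python sketch. The function name volume_to_object and the parameters shape and spacing are assumptions made for this illustration; they are not taken from the actual renderer. The sketch also assumes that voxel centers sit at integer indices and that the rotation center is the midpoint of the cuboid.

```python
def volume_to_object(index, shape, spacing=(1.0, 1.0, 1.0)):
    """Map a voxel index (x, y, z) to object coordinates.

    shape   -- number of voxels along each axis (hypothetical parameter)
    spacing -- voxel size along each axis, to handle anisotropic data
    """
    # Midpoint of the cuboid in index space: with voxel centers at integer
    # indices, N voxels span 0 .. N-1, so the center is at (N - 1) / 2.
    center = [(n - 1) / 2.0 for n in shape]
    # Translate so that the midpoint becomes the origin, then scale each axis.
    return tuple((i - c) * s for i, c, s in zip(index, center, spacing))


# Example: a stack of 100 slices of 512 x 512 pixels, where the slice
# distance is assumed to be 2.5 times the pixel size.
shape = (512, 512, 100)
spacing = (1.0, 1.0, 2.5)
print(volume_to_object((0, 0, 0), shape, spacing))       # first voxel
print(volume_to_object((511, 511, 99), shape, spacing))  # last voxel
```

The two corner voxels end up at opposite positions around the origin, which is what makes the origin a sensible rotation center.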
The camera system is the next coordinate system. At first, it might seem as if it could be neglected (seen as identical to the world system), since at the moment, we have only one object in our world (centered at the origin), and there is no reason to move the camera at all. We put the camera onto the Z axis, and since the projection will later throw the Z coordinate away, we don't need to translate or rotate at this point.
If the whole scene is to be rotated, not only the volume object, this is the right place to do the rotation (this is the point where I currently do it).
At this point, we do the projection from 3D to two dimensions, throwing away the Z coordinate. I call the resulting system the viewing plane system. Furthermore, I use the camera-to-viewing-plane transformation for zooming (which is easily accomplished with uniform scaling).
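A rough sketch of the rotation in the camera system followed by the projection onto the viewing plane might look as follows. It assumes a parallel (orthographic) projection, which is what dropping the Z coordinate amounts to. The names rotate_y and camera_to_viewing_plane are hypothetical, and the sign convention of the rotation is just one possible choice.

```python
import math


def rotate_y(p, angle):
    """Rotate the point p = (x, y, z) about the Y axis by angle (radians)."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def camera_to_viewing_plane(p, zoom=1.0):
    """Parallel projection: drop Z, then scale uniformly for zooming."""
    x, y, _z = p
    return (zoom * x, zoom * y)


# Rotate a point of the scene, then project it with a zoom factor of 2.
p_camera = rotate_y((1.0, 2.0, 3.0), math.radians(30))
print(camera_to_viewing_plane(p_camera, zoom=2.0))
```

Because the zoom is a uniform scaling applied after the projection, it never affects the depth ordering along Z.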
My VolumeRendererBase class offers a method to calculate the bounding box of the projected volume in the viewing plane. This allows the view component to query not only the size of the projected image, but also to align the rotation center (which is in the origin of the viewing plane system) within its display.
Next, we have a transformation to an image system, which supports another feature by scaling at this point: screens usually do not have exactly the same resolution along both axes, and by scaling, for example, the Y axis to match the X axis' DPI, we can display the objects contained in the volume in their original proportions (if we have enough information to do anisotropic scaling and alignment as mentioned above for the object system).
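One way to compute such a bounding box is to project the eight corners of the cuboid and take the minimum and maximum per axis; this works because the transformation is affine, so the extremes are always reached at corners. The following is only a sketch of that idea, not the actual VolumeRendererBase method; projected_bounding_box, half_extents and transform are hypothetical names.

```python
from itertools import product


def projected_bounding_box(half_extents, transform):
    """Return ((min_x, min_y), (max_x, max_y)) of the projected cuboid.

    half_extents -- half the size of the cuboid along each object axis
    transform    -- hypothetical callable that maps an object-space point
                    (x, y, z) to viewing plane coordinates (x, y)
    """
    # All eight sign combinations (+-hx, +-hy, +-hz) are the cuboid corners.
    corners = product(*[(-h, h) for h in half_extents])
    projected = [transform(c) for c in corners]
    xs = [p[0] for p in projected]
    ys = [p[1] for p in projected]
    return (min(xs), min(ys)), (max(xs), max(ys))


def view(p):
    """Example transform: no rotation, Z dropped, zoom factor 2."""
    return (2.0 * p[0], 2.0 * p[1])


lower, upper = projected_bounding_box((4.0, 3.0, 5.0), view)
print(lower, upper)
# The rotation center is the origin, so its offset inside the box is -lower.
print((-lower[0], -lower[1]))
```

The negated lower corner is the position of the rotation center relative to the top-left corner of the box, which is the value a view component would need for alignment.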
Finally, the viewer component translates the origin (rotation center) to the middle of the viewer-GUI-widget. Actually, GUI toolkits allow their users to specify positions local to each widget, so there is one coordinate system for each widget.
I mention this because one might want to separate the viewing widget from navigation components which, for example, allow rotating the object with the mouse. In my program, the viewer is a child widget of a "trackball widget" which simulates a trackball (it could as well be the other way round), so that the widgets are on top of each other. One should make sure that the trackball center coincides with the displayed rotation center (i.e. the origin of the viewing plane system).
Actually, Ulli pointed me to an additional coordinate system for depth. First, I implemented several projection functors which used the current depth in their projection formula, e.g. for depth cueing (making the volume darker towards the back) in otherwise symmetric projection formulas (like the usual MIP, sum, ...). As a straightforward solution, I used the Z coordinate in the camera system as depth, which raised the question of where to put the camera plane. Ulli's proposal was to put the camera plane into the middle of the volume; I prefer to have it somewhere in front instead. The outcome of our discussion (in which Ulli complained that my first approach made a parametrization of the depth attenuation depend on the volume size) was to
- normalize the depth coordinate system, i.e. let the depth be in the range -1..1 independent of the current volume's size and orientation (this simplifies the projection functors), and to
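The last two steps, from the viewing plane to the image system and on to the widget system, can be sketched as one small function. The name viewing_plane_to_widget and its parameters are made up for this example, and the sketch assumes that the Y axis of the viewing plane already points down, so that no flip is needed.

```python
def viewing_plane_to_widget(p, widget_size, dpi=(96.0, 96.0)):
    """Map viewing plane coordinates to pixel positions inside a widget.

    widget_size -- (width, height) of the widget in pixels
    dpi         -- (horizontal, vertical) screen resolution (assumed known)
    """
    x, y = p
    # Image system: stretch Y so that one unit covers the same physical
    # length on screen as one unit in X.
    aspect = dpi[1] / dpi[0]
    image_x, image_y = x, y * aspect
    # Widget system: move the origin (rotation center) to the widget middle.
    width, height = widget_size
    return (image_x + width / 2.0, image_y + height / 2.0)


# The rotation center lands in the middle of a 640 x 480 widget.
print(viewing_plane_to_widget((0.0, 0.0), (640, 480)))
# With 96 x 120 DPI, ten units in Y become 12.5 pixels.
print(viewing_plane_to_widget((10.0, 10.0), (640, 480), dpi=(96.0, 120.0)))
```

A trackball widget stacked on top of the viewer would use the same widget-middle position as its own center.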
- use a bounding sphere (centered in the rotation center) as a reference for the depth coordinates.
The bounding sphere has the purpose of making the depth rotation-invariant while assigning depth 0 to the rotation center and the maximum depth magnitude of 1 to at least one point in one orientation.
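The normalization can be sketched as follows: the radius of the bounding sphere is the largest distance from the rotation center to any corner of the cuboid, and the depth is the camera Z coordinate (measured from the rotation center) divided by that radius. The function names are hypothetical, and which sign means "front" is left open here because it depends on the chosen axis direction.

```python
import math
from itertools import product


def bounding_sphere_radius(half_extents, center=(0.0, 0.0, 0.0)):
    """Radius of the smallest sphere around center that contains the cuboid.

    half_extents -- half the size of the cuboid along each object axis
    center       -- rotation center in object coordinates
    """
    corners = product(*[(-h, h) for h in half_extents])
    return max(
        math.sqrt(sum((c - m) ** 2 for c, m in zip(corner, center)))
        for corner in corners
    )


def normalized_depth(camera_z, radius):
    """Depth in -1..1, with 0 at the rotation center."""
    return camera_z / radius


radius = bounding_sphere_radius((3.0, 4.0, 12.0))
print(radius)
print(normalized_depth(6.5, radius))
```

Since the radius does not change under rotation, a depth attenuation parametrized in this range behaves the same for every orientation and every volume size.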
Axis Alignment
Finally, let me discuss the handedness/axis-direction issue: The only coordinate systems which are fixed are the ones at both ends.
- The volume system uses increasing "coordinates" as indices into the data array, where the exact direction of the axes in the real world depends on the data source, and can be taken into account when transforming into the object system if known.
- The widget system is fixed, too, and usually has the X axis pointing right and the Y axis pointing down (sometimes called "screen orientation" as opposed to the mathematical orientation). This also determines the Y direction in the image system.
The other coordinate systems can be aligned freely. First, I chose to let the X axis point right in all systems.
The volume alignment is already determined through an alignment vector used for the transformation between volume system and object system. I chose to let the unit vector default to a right-handed system, where Y points down and Z to the back, so that loading a stack of images results in the images being rendered in the same orientation as they appear in an image viewer, in front-to-back order.
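Whether a given choice of axes is right-handed can be checked with the scalar triple product: the system is right-handed exactly if (X cross Y) dot Z is positive. In the sketch below, the axis directions are written in a reference frame that is assumed to be right-handed, with X pointing right, Y pointing up and Z pointing towards the viewer; this reference frame and the function names are choices made only for this example.

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def is_right_handed(x_axis, y_axis, z_axis):
    """True if the three axes form a right-handed system."""
    n = cross(x_axis, y_axis)
    return sum(u * v for u, v in zip(n, z_axis)) > 0


RIGHT = (1, 0, 0)
DOWN = (0, -1, 0)
BACK = (0, 0, -1)   # away from the viewer
FRONT = (0, 0, 1)   # towards the viewer

print(is_right_handed(RIGHT, DOWN, BACK))   # X right, Y down, Z to the back
print(is_right_handed(RIGHT, DOWN, FRONT))  # same, but Z towards the viewer
```

The first combination is right-handed and the second is left-handed: flipping a single axis always changes the handedness.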
Conclusion
I have implemented a function coordTransform(c, ss, ts) which transforms a coordinate vector c from the source system ss to the target system ts.
Moreover, the function has an additional optional parameter which determines whether translations should be applied. This is because one does not only want to transform position coordinates, but also direction vectors. These can be transformed using the same scaling and rotation transformations, but leaving any translation out.
Note: If you're using homogeneous coordinates, you can distinguish direction vectors from position vectors by their last (augmented) coordinate. If you augment the original vector with 1 (as usual), the transformations will include the translating parts. However, if the last coordinate is zero, the first three dimensions will be transformed in the same way as without homogeneous coordinates and without translations.
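To illustrate the idea, here is a much simplified Python sketch of such a function. It is not the actual coordTransform: the name coord_transform, the SYSTEMS list, the example matrices and the translate flag are assumptions, only four of the systems are included, and only the forward direction (from an earlier to a later system) is supported.

```python
# Ordered chain of systems; STEPS[i] maps SYSTEMS[i] to SYSTEMS[i + 1]
# as p -> matrix * p + offset. All values are made-up example data.
SYSTEMS = ["volume", "object", "world", "camera"]

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

STEPS = [
    # volume -> object: anisotropic scaling, midpoint moved to the origin
    (((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)), (-5.0, -5.0, -4.0)),
    # object -> world: identity (the volume is not rotated in the world)
    (IDENTITY, (0.0, 0.0, 0.0)),
    # world -> camera: rotation by 90 degrees about the Z axis
    (((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)), (0.0, 0.0, 0.0)),
]


def mat_vec(m, v):
    """Multiply a 3x3 matrix with a 3D vector."""
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))


def coord_transform(c, ss, ts, translate=True):
    """Transform c from system ss to system ts (forward direction only).

    With translate=False the offsets are skipped, which is the right
    behaviour for direction vectors.
    """
    i, j = SYSTEMS.index(ss), SYSTEMS.index(ts)
    if i > j:
        raise NotImplementedError("this sketch has no inverse transformations")
    for matrix, offset in STEPS[i:j]:
        c = mat_vec(matrix, c)
        if translate:
            c = tuple(a + b for a, b in zip(c, offset))
    return c


# A position: the voxel next to the volume midpoint along x.
print(coord_transform((6, 5, 2), "volume", "camera"))
# A direction: the volume's z axis, transformed without translation.
print(coord_transform((0, 0, 1), "volume", "camera", translate=False))
```

The direction vector is scaled and rotated like a position, but it is not shifted by the offset that moves the volume midpoint to the origin.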
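A small example shows the effect of the last coordinate. The 4x4 matrix below is an arbitrary example (uniform scaling by 2 plus a translation), and the function name apply is made up for this sketch.

```python
def apply(m, v):
    """Multiply a 4x4 matrix with a homogeneous 4D vector."""
    return tuple(sum(m[r][k] * v[k] for k in range(4)) for r in range(4))


# Scale by 2 and translate by (10, 20, 30); the translation sits in the
# last column, so it is multiplied by the vector's last coordinate.
M = (
    (2.0, 0.0, 0.0, 10.0),
    (0.0, 2.0, 0.0, 20.0),
    (0.0, 0.0, 2.0, 30.0),
    (0.0, 0.0, 0.0, 1.0),
)

position = (1.0, 1.0, 1.0, 1.0)   # last coordinate 1: translation applies
direction = (1.0, 1.0, 1.0, 0.0)  # last coordinate 0: translation drops out

print(apply(M, position))   # (12.0, 22.0, 32.0, 1.0)
print(apply(M, direction))  # (2.0, 2.0, 2.0, 0.0)
```

The same matrix therefore serves both kinds of vectors, and no separate flag is needed to switch the translation off.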