The image-based approach presented in this paper addresses
the simulation of a virtual camera's motions in photographic or
computer-synthesized spaces. The camera's motion has six
degrees of freedom, grouped into three classes. First, the
three rotational degrees of freedom, termed
"camera rotation", refer to rotating the camera's view direction
while keeping the viewpoint stationary. This class of motions
can be accomplished with the reprojection of an environment
map and image rotation. Second, orbiting a camera about an
object while keeping the view direction centered at the object is
termed "object rotation" because it is equivalent to rotating the
object. This type of motion requires moving the viewpoint and
cannot be achieved with an environment map.
Third, free motion of the camera in a space, termed "camera
movement", requires the change of both the viewpoint and the
viewing direction and has all six degrees of freedom. In addition
to the above motions, changing the camera's field-of-view,
termed "camera zooming", can be accomplished through
multiple resolution image zooming.
Without loss of generality, the environment is assumed to be
static in the following discussions. However, one can generalize
this method to include motion via time-varying environment
maps, such as environment map movies.
A camera has three rotational degrees of freedom: pitch
(pivoting about a horizontal axis), yaw (pivoting about a
vertical axis) and roll (rotating about an axis normal to the
view plane). Camera rolling can be achieved trivially with an
image rotation. Pitch and yaw can be accomplished by the
reprojection of an environment map.
An environment map is a projection of a scene onto a
simple shape. Typically, this shape is a cube [8] or a sphere
[6], [7]. How an environment map is reprojected to create a
novel view depends on the type of map. For a cubic environment
map, the reprojection merely displays the visible regions of six
texture-mapped squares in the view plane. For a spherical
environment map, non-linear image
warping needs to be performed. Figure 1 shows the reprojection
of the two environment maps.
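To make the spherical case concrete, the following sketch samples a
latitude-longitude environment map along a single viewing ray; pitch
and yaw then amount to rotating each ray by the camera orientation
before the lookup. The structure, names and nearest-neighbor sampling
are illustrative assumptions, not the paper's implementation.

```c
#include <math.h>

typedef struct { int width, height; unsigned char *pixels; } Image;

/* Sample a latitude-longitude ("spherical") environment map in the
   direction (dx, dy, dz).  For pitch and yaw, the ray through each
   display pixel would first be rotated by the camera orientation. */
unsigned char sample_sphere_map(const Image *map,
                                double dx, double dy, double dz)
{
    double len = sqrt(dx * dx + dy * dy + dz * dz);
    double lon = atan2(dx, dz);         /* longitude in [-pi, pi]    */
    double lat = asin(dy / len);        /* latitude in [-pi/2, pi/2] */
    int u = (int)((lon + M_PI) / (2.0 * M_PI) * (map->width - 1));
    int v = (int)((lat + M_PI / 2.0) / M_PI * (map->height - 1));
    return map->pixels[v * map->width + u];
}
```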
If a complete 360-degree panning is not required, other
types of environment maps, such as cylindrical, fish-eye or
wide-angle planar maps, can be used. A cylindrical map allows
360-degree panning horizontally and less than 180-degree
panning vertically. A fish-eye or hemispherical map allows
180-degree panning in both directions. A planar map allows
less than 180-degree panning in both directions.
Figure 1. Reprojecting a cubic and a spherical environment map.
As mentioned earlier, orbiting the camera around an object,
equivalent to rotating the object about its center, cannot be
accomplished simply with an environment map. One way of
solving this problem is the navigable movie approach. The
movie contains frames which correspond to all the allowable
orientations of an object. For an object with full 360-degree
rotation in one direction and 140 degrees in the other at
10-degree increments, the movie requires 36 × 14 = 504 frames. If we
store the frames at 256 by 256 pixel resolution, each frame is
around 10K bytes after compression. The entire movie
consumes roughly 5 MB of storage space. This amount of space
is large but not impractical given the current capacity of
approximately 650 MB per CD-ROM.
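As an illustration of how such a movie might be indexed, the sketch
below maps a requested orientation to a frame number, assuming a
hypothetical row-major layout of 36 pan positions by 14 tilt positions
consistent with the 504-frame count above; the layout and names are
not a documented format.

```c
#include <math.h>

/* Map (pan, tilt) in degrees to a frame index, assuming frames are
   stored row-major: 14 tilt rows of 36 pan columns (10-degree steps). */
int object_frame_index(double pan_deg, double tilt_deg)
{
    int pan  = ((int)floor(pan_deg / 10.0) % 36 + 36) % 36; /* wraps 0..35 */
    int tilt = (int)(tilt_deg / 10.0);
    if (tilt < 0)  tilt = 0;                                /* clamp 0..13 */
    if (tilt > 13) tilt = 13;
    return tilt * 36 + pan;            /* 36 x 14 = 504 frames in total */
}
```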
The view interpolation approach [18] needs to store only a few
key views of an object. The new views are interpolated on-the-fly
from the key views, which also means the rotation angle can be
arbitrary.
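A minimal sketch of the idea, assuming a precomputed per-pixel offset
(morph) map between two key views in the spirit of [18]; the hole
filling and overlap resolution that a real implementation needs are
omitted.

```c
/* Forward-warp a key view toward another by fraction t in [0, 1],
   using a precomputed per-pixel offset map (a morph map, as in [18]).
   Holes and overlapping pixels are ignored in this sketch. */
typedef struct { float dx, dy; } Offset;

void interpolate_view(const unsigned char *key_view, const Offset *flow,
                      unsigned char *out, int w, int h, float t)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i  = y * w + x;
            int nx = (int)(x + t * flow[i].dx);  /* move each pixel a    */
            int ny = (int)(y + t * flow[i].dy);  /* fraction t along its */
            if (nx >= 0 && nx < w && ny >= 0 && ny < h)   /* offset      */
                out[ny * w + nx] = key_view[i];
        }
}
```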
A camera moving freely in a scene involves the change of
viewpoint and view direction. The view direction change can be
accomplished with the use of an environment map. The
viewpoint change is more difficult to achieve.
A simple solution to viewpoint change is to constrain the
camera's movement to only particular locations where
environment maps are available. For a linear camera
movement, such as walking down a hallway, environment maps
can be created for points along the path at some small
intervals. The cost of storing the environment maps is roughly
six times the cost of storing a normal walkthrough movie if a
cubic map is used. The resulting effect is like looking out of the
window of a moving train: the path is fixed, but the passenger is
free to look around. Environment map movies
are similar to some special format movies such as Omnimax
(180-degree fish-eye) or CircleVision (360-degree cylindrical)
movies, in which a wider than normal field-of-view is recorded.
The observer can control the viewing direction during playback.
For traversing in a 2D or 3D space, environment maps can
be arranged to form a 2D or 3D lattice. Viewpoints in space are
simply quantized to the nearest grid point to approximate the
motion (figure 2). However, this approach requires a large
number of environment maps to be stored to obtain smooth
motion. A more desirable approach may be the view
interpolation method [18] or the approximate visibility
method [12], which generates new views from a coarse grid of
environment maps. Instead of constraining the movement to
the grid points, the nearby environment maps are interpolated
to generate a smooth path.
Figure 2. An unconstrained camera path and an
approximated path along the grid lines.
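The quantization itself is straightforward; a sketch, assuming a
regular grid of environment maps with uniform spacing (the names are
illustrative):

```c
#include <math.h>

typedef struct { double x, y; } Point2D;

/* Snap a free viewpoint to the nearest grid point at which an
   environment map is stored, approximating the camera path by
   grid positions as in figure 2. */
Point2D snap_to_grid(Point2D p, double spacing)
{
    Point2D q;
    q.x = spacing * floor(p.x / spacing + 0.5);  /* round to nearest */
    q.y = spacing * floor(p.y / spacing + 0.5);  /* grid coordinate  */
    return q;
}
```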
Changing the camera's field of view is equivalent to
zooming in and out in the image space. However, using image
magnification to zoom in does not provide more detail.
Zooming out through image reduction may create aliasing
artifacts as the sampling rate falls below the Nyquist limit. One
solution is multiple resolution image zooming. A pyramidal or
quadtree-like structure is created for each image to provide
different levels of resolution. The proper level of resolution is
selected on-the-fly based on the zooming factor. To achieve the
best quality in continuous zooming, the two levels which
bound the current zooming factor can be interpolated, similar to
the use of mip-maps for anti-aliasing in texture mapping [21].
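The level selection can be sketched as follows, assuming level 0 is
the full-resolution image and each successive level halves it; the
returned weight would blend the two bounding levels, as with
mip-maps. Names and conventions are assumptions.

```c
#include <math.h>

/* Choose the two pyramid levels bounding a zoom factor and the weight
   for blending them.  Level 0 is full resolution; each level halves
   the previous one; zoom_factor > 1 means zooming in. */
void select_zoom_levels(double zoom_factor, int num_levels,
                        int *fine, int *coarse, double *blend)
{
    double level = -log2(zoom_factor);   /* zooming out -> coarser */
    if (level < 0.0) level = 0.0;
    if (level > num_levels - 1) level = (double)(num_levels - 1);
    *fine   = (int)floor(level);
    *coarse = (*fine + 1 < num_levels) ? *fine + 1 : *fine;
    *blend  = level - *fine;    /* 0: fine level only, 1: coarse only */
}
```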
In order to avoid loading the entire high resolution image in
memory while zooming in, the image can be segmented so that
the memory requirement is independent of the zoom factor. As
the zoom factor increases, a smaller percentage of a larger
image is visible. Conversely, a larger percentage of a lower
resolution image needs to be displayed. Therefore, the number
of pixels required from the source image is roughly constant and
is related only to the number of pixels displayed. One way of
segmenting the image is to divide each resolution level into
tiles of the same size. The higher-resolution levels yield
more tiles and vice versa. In this way, when the zooming factor
changes, only a fixed number of tiles need to be visited.
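A sketch of the tile bookkeeping, assuming square tiles of a fixed
size and non-negative window coordinates at a given level; only the
tiles intersecting the visible window need to be resident in memory.

```c
/* Compute the inclusive range of fixed-size tiles covering the
   visible window at one resolution level.  Integer division floors
   for the assumed non-negative coordinates. */
typedef struct { int x0, y0, x1, y1; } TileRange;

TileRange visible_tiles(int view_x, int view_y,   /* window origin in  */
                        int view_w, int view_h,   /* level pixels, and */
                        int tile_size)            /* its size          */
{
    TileRange r;
    r.x0 = view_x / tile_size;
    r.y0 = view_y / tile_size;
    r.x1 = (view_x + view_w - 1) / tile_size;
    r.y1 = (view_y + view_h - 1) / tile_size;
    return r;
}
```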
The different levels of resolution do not need to come from
the same image. The more detailed levels could come from a
different image, to achieve effects like the "infinite zoom" [22], [23].
The image-based approach has been implemented in a
commercial product called QuickTime VR, built on top of Apple
Computer's QuickTime digital multimedia framework. The
current implementation includes continuous camera panning
and zooming, jumping to selected points, and object rotation.
Currently, QuickTime VR uses cylindrical environment
maps or panoramic images to accomplish camera rotation. The
choice of a cylindrical map over other types is based on a
number of factors. It is easier to capture a cylindrical panorama
than other types of environment maps. One can use
commercially available panoramic cameras which have a
rotating vertical slit. We have also developed a tool which
automatically “stitches” together a set of overlapping
photographs (see 4.3.1.2) to create a seamless panorama. The
cylindrical map curves in only one direction, which makes
image warping efficient to perform.
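That efficiency can be seen in a sketch of the per-pixel lookup: the
horizontal panorama coordinate depends only on the display column, so
the warp reduces to a vertical scaling per column. Coordinate
conventions, names and the map layout are assumptions.

```c
#include <math.h>

/* Map a view-plane pixel (px, py), at focal length f and camera pan
   angle `pan`, to cylindrical-panorama coordinates (*u, *v).  The
   panorama is assumed to span 360 degrees horizontally and heights
   [-h_max, h_max] on a unit-radius cylinder. */
void cylinder_lookup(double px, double py, double f, double pan,
                     double h_max, int map_w, int map_h,
                     int *u, int *v)
{
    double theta = pan + atan2(px, f);          /* depends only on px */
    double h     = py / sqrt(px * px + f * f);  /* height on cylinder */
    double s     = fmod(theta / (2.0 * M_PI) + 1.0, 1.0);
    *u = (int)(s * (map_w - 1));
    *v = (int)((0.5 - 0.5 * h / h_max) * (map_h - 1));
}
```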
QuickTime VR includes an interactive environment which
uses a software-based real-time image processing engine for
navigating in space and an authoring environment for creating
VR movies. The interactive environment is implemented as an
operating system component that can be accessed by any
QuickTime 2.0 compliant application program. The interactive
environment comprises two types of players. The panoramic
movie player allows the user to pan, zoom and navigate in a
scene. It also includes a “hot spot” picking capability. Hot
spots are regions in an image that allow for user interaction.
The object movie player allows the user to rotate an object or
view the object from different viewing directions. The players
run on most Macintosh and Windows platforms. The
panoramic authoring environment consists of a suite of tools
to perform panoramic image stitching, hot spot marking,
linking, dicing and compression. The object movies are created
with a motion-controllable camera.
The following sections briefly describe the movie format,
the players and the process of making the movies.