QuickTime VR 学习3

作者: 时间:2022-09-11 点击数:

4.1 The Movie Format

QuickTime VR currently includes two different types of

movies: panoramic and object.

4.1.1 The Panoramic Movie

Conventional QuickTime movies are one-dimensional

compressed sequences indexed by time. Each QuickTime movie

may have multiple tracks. Each track can store a type of linear

media, such as audio, video, text, etc. Each track type may have

its own player to decode the information in the track. The

tracks, which usually run parallel in time, are played

synchronously with a common time scale. QuickTime allows

new types of tracks and players to be added to extend its

capabilities. Refer to [24] and [25] for a detailed description of

the QuickTime architecture.

Panoramic movies are multi-dimensional event-driven

spatially-oriented movies. A panoramic movie permits a user to

pan, zoom and move in a space interactively. In order to retrofit

panoramic movies into the existing linear movie framework, a

new panoramic track type was added. The panoramic track

stores all the linking and additional information associated

with a panoramic movie. The actual panoramic images are

stored in a regular QuickTime video track to take advantage of

the existing video processing capabilities.

An example of a panoramic movie file is shown in figure 3.

The panoramic track is divided into three nodes. Each node

corresponds to a point in a space. A node contains information

about itself and links to other nodes. The linking of the nodes

forms a directed graph, as shown in the figure. In this example,

Node 2 is connected to Node 1 and Node 3, which has a link to

an external event. The external event allows custom actions to

be attached to a node.

Figure 3. A panoramic movie layout and its corresponding node graph.

The nodes are stored in three tracks: one panoramic track

and two video tracks. The panoramic track holds the graph

information and pointers to the other two tracks. The first

video track holds the panoramic images for the nodes. The

second video track holds the hot spot images and is optional.

The hot spots are used to identify regions of the panoramic

image for activating appropriate links. All three tracks have

the same length and the same time scale. The player uses the

starting time value of each node to find the node's

corresponding panoramic and hot spot images in the other two

tracks.
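This time-based indexing can be sketched as below. The `Node` structure and helper are purely illustrative (our own names, not the QuickTime API): each node records the start time and duration that locate its samples in the parallel panoramic-image and hot-spot tracks.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    start_time: int           # shared time scale across all three tracks
    duration: int
    links: dict = field(default_factory=dict)  # hot spot id -> destination node

def find_samples(nodes, node_id):
    """Return the (start, end) time window used to index the
    panoramic-image and hot-spot video tracks for one node."""
    for n in nodes:
        if n.node_id == node_id:
            return (n.start_time, n.start_time + n.duration)
    raise KeyError(node_id)

# Three nodes laid end to end, as in figure 3.
nodes = [Node(1, 0, 10, {7: 2}), Node(2, 10, 10, {3: 1, 5: 3}), Node(3, 20, 10)]
print(find_samples(nodes, 2))  # (10, 20)
```

The same window indexes both auxiliary tracks because, as the text notes, all three tracks share one length and time scale.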

The hot spot track is similar to the hit test track in the

Virtual Museum [3]. The hot spots are used to activate events or

navigation. The hot spot image encodes the hot spot id

numbers as colors. However, unlike the Virtual Museum where a

hot spot needs to exist for every view of the same object, the

hot spot image is stored in panoramic form and is thereby

orientation-independent. The hot spot image goes through the

same image warping process as the panoramic image.

Therefore, the hot spots will stay with the objects they attach

to no matter how the camera pans or zooms.
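Because the hot-spot image is warped with the same transform as the panorama, picking reduces to reading the warped hot-spot buffer under the cursor. A toy sketch (the zero-means-no-hot-spot convention is our assumption, not stated in the paper):

```python
def pick_hot_spot(hotspot_buffer, x, y):
    """Return the hot-spot id encoded as the 8-bit pixel value under
    the cursor, or None for background (0 assumed to mean 'none')."""
    hid = hotspot_buffer[y][x]
    return hid if hid != 0 else None

# A tiny 3x4 warped hot-spot buffer: id 7 covers the upper-right corner.
buf = [[0, 0, 7, 7],
       [0, 0, 7, 7],
       [0, 0, 0, 0]]
print(pick_hot_spot(buf, 2, 0))  # 7
print(pick_hot_spot(buf, 0, 2))  # None
```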

The panoramic and the hot spot images are typically diced

into smaller frames when stored in the video tracks for more

efficient memory usage (see 4.2.1 for details). The frames are

usually compressed without inter-frame compression (e.g.,

frame differencing). Unlike linear video, the panoramic movie

does not have an a priori order for accessing the frames. The

image and hot spot video tracks are disabled so that a regular

QuickTime movie would not attempt to display them as linear


videos. Because the panoramic track is the only one enabled,

the panoramic player is called upon to traverse the contents of

the movie at playback time.

The track layout does not need to be the same as the

physical layout of the data on a storage medium. Typically, the

tracks should be interleaved when written to a slow medium,

such as a CD-ROM, to minimize the seek time.

4.1.2 The Object Movie

An object movie typically contains a two-dimensional

array of frames. Each frame corresponds to a viewing direction.

The movie has more than two dimensions if multiple frames are

stored for each direction. The additional frames allow the object

to have time-varying behavior (see 4.2.2). Currently, each

direction is assumed to have the same number of frames.

The object frames are stored in a regular video track.

Additional information, such as the number of frames per

direction and the numbers of rows and columns, is stored with

the movie header. The frames are organized to minimize the

seek time when rotating the object horizontally. As in the

panoramic movies, there is no inter-frame compression for the

frames since the order of rotation is not known in advance.

However, inter-frame compression may be used for the multiple

frames within each viewing direction.
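One layout consistent with this description stores all frames of one latitude row contiguously, so that horizontal rotation touches adjacent samples on disk. The indexing below is our sketch of that idea, not the documented file format:

```python
def object_frame_index(row, col, t, n_cols, frames_per_view):
    """Flat index of frame t for viewing direction (row, col) when each
    latitude row of the frame array is stored contiguously."""
    return (row * n_cols + col) * frames_per_view + t

# At 10-degree increments there are 36 columns; rotating horizontally
# from col 4 to col 5 moves only frames_per_view samples in the track.
assert object_frame_index(2, 5, 0, 36, 1) - object_frame_index(2, 4, 0, 36, 1) == 1
print(object_frame_index(2, 5, 0, 36, 1))  # 77
```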

4.2 The Interactive Environment

The interactive environment currently consists of two types

of players: the panoramic player and the object player.

4.2.1 The Panoramic Player

The panoramic player allows the user to perform continuous

panning in the vertical and the horizontal directions. Because

the panoramic image has less than 180 degrees vertical field-of-view, the player does not permit looking all the way up or

down. Rotating about the viewing direction is not currently

supported. The player performs continuous zooming through

image magnification and reduction as mentioned previously. If

multiple levels of resolution are available, the player may

choose the right level based on the current memory usage, CPU

performance, disk speed and other factors. Multiple-level

zooming is not currently implemented in QuickTime VR.

Figure 4. Panoramic display process.

The panoramic player allows the user to control the view

orientation and displays a perspectively correct view by

warping a panoramic image. Figure 4 shows the panoramic

display process. The panoramic images are usually compressed

and stored on a hard disk or a CD-ROM. The compressed image

needs to be decompressed to an offscreen buffer first. The

offscreen buffer is generally smaller than the full panorama

because only a fraction of the panorama is visible at any time.

As mentioned previously, the panoramic image is diced into

tiles. Only the tiles overlapping the current view orientation

are decompressed to the offscreen buffer. The visible region on

the offscreen buffer is then warped to display a correct

perspective view. As long as the region moves inside the

offscreen buffer, no additional decompression is necessary. To

minimize the disk access, the most recent tiles may be cached

in the main memory once they are read. The player also

performs pre-paging to read in adjacent tiles while it is idle to

minimize the delay in interactive panning.
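Determining which tiles to decompress is a simple interval computation over the panning angle. A minimal sketch, assuming the 24 vertical stripes mentioned in 4.3.1.5 span the full 360 degrees (tile geometry here is our assumption):

```python
import math

N_TILES = 24  # a 2500x768 panorama diced into 24 vertical stripes

def visible_tiles(pan_deg, hfov_deg, n_tiles=N_TILES):
    """Indices of the vertical stripes overlapping the current view;
    indices wrap around the cylinder's seam."""
    tile_w = 360.0 / n_tiles
    first = math.floor((pan_deg - hfov_deg / 2) / tile_w)
    last = math.floor((pan_deg + hfov_deg / 2) / tile_w)
    return [i % n_tiles for i in range(first, last + 1)]

print(visible_tiles(0.0, 60.0))  # [22, 23, 0, 1, 2] -- view straddles the seam
```

Only these indices need to be fetched from the tile cache (or disk) and decompressed into the offscreen buffer; pre-paging simply applies the same computation to the neighboring stripes.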

The image warp, which reprojects sections of the

cylindrical image onto a planar view, is computed in real-time

using a software-based two-pass algorithm [26]. An example of

the warp is shown in figure 5, where the region enclosed by the

yellow box in the panoramic image is warped to create a

perspective view below.
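The per-pixel mapping behind the warp is textbook cylindrical projection: each view-plane pixel defines a ray whose azimuth and elevation index the panorama. The sketch below uses our own coordinate conventions and is not the paper's optimized two-pass implementation:

```python
import math

def screen_to_panorama(sx, sy, f, pan, pano_w, pano_h, vfov):
    """Map a view-plane pixel (sx, sy), centered on the optical axis at
    focal length f (all in pixels), to (u, v) in a cylindrical panorama
    of pano_w x pano_h pixels covering 360 degrees horizontally and
    vfov radians vertically."""
    theta = (pan + math.atan2(sx, f)) % (2 * math.pi)  # ray azimuth
    phi = math.atan2(sy, math.hypot(sx, f))            # ray elevation
    u = theta / (2 * math.pi) * pano_w
    v = pano_h / 2 - phi / vfov * pano_h
    return u, v

u, v = screen_to_panorama(0, 0, 400, 0.0, 2500, 768, math.radians(100))
print(round(u, 1), round(v, 1))  # 0.0 384.0 -- view center hits the horizon
```

A renderer would evaluate this inverse mapping (or its separable two-pass equivalent) for every output pixel and resample the panorama there.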

The performance of the player varies depending on many

factors such as the platform, the color mode, the panning mode

and the window sizes. The player is currently optimized for

display in 16-bit color mode. Some performance figures for

different processors are given below. These figures indicate the

number of updates per second in a 640x400-pixel window in

16-bit color mode. Because the warping is performed with a

two-pass algorithm, panning in 1D is faster than full 2D

panning. Note that the Windows version has a different

implementation for writing to display which may affect the

performance.

Processor        1D Panning   2D Panning
PowerPC 601/80       29.5         11.6
MC68040/40           12.3          5.4
Pentium/90           11.4          7.5
486/66                5.9          3.6

The player can perform image warping at different levels of

quality. The lower quality settings perform less filtering and the

images are more jagged but are faster. To achieve the best

balance between quality and performance, the player

automatically adjusts the quality level to maintain a constant

update rate. When the user is panning, the player switches to

lower quality to keep up with the user. When the user stops, the

player updates the image in higher quality.
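This feedback loop can be sketched as a small controller; the level count, target rate and thresholds below are illustrative, not QuickTime VR's actual values:

```python
def adjust_quality(level, frame_ms, panning, target_ms=66.0, max_level=3):
    """Pick the next filter-quality level: drop it while panning if
    frames come in late, raise it when there is headroom, and restore
    the best quality once the user stops."""
    if not panning:
        return max_level               # idle: redraw at full quality
    if frame_ms > target_ms and level > 0:
        return level - 1               # falling behind: coarser filtering
    if frame_ms < target_ms / 2 and level < max_level:
        return level + 1               # ample headroom: refine
    return level

print(adjust_quality(2, 90.0, panning=True))   # 1
print(adjust_quality(1, 20.0, panning=False))  # 3
```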

Moving in space is currently accomplished by jumping to

points where panoramic images are attached. In order to

preserve continuity of motion, the view direction needs to be

maintained when jumping to an adjacent location. The

panoramas are linked together by matching their orientation

manually in the authoring stage (see 4.3.1.4). Figure 6 shows a

sequence of images generated from panoramas spaced 5 feet

apart.

The default user interface for navigation uses a combination

of a 2D mouse and a keyboard. When the cursor moves over a

window, its shape changes to reflect the permissible action at

the current cursor location. The permissible actions include:

continuous panning in 2D; continuous zooming in and out

(controlled by a keyboard); moving to a different node; and

activating a hot spot. Clicking on the mouse initiates the

corresponding actions. Holding down and dragging the mouse

performs continuous panning. The panning speed is controlled

by the distance relative to the mouse click position.

In addition to interactive control, navigation can be placed

under the control of a script. A HyperCard external command

and a Windows DLL have been written to drive the player. Any


application compatible with the external command or DLL can

control the playback with a script. A C run-time library

interface will be available for direct control from a program.

4.2.2 The Object Player

While the panoramic player is designed to look around a

space from the inside, the object player is used to view an

object from the outside. The object player is based on the

navigable movie approach. It uses a two-dimensional array of

frames to accommodate object rotation. The object frames are

created with a constant color background to facilitate

compositing onto other backgrounds. The object player allows

the user to grab the object using a mouse and rotate it with a

virtual sphere-like interface [27]. The object can be rotated in

two directions corresponding to orbiting the camera in the

longitude and the latitude directions.

If there is more than one frame stored for each direction, the

multiple frames are looped continuously while the object is

being rotated. The looping enables the object to have cyclic

time varying behavior (e.g. a flickering candle or streaming

waterfall).
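The orbit control can be sketched in two steps: mouse deltas adjust longitude and latitude, and the angles snap to the captured grid to select a frame. The sensitivities, clamps and 10-degree grid below are assumptions for illustration:

```python
def drag_to_view(pan, tilt, dx, dy, deg_per_px=0.5):
    """Update orbit angles from a mouse drag: pan wraps around the
    object, tilt is clamped short of the poles."""
    pan = (pan + dx * deg_per_px) % 360
    tilt = max(-85.0, min(85.0, tilt - dy * deg_per_px))
    return pan, tilt

def view_to_frame(pan, tilt, inc=10):
    """Snap angles to the captured 10-degree grid; (row, col) indexes
    the two-dimensional frame array."""
    row = round((tilt + 90) / inc)
    col = round(pan / inc) % (360 // inc)
    return row, col

pan, tilt = drag_to_view(0, 0, dx=45, dy=-20)
print(view_to_frame(pan, tilt))  # (10, 2): 22.5 deg pan, 10 deg tilt
```

When several frames exist per direction, the player would additionally cycle `t` in `0..frames_per_view-1` at the chosen `(row, col)` to play the looped behavior.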

4.3 The Authoring Environment

The authoring environment includes tools to make

panoramic movies and object movies.

Figure 7. The panoramic movie authoring process.

4.3.1 Panoramic Movie Making

A panoramic movie is created in five steps. First, nodes are

selected in a space to generate panoramas. Second, the

panoramas are created with computer rendering, panoramic

photography or “stitching” a mosaic of overlapping

photographs. Third, if there are any hot spots on the panorama,

a hot spot image is constructed by marking regions of the

panorama with pseudo colors corresponding to the hot spot

identifiers. Alternatively, the hot spots can be generated with

computer rendering [28], [3]. Fourth, if more than one

panoramic node is needed, the panoramas are linked together by

manually registering their viewing directions. Finally, the

panoramic images and the hot spot images are diced and

compressed to create a panoramic movie. The authoring process

is illustrated in figure 7.

4.3.1.1 Node Selection

The nodes should be selected to maintain visual consistency

when moving from one to another. The distance between two

adjacent nodes is related to the size of the virtual environment

and the distance to the nearby objects. Empirically we have

found a 5-10 foot spacing to be adequate for most interior

spaces. The spacing can be significantly increased with

exterior scenes.

4.3.1.2 Stitching

The purpose of stitching is to create a seamless panoramic

image from a set of overlapping pictures. The pictures are taken

with a camera as it rotates about its vertical axis in one

direction only. The camera pans at roughly equal, but not exact,

increments. The camera is mounted on a tripod and centered at

its nodal point with minimal tilting and rolling. The camera is

usually mounted sideways to obtain the maximum vertical field-of-view. The setup of the camera is illustrated in figure 8. The

scene is assumed to be static although some distant object

motion may be acceptable.

Figure 8. Camera setup for taking overlapping pictures.

The stitcher uses a correlation-based image registration

algorithm to match and blend adjacent pictures. The adjacent

pictures need to have some overlap for the stitcher to work

properly. The amount of overlap may vary depending on the

image features in the overlapping regions. In practice, a 50%

overlap seems to work best because the adjacent pictures may

have very different brightness levels. Having a large overlap

allows the stitcher to more easily smooth out the intensity

variation.
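The core of such registration can be illustrated in one dimension: slide one signal against the other and keep the shift with the best match. This toy uses a sum-of-squared-differences score over, say, column sums of the overlap region; the real stitcher's correlation measure and search are not specified here:

```python
def best_shift(a, b, max_shift):
    """Toy 1-D registration: try shifts of b against a and return the
    shift s minimizing mean squared difference over the overlap, i.e.
    the s for which a[i] best matches b[i + s]."""
    best, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(a[i], b[i + s]) for i in range(len(a)) if 0 <= i + s < len(b)]
        if not pairs:
            continue
        err = sum((x - y) ** 2 for x, y in pairs) / len(pairs)
        if err < best_err:
            best, best_err = s, err
    return best

a = [0, 1, 5, 9, 5, 1, 0, 0]
b = [0, 0, 0, 1, 5, 9, 5, 1]   # the same profile, shifted by two samples
print(best_shift(a, b, 3))     # 2
```

A real stitcher searches in 2-D, works on image patches rather than profiles, and blends intensities across the matched overlap.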

The success rate of the automatic stitching depends on the

input pictures. For a typical stitching session, about 8 out of

10 panoramas can be stitched automatically, assuming each

panorama is made from 12 pictures. The remaining 2 panoramas

require some manual intervention. The factors which

contribute to automatic stitching failure include, but are not

limited to, missing pictures, extreme intensity change,

insufficient image features, improper camera mounting,

significant object motion and film scanning errors.

In addition to being able to use a regular 35 mm camera, the

ability to use multiple pictures, and hence different exposure

settings, to compose a panorama has another advantage. It

enables one to capture a scene with a very wide intensity range,

such as during a sunset. A normal panoramic camera captures

the entire 360 degrees with a constant exposure setting. Since

film usually has a narrower dynamic range than the real world

does, the resultant panorama may have under- or over-exposed areas. The stitcher allows the exposure setting to be

specifically tailored for each direction. Therefore, it may create

a more balanced panorama in extreme lighting conditions.

Although one can use other devices, such as video or digital

cameras for capturing, using still film results in high resolution


images even when displayed at full screen on a monitor. The

film can be digitized and stored on Kodak's PhotoCD. Each

PhotoCD contains around 100 pictures with 5 resolutions each.

A typical panorama is stitched with the middle resolution

pictures (i.e., 768 x 512 pixels) and the resulting panorama is

around 2500 x 768 pixels for pictures taken with a 15 mm lens.

This resolution is enough for a full screen display with a

moderate zoom angle. The stitcher takes around 5 minutes to

automatically stitch a 12-picture panorama on a PowerPC

601/80 MHz processor, including reading the pictures from the

PhotoCD and some post processing. An example of a

panoramic image stitched automatically is shown in figure 9.

4.3.1.3 Hot Spot Marking

Hot spots identify regions of a panoramic image for

interactions, such as navigation or activating actions.

Currently, the hot spots are stored in 8-bit images, which limit

the number of unique hot spots to 256 per image. One way of

creating a hot spot image is by painting pseudo colors over the

top of a panoramic image. Computer renderers may generate the

hot spot image directly.

The hot spot image does not need to have the same

resolution as the panoramic image. The resolution of the hot

spot image is related to the precision of picking. A very low

resolution hot spot image may be used if high accuracy of

picking is not required.

4.3.1.4 Linking

The linking process connects and registers view orientation

between adjacent panoramic nodes. The links are directional

and each node may have any number of links. Each link may be

attached to a hot spot so that the user may activate the link by

clicking on the hot spot.

Currently, the linking is performed by manually registering

the source and destination view orientations using a graphical

linker. The main goal of the registration is to maintain visual

consistency when moving from one node to another.

4.3.1.5 Dicing and Compression

The panoramic and hot spot images are diced before being

compressed and stored in a movie. The tile size should be

optimized for both data loading and offscreen buffer size. A

large number of tiles increases the overhead associated with

loading and decompressing the tiles. A small number of tiles

requires a large offscreen buffer and reduces tile paging

efficiency. We have found that dicing a panoramic image of

2500x768 pixels into 24 vertical stripes provides an optimal

balance between data loading and tile paging. Dicing the

panorama into vertical stripes also minimizes the seek time

involved when loading the tiles from a CD-ROM during

panning.

A panorama of the above resolution can be compressed to

around 500 KB with a modest 10 to 1 compression ratio using

the Cinepak compressor, which is based on vector quantization

and provides a good quality vs. speed balance. Other

compressors may be used as well for different quality and speed

tradeoffs. The small disk footprint for each panorama means

that a CD-ROM with over 600 MB capacity can hold more than

1,000 panoramas. The capacity will only increase as higher

density CD-ROMs and better compression methods become

available.
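These figures check out with simple arithmetic, assuming 24-bit source pixels (our assumption; the player displays in 16-bit color, but the source panorama is likely deeper):

```python
raw_bytes = 2500 * 768 * 3               # 24-bit panorama: ~5.5 MB raw
compressed = raw_bytes // 10             # modest 10:1 Cinepak ratio
per_cd = (600 * 2**20) // (500 * 2**10)  # 600 MB CD over ~500 KB panoramas
print(compressed // 1024, per_cd)        # 562 1228
```

So a 10:1 ratio lands near the quoted 500 KB per panorama, and a 600 MB disc indeed holds more than 1,000 of them.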

The hot spot image is compressed with a lossless 8-bit

compressor. The lossless compression is necessary to ensure

the correctness of the hot spot id numbers. Since the hot spots

usually occupy large contiguous regions, the compressed size is

typically only a few kilobytes per image.

4.3.2 Object Movie Making

Making an object movie requires photographing the object

from different viewing directions. To provide a smooth object

rotation, the camera needs to point at the object's center while

orbiting around it at constant increments. While this

requirement can be easily met in computer generated objects,

photographing a physical object in this way is very

challenging unless a special device is built.

Currently, we use a device, called the "object maker," to

accomplish this task. The object maker uses a computer to

control two stepper motors. The computer-controlled motors

orbit a video camera in two directions by fixing its view

direction at the center of the object. The video camera is

connected to a frame digitizer inside the computer, which

synchronizes frame grabbing with camera rotation. The object

is supported by a nearly invisible base and surrounded by a

black curtain to provide a uniform background. The camera can

rotate close to 180 degrees vertically and 360 degrees

horizontally. The camera typically moves at 10-degree

increments in each direction. The entire process may run

automatically and takes around 1 hour to capture an object

completely.
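A quick count under the stated increments (treating "close to 180 degrees" as a full -90 to +90 sweep, which is our simplification):

```python
rows = 180 // 10 + 1   # tilt from -90 to +90 degrees in 10-degree steps
cols = 360 // 10       # one full horizontal turn
frames = rows * cols
print(frames, round(3600 / frames, 1))  # 684 frames, ~5.3 s per frame
```

so the one-hour figure implies roughly five seconds per step for motor motion, settling and frame grabbing.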

If multiple frames are needed for each direction, the object

may be captured in several passes, with each pass capturing a

full rotation of the object in a fixed state. The multi-pass

capture requires that the camera rotation be repeatable and the

object motion be controllable. In the case of candle light

flickering, the multiple frames may need to be captured

successively before the camera moves on to the next direction.
