QuickTime VR 学习1


ABSTRACT

Traditionally, virtual reality systems use 3D computer graphics to model and render virtual environments in real-time. This approach usually requires laborious modeling and expensive special purpose rendering hardware. The rendering quality and scene complexity are often limited because of the real-time constraint. This paper presents a new approach which uses 360-degree cylindrical panoramic images to compose a virtual environment. The panoramic image is digitally warped on-the-fly to simulate camera panning and zooming. The panoramic images can be created with computer rendering, specialized panoramic cameras or by "stitching" together overlapping photographs taken with a regular camera. Walking in a space is currently accomplished by "hopping" to different panoramic points. The image-based approach has been used in the commercial product QuickTime VR, a virtual reality extension to Apple Computer's QuickTime digital multimedia framework. The paper describes the architecture, the file format, the authoring process and the interactive players of the VR system. In addition to panoramic viewing, the system includes viewing of an object from different directions and hit-testing through orientation-independent hot spots.

CR Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation – Viewing algorithms; I.4.3 [Image Processing]: Enhancement – Geometric correction, Registration.

Additional Keywords: image warping, image registration, virtual reality, real-time display, view interpolation, environment maps, panoramic images.

1 INTRODUCTION

A key component in most virtual reality systems is the ability to perform a walkthrough of a virtual environment from different viewing positions and orientations. The walkthrough requires the synthesis of the virtual environment and the simulation of a virtual camera moving in the environment with up to six degrees of freedom. The synthesis and navigation are usually accomplished with one of the following two methods.

1.1 3D Modeling and Rendering

Traditionally, a virtual environment is synthesized as a collection of 3D geometrical entities. The geometrical entities are rendered in real-time, often with the help of special purpose 3D rendering engines, to provide an interactive walkthrough experience.

The 3D modeling and rendering approach has three main problems. First, creating the geometrical entities is a laborious manual process. Second, because the walkthrough needs to be performed in real-time, the rendering engine usually places a limit on scene complexity and rendering quality. Third, the need for a special purpose rendering engine has limited the availability of virtual reality for most people since the necessary hardware is not widely available.

Despite the rapid advance of computer graphics software and hardware in the past, most virtual reality systems still face the above problems. The 3D modeling process will continue to be a very human-intensive operation in the near future. The real-time rendering problem will remain since there is really no upper bound on rendering quality or scene complexity. Special purpose 3D rendering accelerators are still not ubiquitous and are by no means standard equipment among personal computer users.

1.2 Branching Movies

Another approach to synthesize and navigate in virtual environments, which has been used extensively in the video game industry, is branching movies. Multiple movie segments depicting spatial navigation paths are connected together at selected branch points. The user is allowed to move on to a different path only at these branching points. This approach usually uses photography or computer rendering to create the movies. A computer-driven analog or digital video player is used for interactive playback. An early example of this approach is the movie-map [1], in which the streets of the city of Aspen were filmed at 10-foot intervals. At playback time, two videodisc players were used to retrieve corresponding views to simulate the effects of walking on the streets. The use of digital videos for exploration was introduced with the Digital Video Interactive technology [2]. The DVI demonstration allowed a user to wander around the Mayan ruins of Palenque using digital video playback from an optical disk. A "Virtual Museum" based on computer-rendered images and CD-ROM was described in [3]. In this example, at selected points in the museum, a 360-degree panning movie was rendered to let the user look around. Walking from one of the points to another was simulated with a bi-directional transition movie, which contained a frame for each step in both directions along the path connecting the two points.
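The branch-point structure can be pictured with a small sketch (hypothetical names and frame labels; the cited systems' actual data layouts are not given in the paper): one movie is stored per path between two points, and walking the path in the opposite direction plays the same frames in reverse, which is what makes the transition movie bi-directional.

```python
# Minimal sketch of a branching-movie graph (hypothetical structure,
# not the actual format of the systems cited above).

transitions = {
    ("lobby", "hallway"): ["frame0", "frame1", "frame2"],
}

def transition_frames(src, dst):
    """Return the frame sequence for walking from src to dst."""
    if (src, dst) in transitions:
        return transitions[(src, dst)]
    # Reverse direction: play the stored movie backward.
    return list(reversed(transitions[(dst, src)]))
```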

An obvious problem with the branching movie approach is its limited navigability and interaction. It also requires a large amount of storage space for all the possible movies. However, this method solves the problems noted above for the 3D approach. The movie approach does not require 3D modeling and rendering for existing scenes; it can use photographs or movies instead. Even for computer-synthesized scenes, the movie-based approach decouples rendering from interactive playback. The movie-based approach allows rendering to be performed at the highest quality with the greatest complexity without affecting the playback performance. It can also use inexpensive and common video devices for playback.


1.3 Objectives

Because of the inadequacy of the existing methods, we decided to explore a new approach for the creation and navigation of virtual environments. Specifically, we wanted to develop a new system which met the following objectives:

First, the system should play back at interactive speed on most personal computers available today without hardware acceleration. We did not want the system to rely on special input or output devices, such as data gloves or head-mounted displays, although we did not preclude their use.

Second, the system should accommodate both real and synthetic scenes. Real-world scenes contain enormously rich details often difficult to model and render with a computer. We wanted the system to be able to use real-world scenery directly without going through computer modeling and rendering.

Third, the system should be able to display high quality images independent of scene complexity. Many virtual reality systems often compromise by displaying low quality images and/or simplified environments in order to meet the real-time display constraint. We wanted our system's display speed to be independent of the rendering quality and scene complexity.

1.4 Overview

This paper presents an image-based system for virtual environment navigation based on the above objectives. The system uses real-time image processing to generate 3D perspective viewing effects. The approach presented is similar to the movie-based approach and shares the same advantages. It differs in that the movies are replaced with "orientation-independent" images and the movie player is replaced with a real-time image processor. The images that we currently use are cylindrical panoramas. The panoramas are orientation-independent because each of the images contains all the information needed to look around in 360 degrees. A number of these images can be connected to form a walkthrough sequence. The use of orientation-independent images allows a greater degree of freedom in interactive viewing and navigation. These images are also more concise and easier to create than movies.
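To make the warping idea concrete, the sketch below maps a viewing direction (pan, tilt) to a pixel of a cylindrical panorama using the standard cylindrical projection; this is an illustrative assumption, not QuickTime VR's actual code. Simulating a camera pan then amounts to evaluating this mapping for every screen pixel.

```python
import math

def cylinder_pixel(pan, tilt, pano_w, pano_h, vfov):
    """Map a viewing direction to a pixel in a cylindrical panorama.

    pan: azimuth in radians (wraps around the cylinder).
    tilt: elevation in radians; a cylinder stores tan(tilt) linearly.
    vfov: total vertical field of view covered by the panorama.
    (Standard cylindrical projection; a sketch, not QuickTime VR's code.)
    """
    x = (pan / (2.0 * math.pi)) * pano_w % pano_w
    # Vertical coordinate: a cylinder of radius 1 maps elevation to tan(tilt).
    half = math.tan(vfov / 2.0)
    y = (0.5 - math.tan(tilt) / (2.0 * half)) * pano_h
    return int(x), int(y)
```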

We discuss work related to our approach in Section 2. Section 3 presents the simulation of camera motions with the image-based approach. In Section 4, we describe QuickTime VR, the first commercial product using the image-based method. Section 5 briefly outlines some applications of the image-based approach and is followed by conclusions and future directions.

2. RELATED WORK

The movie-based approach requires every displayable view to be created and stored in the authoring stage. In the movie-map [1], [4], four cameras are used to shoot the views at every point, thereby giving the user the ability to pan to the left and right at every point. The Virtual Museum stores 45 views for each 360-degree pan movie [3]. This results in smooth panning motion but at the cost of more storage space and frame creation time.

The navigable movie [5] is another example of the movie-based approach. Unlike the movie-map or the Virtual Museum, which only have the panning motion in one direction, the navigable movie offers two-dimensional rotation. An object is photographed with a camera pointing at the object's center and orbiting in both the longitude and the latitude directions at roughly 10-degree increments. This process results in hundreds of frames corresponding to all the available viewing directions. The frames are stored in a two-dimensional array which is indexed by two rotational parameters in interactive playback. When displaying the object against a static background, the effect is the same as rotating the object. Panning to look at a scene is accomplished in the same way. The frames in this case represent views of the scene in different view orientations.
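A minimal sketch of this indexing scheme follows; the 10-degree step and the array shape are illustrative only, since the paper says just "roughly 10-degree increments".

```python
# frames[i][j] holds the view at latitude row i, longitude column j
# (hypothetical layout; the actual navigable movie format is not given).

STEP = 10  # degrees between adjacent stored views

def frame_for(longitude, latitude, frames):
    """Pick the stored frame nearest to the requested orientation."""
    j = int(round(longitude / STEP)) % (360 // STEP)   # longitude wraps
    i_max = len(frames) - 1
    i = min(max(int(round((latitude + 90) / STEP)), 0), i_max)
    return frames[i][j]
```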

If only the view direction is changing and the viewpoint is stationary, as in the case of pivoting a camera about its nodal point (i.e., the optical center of projection), all the frames from the pan motion can be mapped to a canonical projection. This projection is termed an environment map, which can be regarded as an orientation-independent view of the scene. Once an environment map is generated, any arbitrary view of the scene, as long as the viewpoint does not move, can be computed by a reprojection of the environment map to the new view plane.
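As an illustration of such a reprojection, the sketch below renders a planar view from a cylindrical map by casting a pinhole ray through each screen pixel and looking up the panorama; it reuses cylinder_pixel from the earlier sketch. This is nearest-neighbor sampling only; a real player would add filtering and incremental optimizations.

```python
import math

def render_view(pano, pano_w, pano_h, vfov, heading, out_w, out_h, hfov):
    """Reproject a cylindrical environment map onto a planar view plane."""
    focal = (out_w / 2.0) / math.tan(hfov / 2.0)  # pinhole focal length
    out = [[None] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            x = u - out_w / 2.0            # ray offsets on the view plane
            y = out_h / 2.0 - v
            pan = (heading + math.atan2(x, focal)) % (2.0 * math.pi)
            tilt = math.atan2(y, math.hypot(x, focal))
            px, py = cylinder_pixel(pan, tilt, pano_w, pano_h, vfov)
            out[v][u] = pano[min(max(py, 0), pano_h - 1)][px]
    return out
```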

The environment map was initially used in computer graphics to simplify the computations of specular reflections on a shiny object from a distant scene [6], [7], [8]. The scene is first projected onto an environment map centered at the object. The map is indexed by the specular reflection directions to compute the reflection on the object. Since the scene is far away, the location difference between the object center and the surface reflection point can be ignored.
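A hedged sketch of this lookup, assuming a latitude-longitude map purely for illustration (the cited papers used various map formats): reflect the view direction about the surface normal, then index the map by the reflected direction.

```python
import math

def reflect(d, n):
    """Reflect view direction d about unit surface normal n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return [di - 2.0 * dot * ni for di, ni in zip(d, n)]

def envmap_lookup(direction, env, w, h):
    """Index a latitude-longitude environment map by a 3D direction."""
    x, y, z = direction
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z) % (2.0 * math.pi)       # azimuth, 0..2*pi
    lat = math.asin(max(-1.0, min(1.0, y / r)))    # elevation
    u = int(lon / (2.0 * math.pi) * w) % w
    v = int((0.5 - lat / math.pi) * h)
    return env[min(v, h - 1)][u]
```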

Various types of environment maps have been used for interactive visualization of virtual environments. In the movie-map, anamorphic images were optically or electronically processed to obtain 360-degree viewing [1], [9]. A project called "Navigation" used a grid of panoramas for sailing simulation [10]. Real-time reprojection of environment maps was used to visualize surrounding scenes and to create interactive walkthroughs [11], [12]. A hardware method for environment map look-up was implemented for a virtual reality system [13].

While rendering an environment map is trivial with a computer, creating it from photographic images requires extra work. Greene and Heckbert described a technique of compositing multiple image streams with known camera positions into a fish-eye view [14]. Automatic registration can be used to composite multiple source images into an image with enhanced field of view [15], [16], [17].
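Modern libraries package this registration-and-compositing pipeline; the sketch below uses OpenCV's high-level stitcher in the spirit of [15], [16], [17] (the file names are placeholders, and this is not the method of the cited papers themselves).

```python
import cv2  # OpenCV; assumes overlapping photos shot from one nodal point

# The stitcher estimates the pairwise registrations automatically and
# blends the overlapping photographs into one wide-field image.
images = [cv2.imread(p) for p in ["shot0.jpg", "shot1.jpg", "shot2.jpg"]]

stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, pano = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", pano)
else:
    print("stitching failed:", status)
```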

When the viewpoint starts moving and some objects are nearby, as in the case of orbiting a camera around an object, the frames can no longer be mapped to a canonical projection. The movement of the viewpoint causes "disparity" between different views of the same object. The disparity is a result of depth change in the image space when the viewpoint moves (pivoting a camera about its nodal point does not cause depth change). Because of the disparity, a single environment map is insufficient to accommodate all the views. The movie-based approach simply stores all the frames. The view interpolation method presented by Chen and Williams [18] stores only a few key frames and synthesizes the missing frames on-the-fly by interpolation. However, this method requires additional information, such as a depth buffer and camera parameters, for each of the key frames. Automatic or semi-automatic methods have been developed for registering and interpolating images with unknown depth and camera information [16], [19], [20].
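To make the role of the depth buffer concrete, the sketch below forward-maps a single pixel of a key frame to a translated viewpoint. It is a bare-bones illustration in the spirit of [18], which additionally interpolates between key frames and fills the holes that such warping leaves; the intrinsics and offset names are hypothetical.

```python
def warp(u, v, z, fx, fy, cx, cy, t):
    """Forward-map pixel (u, v) with depth z to a camera translated by t.

    Assumes both cameras share intrinsics (fx, fy, cx, cy) and
    orientation; t = (tx, ty, tz) is the viewpoint offset.
    """
    # Unproject the pixel to a 3D point in the source camera frame.
    X = (u - cx) * z / fx
    Y = (v - cy) * z / fy
    Z = z
    # Express the point relative to the moved camera.
    Xn, Yn, Zn = X - t[0], Y - t[1], Z - t[2]
    # Reproject onto the new image plane (Zn must be positive).
    return fx * Xn / Zn + cx, fy * Yn / Zn + cy
```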
