develop - 1994

June 94 - Exploiting Graphics Speed on the Power Macintosh

KONSTANTIN OTHMER, SHANNON HOLLAND, AND

BRIAN COX

The new QuickDraw on the PowerPC platform substantially improves graphics

performance. A study comparing the performance of QuickDraw and custom blitters on

the Power Macintosh and 680x0-based machines provides information you can use to

ensure that the user benefits from those improvements. Further analysis, detailing

where CopyBits spends its time, leads to an implementation strategy for applications

that demand the fastest possible graphics.

Understanding the motivation for and consequences of the changes to QuickDraw on the

Power Macintosh can help you write faster applications. This article presents studies

that show QuickDraw as one of the most speed-critical parts of the Macintosh

Operating System together with studies that break down how applications spend CPU

time. Knowing how much time applications actually spend in various system routines

will help you develop a strategy for writing applications that perform well on both the

Power Macintosh and 680x0-based machines.

In porting QuickDraw to the PowerPC™ platform, Apple took advantage of the

opportunity to make some changes. We'll detail these changes and their consequences

for writing code. With that foundation, we'll move on to an in-depth discussion

comparing the QuickDraw CopyBits routine with custom blitters. The goal is to write

applications using routines that result in the fastest possible graphics performance on

both platforms -- PowerPC and 680x0 -- as well as on machines equipped with

graphics accelerators such as the new Apple Macintosh Display Card 24 AC. Sample

code on this issue's CD demonstrates a method of timing blitter routines so that your

application can use the fastest routine at run time.
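
The idea of timing candidate routines and keeping the fastest can be sketched in portable C. This is only a minimal illustration, not the sample code from the CD: the names BlitProc, BlitBytes, BlitMemcpy, TimeBlit, and PickFastestBlit are hypothetical, and the standard clock() routine stands in for whatever timer a real application would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by every candidate blitter (hypothetical). */
typedef void (*BlitProc)(unsigned char *dst, const unsigned char *src,
                         size_t count);

/* Candidate 1: copy one byte at a time. */
void BlitBytes(unsigned char *dst, const unsigned char *src, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Candidate 2: let the C library move the whole block. */
void BlitMemcpy(unsigned char *dst, const unsigned char *src, size_t count)
{
    memcpy(dst, src, count);
}

/* Run one candidate `reps` times and return the elapsed clock ticks. */
clock_t TimeBlit(BlitProc proc, unsigned char *dst,
                 const unsigned char *src, size_t count, int reps)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < reps; i++)
        proc(dst, src, count);
    return clock() - start;
}

/* Return the candidate with the smallest measured time.
   The first candidate wins ties. */
BlitProc PickFastestBlit(const BlitProc *candidates, int n,
                         unsigned char *dst, const unsigned char *src,
                         size_t count, int reps)
{
    BlitProc best;
    clock_t bestTime;
    int i;

    assert(n > 0);
    best = candidates[0];
    bestTime = TimeBlit(best, dst, src, count, reps);
    for (i = 1; i < n; i++) {
        clock_t t = TimeBlit(candidates[i], dst, src, count, reps);
        if (t < bestTime) {
            bestTime = t;
            best = candidates[i];
        }
    }
    return best;
}
```

An application would call PickFastestBlit once at startup with buffers of a representative size, then draw through the returned pointer. Timing at run time rather than at build time matters because the same binary may run natively, under emulation, or with an accelerator installed.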

HOW SPEED-CRITICAL IS QUICKDRAW?

Most of the Macintosh Operating System is written in 680x0 assembly language. In

order to reach time-to-market goals for the Power Macintosh, Apple had to focus

porting efforts on the most speed-critical parts of the system, so a study was

conducted to profile system usage of several common applications. System usage

depended largely on the operations performed in particular applications, but many

applications showed similar patterns.

Figure 1 is based on a subset of the study. It turns out that most applications spend

from 50% to 95% of their time in system code, with many spending more than 80%.

Figure 2 shows the percentage of total CPU time spent in the most frequently called

system routines for typical applications and for a pointer-based application (one that

avoids using handles).

Figure 1. CPU time breakdown: application versus system

Figure 2. System routine usage

The data made it clear that QuickDraw was one of the most critical components of

Apple's porting efforts. This article discusses QuickDraw version 1.3.5, which was

developed to run on the PowerPC platform. The new QuickDraw is based on QuickDraw

version 1.3.0, the most recent version of QuickDraw running on the Macintosh

Quadra, but with some changes (see the section "What's Different With Version

1.3.5?"). The new version, written in C, was compiled for the Power Macintosh as

QuickDraw version 1.3.5 and shipped with the new machines. The new QuickDraw C

code can also be compiled for 680x0-based machines and will be available in future

software releases.

The graphics speed comparisons made in this article compare the following:

• QuickDraw version 1.3.0 or other 680x0 code running on a 680x0-based

Macintosh (usually a Macintosh Quadra)

• QuickDraw version 1.3.0 or other 680x0 code running through the

emulator on a Power Macintosh

• QuickDraw version 1.3.5 or other PowerPC code running on a Power

Macintosh

TAKING ADVANTAGE OF THE SPEED

Figure 3 compares times of various QuickDraw routines for version 1.3.0 running on

a Macintosh Quadra and version 1.3.5 running on a Power Macintosh -- there's no

question that the new QuickDraw routines run faster. However, published surveys

comparing the speed of 680x0-based machines to the Power Macintosh haven't always

shown the dramatic results indicated by Figure 3. This is partly because some

operations speed up more than others, so depending on which operations

an application uses heavily, overall speed varies. A second important factor is that the

applications surveyed are often emulated applications.

Figure 3. Comparing QuickDraw version 1.3.0 to version 1.3.5

Emulated applications are those written for 680x0-based machines that run through

the emulator on the Power Macintosh (see "Making the Leap to PowerPC" in develop Issue

16). These applications don't benefit fully from the PowerPC platform, because an

application that spends 80% of its time in system code on 680x0-based machines,

when emulated on a Power Macintosh, spends a substantially larger share of its time in

the application's own emulated code. In general, completely emulated application code runs at about half the

speed of a Macintosh Quadra 700. Those same applications, when recompiled as

PowerPC code, usually run four or five times faster than on a Macintosh Quadra; code

that makes extensive use of floating point may be 20 or more times faster. However,

emulated graphics-intensive code, assuming it uses QuickDraw, is substantially faster

on a Power Macintosh than on a 680x0-based Macintosh because of the increased speed

of QuickDraw 1.3.5.
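
The arithmetic behind this shift can be worked through with a small sketch. The 80% system share and the half-speed emulation figure come from the text above; the system-code speedup of 2 is purely an assumed value for illustration, and the function names RelativeTime and AppShare are hypothetical.

```c
#include <assert.h>

/* Run time on the new machine relative to the old one (old time = 1.0).
   systemFraction: share of the old run time spent in system code.
   appSpeedup:     speed of application code relative to the old machine
                   (0.5 means half speed, as for emulated code).
   systemSpeedup:  speed of system code relative to the old machine
                   (an assumed figure; not taken from the study). */
double RelativeTime(double systemFraction, double appSpeedup,
                    double systemSpeedup)
{
    assert(systemFraction >= 0.0 && systemFraction <= 1.0);
    assert(appSpeedup > 0.0 && systemSpeedup > 0.0);
    return (1.0 - systemFraction) / appSpeedup
         + systemFraction / systemSpeedup;
}

/* Share of the new run time that is spent in application code. */
double AppShare(double systemFraction, double appSpeedup,
                double systemSpeedup)
{
    double appTime = (1.0 - systemFraction) / appSpeedup;

    return appTime / RelativeTime(systemFraction, appSpeedup, systemSpeedup);
}
```

Under these assumptions, RelativeTime(0.8, 0.5, 2.0) comes to about 0.8 of the original run time, and AppShare reports that about half of it is now spent in emulated application code, up from 20%. Recompiling the application so that appSpeedup becomes 4 brings the total down to about 0.45.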

Clearly, to take full advantage of QuickDraw version 1.3.5, you need to write your

applications for the Power Macintosh in PowerPC code. Beyond that general strategy,

developing awesome applications for the PowerPC platform means figuring out how to

harness all that CPU power -- how to take advantage of the speed. For example, the

high speed of QuickDraw version 1.3.5 allows you to do high-quality animations.

Figure 3 shows that you can now do twice as many (or more) CopyBits operations per

second, which means that animations such as zooming, scrolling, and window dragging

(leave this one to Apple) can be done in real time without being chunky or annoying.

Text drawing is also much faster, so interactive word wrapping while positioning

objects in text is easy to do and looks better than it would on a 680x0-based

Macintosh. Overall, it's an open field for developers.

Tips for increasing the speed of PowerPC code are given in this issue's Balance of

Power column. *

Although this article focuses on QuickDraw, of course there are other, nongraphical,

ways of harnessing the power of the PowerPC processor. Floating point-intensive

applications benefit tremendously from the speed of the new processor.

The Graphing Calculator desk accessory that ships with the Power Macintosh

is an excellent example of harnessing CPU power for both the user interface and

computation-bound part of an application. As a floating point-intensive application,

Graphing Calculator benefits from the speed of the PowerPC processor. The user

interface has a number of nice touches, such as live scrolling, live zooming, and

interactive formula and graph manipulation. *

WHAT'S DIFFERENT WITH VERSION 1.3.5?

In the porting of QuickDraw to the PowerPC platform, many algorithms were

rethought and reimplemented. The result is slightly different (and we hope better!)

behavior. This section outlines some changes to keep in mind when you're writing code.

QDERROR

QuickDraw version 1.3.0 didn't do a very good job of setting and clearing QDError. In

version 1.3.5, every call sets QDError (which can cause problems for applications

that assume QDError will be preserved across most simple calls, like SetRect). In

some cases, version 1.3.0 jumps to SysError if there isn't enough memory; version

1.3.5 returns an error in QDError instead. This is usually an improvement, but it can

lead to strange behavior for applications that depend on SysError being invoked. For

example, some applications might put up a dialog asking the user to increase the

application partition size if QuickDraw invokes SysError. Since QuickDraw version

1.3.5 doesn't invoke SysError (returning a QDError instead), the application code

that puts up the dialog isn't triggered, so the user doesn't know to increase the memory

and the application might fail by not drawing anything. In choosing to always set

QDError, Apple chose the lesser of two evils.
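
The defensive pattern this implies is to read the error immediately after the call you care about and to handle a memory failure yourself. The sketch below models that pattern with stand-in routines so that it compiles anywhere: SketchQuickDrawCall, SketchQDError, and DrawChecked are hypothetical names, not Toolbox calls, and the value -108 is the classic Mac OS memFullErr code.

```c
#include <assert.h>

/* Hypothetical stand-ins; -108 matches the classic Mac OS memFullErr. */
enum { kSketchNoErr = 0, kSketchMemFullErr = -108 };

/* Models QuickDraw's error state, which version 1.3.5 sets on every call. */
static short gSketchError = kSketchNoErr;

/* Stand-in for any QuickDraw call: it always overwrites the error state. */
void SketchQuickDrawCall(int enoughMemory)
{
    gSketchError = (short)(enoughMemory ? kSketchNoErr : kSketchMemFullErr);
}

/* Stand-in for QDError: report the result of the most recent call. */
short SketchQDError(void)
{
    return gSketchError;
}

/* Draw, then read the error right away. Returns 1 on success, or 0 when
   the caller should ask the user for a larger application partition. */
int DrawChecked(int enoughMemory)
{
    short err;

    SketchQuickDrawCall(enoughMemory);
    err = SketchQDError();      /* read before another call replaces it */
    assert(err == kSketchNoErr || err == kSketchMemFullErr);
    return err == kSketchNoErr;
}
```

In this model a failed call followed by any successful call leaves the error state clear, which is why the check has to come directly after the drawing call rather than at the end of a sequence.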

MATCHING COLOR TABLES

QuickDraw version 1.3.0 uses the color table of the pixMap for the current GDevice,

not the color table of the destination pixMap, to map colors to the destination pixMap.

QuickDraw version 1.3.5 sets up a surrogate GDevice to make sure that the

destination pixMap's and the GDevice's color tables always match. This may cause

problems for applications that relied on undefined behavior when the color tables

didn't match or for applications that were getting the right results by luck under

QuickDraw version 1.3.0. Again, Apple chose the lesser of two evils, and added the

surrogate device (known as the skank device). When QuickDraw is forced to set up the

skank device, the application pays a slight performance penalty. Also, if you do

operations such as index-to-color when your color tables don't match, and then later

use that color in a drawing, you won't necessarily draw with the index you expect. The

easiest cure: use GWorlds!

For more information on QDError, GDevices, pixMaps, and color tables, see

Inside Macintosh: Imaging With QuickDraw or Inside Macintosh Volume V. *

TRANSFER MODES

There's no way to pass the transfer space (the bit depth at which transfer occurs)

when doing transfer modes in QuickDraw. (QuickDraw GX remedies this shortcoming.)

So if you're using an arithmetic mode from 8-bit to 16-bit, there are no guarantees

whether the transfer will occur at 5 bits per component (16-bit), 8 bits per

component (32-bit), or 16 bits per component (as in the 8-bit color table). It turns

out that most arithmetic modes in QuickDraw version 1.3.0 perform the transfer

operation at a resolution of 16 bits per color, while version 1.3.5 does most

operations at a resolution of 8 bits per color. This sometimes causes slight cosmetic

differences.
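
To see how the transfer resolution alone can change a result, consider a pinned add of two color components. The sketch below is an illustration of the rounding effect only, not QuickDraw's actual arithmetic; AddPinAt16 and AddPinAt8 are hypothetical names.

```c
#include <assert.h>

/* Add two 16-bit components at 16-bit resolution, pin at the maximum,
   then reduce the result to 8 bits. */
unsigned AddPinAt16(unsigned a16, unsigned b16)
{
    unsigned long sum;

    assert(a16 <= 0xFFFFu && b16 <= 0xFFFFu);
    sum = (unsigned long)a16 + b16;
    if (sum > 0xFFFFul)
        sum = 0xFFFFul;
    return (unsigned)(sum >> 8);
}

/* Reduce both components to 8 bits first, then add and pin. */
unsigned AddPinAt8(unsigned a16, unsigned b16)
{
    unsigned sum;

    assert(a16 <= 0xFFFFu && b16 <= 0xFFFFu);
    sum = (a16 >> 8) + (b16 >> 8);
    if (sum > 0xFFu)
        sum = 0xFFu;
    return sum;
}
```

For the inputs 0x7F80 and 0x0080, AddPinAt16 yields 0x80 while AddPinAt8 yields 0x7F, because the low-order bits that carry at 16 bits are discarded before the 8-bit add. Many inputs, such as 0x4000 and 0x2000, give identical results either way, which is why the differences are slight.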

DITHERING

The dithering algorithm in QuickDraw version 1.3.5 is slightly different. This makes

it a nightmare to programmatically determine whether version 1.3.5 is generating the

same results as version 1.3.0, but visually the results are nearly identical.

STRETCHING AND SHRINKING IMAGES

The way CopyBits stretches and shrinks images for nonintegral ratios has been

improved in QuickDraw version 1.3.5 (integral ratios still produce the same results).

The advantage of this new algorithm is that it's symmetrical: if you stretch an image

and then shrink it back to the original size, the same pixels that were replicated in the

stretch are combined in the shrink.

The disadvantage of the new algorithm is that some applications stretch or shrink

without knowing it (the classic off-by-one error, resulting in a destination rectangle

that's smaller or larger than the source rectangle by one pixel). Such applications

may now drop (or replicate) a different scan line. This can cause slight cosmetic

blemishes in some applications.
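
A cheap guard against accidental scaling is to compare the rectangle dimensions before copying. The sketch below uses a hypothetical SketchRect type whose field order mirrors QuickDraw's Rect (top, left, bottom, right); CopyWouldScale and the helper names are likewise assumptions made for illustration.

```c
#include <assert.h>

/* Stand-in for QuickDraw's Rect, with the same field order. */
typedef struct SketchRect {
    short top, left, bottom, right;
} SketchRect;

int RectWidth(const SketchRect *r)
{
    return r->right - r->left;
}

int RectHeight(const SketchRect *r)
{
    return r->bottom - r->top;
}

/* Return 1 if copying src to dst would stretch or shrink the image,
   0 if the two rectangles have identical dimensions. */
int CopyWouldScale(const SketchRect *src, const SketchRect *dst)
{
    assert(src != 0 && dst != 0);
    return RectWidth(src) != RectWidth(dst)
        || RectHeight(src) != RectHeight(dst);
}
```

Calling CopyWouldScale in a debug build before each copy that's meant to be one-to-one flags the off-by-one case, where the destination is a single pixel wider or taller than the source.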

UNEXPECTED REGISTER CONTENTS

Because QuickDraw version 1.3.5 runs PowerPC code, all emulated 680x0 registers

are preserved across calls. Thus, applications that expect the contents of volatile

registers (A0, A1, D0, D1, D2) to contain specific values on exit from a QuickDraw

call will break. (Conversely, don't rely on 680x0 registers being preserved, either!)

There's one exception: for compatibility with some existing applications, CopyBits

always sets D0 to 0.

PATCHING

Patching any QuickDraw version 1.3.5 routine with 680x0 code degrades performance

because of mode-switch overhead time. A mode switch occurs when a 680x0 caller is

calling PowerPC code, or vice versa. 680x0 patches on ShieldCursor are particularly

expensive because ShieldCursor is called by nearly every QuickDraw drawing routine.

For more information on the Mixed Mode Manager and mode switching, see

"Making the Leap to PowerPC" in develop Issue 16. *

DISABLED ACCELERATOR CARDS

QuickDraw version 1.3.0 makes calls through many low-level (undocumented)

vectors. Version 1.3.5 doesn't use these trap vectors, which disables most accelerator

cards. Of course, the frame buffer on these cards continues to work.

THE COPYBITS/CUSTOM BLITTER RACE

A favorite developer sport is complaining about how slow CopyBits is and writing

custom blit loops to replace it. A favorite sport among QuickDraw engineers is working

all night trying to speed up some part of CopyBits. This competition is healthy so long

as speed-critical applications call the faster code.

"Blitter" informally refers to any routine that moves memory, usually visual

information to the screen or an off-screen buffer; the operation is called a "blit."

These terms derive from the PDP-10 block transfer instruction, BLT. *

Through the years, Apple engineers have yearned for a way to get a substantial lead in

the race with the speed-hungry special-case developer. The answer lies in the Power

Macintosh: raw 680x0 code runs substantially slower through the emulator, while

QuickDraw version 1.3.5 CopyBits takes advantage of the lightning-fast RISC

processor.

Figure 4 compares various ways of moving the memory used by an 8-bit, 32-by-32

pixMap and an 8-bit, 400-by-400 pixMap to the screen. BlockMove gives a baseline:

the typical amount of time needed to move that much raw memory. The 680x0 blitter

is a custom blitter written for 680x0-based machines and emulated on the Power

Macintosh. The PowerPC blitter is a custom blitter written for the Power Macintosh

(it can't be run on a 680x0 machine).

Figure 4. CopyBits versus custom blitters

As you can see, custom blitters running as native code beat QuickDraw's CopyBits for the small

image hands down for both 680x0-based machines and the Power Macintosh. (With the

small image the constant overhead of CopyBits has a big impact on the overall time.)

However, the 680x0 blitter is much slower than CopyBits on a Power Macintosh. This

is due to the overhead of emulation.
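
For readers who haven't written one, a bare-bones blitter for 8-bit pixels might look like the following sketch. It is not the blitter measured in Figure 4. BlitRows8 is a hypothetical name, and the routine assumes unclipped buffers of the same depth, so it does none of the clipping or color mapping that a general routine like CopyBits must handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a width-by-height block of 8-bit pixels, one row at a time.
   Each buffer has its own row stride in bytes, which may be larger
   than the visible width (as a pixMap's rowBytes usually is). */
void BlitRows8(unsigned char *dst, size_t dstRowBytes,
               const unsigned char *src, size_t srcRowBytes,
               size_t width, size_t height)
{
    size_t row;

    assert(width <= srcRowBytes && width <= dstRowBytes);
    for (row = 0; row < height; row++) {
        memcpy(dst, src, width);    /* one byte per pixel at 8 bits */
        src += srcRowBytes;         /* step to the start of the next row */
        dst += dstRowBytes;
    }
}
```

The per-call cost here is tiny, which is consistent with a custom routine winning on the 32-by-32 image, where fixed overhead dominates.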

The interesting case is the custom PowerPC blitter versus CopyBits for the large

image on the Power Macintosh. Here CopyBits wins. This is due to optimizations that

CopyBits has for large images that the PowerPC blitter doesn't have. In this case,

CopyBits is also faster than BlockMove, because of optimizations in CopyBits for the

PowerPC processor's frame buffer (which has a 64-bit data path). BlockMove is
