develop - 1992

May 92 - DRAWING IN GWORLDS FOR SPEED AND VERSATILITY

KONSTANTIN OTHMER AND MIKE REED

32-Bit QuickDraw brought system support for off-screen drawing worlds to the

Macintosh, and Color QuickDraw continues this support in System 7. Using custom

drawing routines in off-screen worlds can increase a program's speed and image-

processing versatility. This article describes custom drawing routines that do just

that.

It's a basic rule of Macintosh programming never to write a drawing routine that

draws directly to the screen. There are two good reasons for this rule. First, multiple

clients share the screen, and custom routines that draw directly to the screen may

violate cooperation rules (new ones are being invented all the time). Second, support

for new types of displays may be added to QuickDraw (as was the case with 32-Bit

QuickDraw), and custom routines that draw directly to the screen certainly won't

work when new display types are introduced.

So if your application has a drawing need that QuickDraw cannot fulfill, off-screen

drawing is the only way to go. Your application draws to an off-screen copy of the

application window, and the off-screen image is transferred to the screen with

QuickDraw's CopyBits procedure. In the off-screen environment your application is

the sole proprietor, and support for new displays will not affect how the off-screen

environment behaves. In addition, using CopyBits to transfer an off-screen image onto

the screen enables fast and smooth updating.
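
The draw-off-screen-then-transfer pattern can be sketched in portable C. This is only an illustration under stated assumptions: Surface, FillSurface, and BlitSurface are hypothetical stand-ins, not QuickDraw calls, and BlitSurface plays the role that CopyBits plays in a real application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 32-bit-deep drawing surface. */
typedef struct {
    uint32_t *pixels;    /* first pixel of the top row */
    int       width;     /* visible pixels per row */
    int       height;    /* number of rows */
    int       rowPixels; /* stride between rows, in pixels (>= width) */
} Surface;

/* Custom drawing step: the off-screen surface has a single owner,
   so every pixel can be written directly. */
static void FillSurface(Surface *s, uint32_t value)
{
    int x, y;

    for (y = 0; y < s->height; y++)
        for (x = 0; x < s->width; x++)
            s->pixels[(size_t)y * s->rowPixels + x] = value;
}

/* Transfer step: copy the finished image one row at a time.
   The two surfaces may use different row strides. */
static void BlitSurface(const Surface *src, Surface *dst)
{
    int y;

    assert(src->width == dst->width && src->height == dst->height);
    for (y = 0; y < src->height; y++)
        memcpy(dst->pixels + (size_t)y * dst->rowPixels,
               src->pixels + (size_t)y * src->rowPixels,
               (size_t)src->width * sizeof(uint32_t));
}
```

Because the image is complete before the transfer starts, the destination never shows a half-drawn state.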

There are a couple of different ways to create an off-screen drawing environment. The

old-fashioned way is to create it by hand, an arduous task that results in all the

structures being kept in main memory. The new, improved way is to create it with the

NewGWorld call first made available by 32-Bit QuickDraw and now supported by

Color QuickDraw in System 7. When this method is used, a copy of the GWorld can be

cached on an accelerator card, thus enabling improved performance by minimizing

NuBus traffic during drawing operations. (For a full comparison of drawing

operations with and without the use of GWorlds, see "Macintosh Display Card 8•24 GC"

in develop Issue 3.)

Given that you must certainly see the wisdom of using GWorlds in applications, we'll

now move on to the good stuff--how to increase performance and create some

interesting special effects with custom drawing routines. You should know the basics of

creating and disposing of GWorlds to get the most from this article. If you need a

review of these basics, read "Braving Offscreen Worlds" in develop Issue 1 or see

Chapter 21 of Inside Macintosh Volume VI.

CUSTOM DRAWING ROUTINES TO INCREASE SPEED

Sometimes QuickDraw works too slowly for some of us. Whereas QuickDraw often

trades performance for flexibility, there are times we'd just as soon trade flexibility

for performance. In those cases, we can achieve tremendous gains in speed by writing

custom routines to draw to off-screen worlds. Before writing such a routine, though,

we need to understand what slows QuickDraw down.

WHY IS QUICKDRAW OFTEN SO SLOW?

Let's examine EraseRect to help us understand the considerable overhead QuickDraw

has to deal with just to perform a simple operation. An EraseRect call is issued via a

trap, so right off the bat we incur the overhead of the trap dispatcher. For a complex

operation, this overhead is relatively small, but for a simple operation performed

repetitively, this overhead can be significant. In the latter case, the trap dispatcher

overhead can be avoided by calling GetTrapAddress and then calling the routine

directly. (Note that with high-level routines, some traps take a selector.)
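
The idea of paying the lookup cost once and then calling the routine directly can be sketched in portable C. This is an analogy only: LookUpRoutine is a hypothetical dispatcher that stands in for the trap dispatcher, and nothing here is a Toolbox call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*Routine)(int *value);

static int gLookups = 0;    /* counts how often the dispatcher runs */

static void ClearValue(int *value)
{
    *value = 0;
}

/* Hypothetical dispatcher: resolves a routine by name on every call. */
static Routine LookUpRoutine(const char *name)
{
    gLookups++;
    return strcmp(name, "ClearValue") == 0 ? ClearValue : NULL;
}

/* Dispatch on every iteration: one lookup per element. */
static void ClearAllDispatched(int *values, int count)
{
    int i;

    for (i = 0; i < count; i++)
        LookUpRoutine("ClearValue")(&values[i]);
}

/* Resolve the address once, then call the routine directly. */
static void ClearAllDirect(int *values, int count)
{
    int     i;
    Routine clear = LookUpRoutine("ClearValue");

    assert(clear != NULL);
    for (i = 0; i < count; i++)
        clear(&values[i]);
}
```

The saving grows with the number of calls, which is why the technique pays off only for simple operations performed repetitively.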

After we've called the routine, QuickDraw must do the following setup:

1. Check for a bottleneck procedure in the current port.

2. Check whether picture recording is enabled.

3. Calculate the intersection of the clipRgn and the visRgn and see if the

drawing will be clipped out.

4. Check whether drawing is to the screen, and if so shield the cursor if the

drawing intersects the cursor location.

5. Walk the device list and draw to each monitor that the clipped rectangle

intersects.

Then the drawing takes place, consisting of these steps:

6. If the pixel map requires 32-bit addressing, enter 32-bit mode.

7. Determine the transfer mode to draw with.

8. Convert the pattern to the correct depth and alignment for this drawing.

9. Determine how to color the pixel pattern using the colors from the port.

10. Blast the bits.

The teardown consists of two steps:

11. Exit 32-bit addressing mode, if appropriate.

12. Unshield the cursor.

Notice that this list doesn't include error checking. QuickDraw does do some error

checking, but rigorous checking slows performance further. While many of the items

on this list are simple checks, others require considerable processor time. There's

plenty of room here for reducing overhead by writing custom routines.

Custom routines can often skip all but step 10, blasting the bits. For drawing

operations that spend the majority of their time in that step, custom routines can't

offer big wins in performance. But for operations that spend most of the time

elsewhere, custom routines can achieve significant performance gains.

For example, if you copy a large image with CopyBits and the source and destination

pixel depths are the same, the fgColor is black and the bkColor is white, the color

tables match, the clipping regions are rectangular, and the alignment is the same, the

operation is already very efficient since the majority of time is spent moving the bits

rather than doing overhead. In this situation, you can't hope to gain substantial speed

with a custom drawing routine. In contrast, for an operation such as setting a single

pixel, the overhead involved in setting up the drawing operation eclipses the time

actually spent drawing, so this is a candidate for a custom drawing routine.

OPTIMIZING A CUSTOM ROUTINE TO SET A SINGLE PIXEL

The simplest drawing to an off-screen world is setting a single pixel. Let's compare

how QuickDraw sets a single pixel with how a custom drawing routine might do it. For

our custom routine, we'll assume the off-screen world is 32 bits deep. This

assumption gives us significant gains in speed and reduces code size and complexity.

Our sample code inverts the red and green channels. Figure 1 illustrates the

transformation this accomplishes. Using QuickDraw, the code looks like this:

for (y = bounds.top; y < bounds.bottom; y++)
{
    for (x = bounds.left; x < bounds.right; x++)
    {
        GetCPixel(x, y, &myRGB);
        myRGB.red ^= 0xFFFF;    /* Invert the red and green channels. */
        myRGB.green ^= 0xFFFF;
        SetCPixel(x, y, &myRGB);
    }
}

Figure 1 A Couple of Crazy Guys, Before and After Red/Green Inversion

As shown here, we use the QuickDraw routines GetCPixel and SetCPixel to get and set

the color of a single pixel. SetCPixel is converted to a line-drawing command, because

setting a single pixel is actually a special case of drawing a line (a very short line!).

This way of implementing pixel setting is advantageous because line-drawing

operations are saved in pictures and use the pattern and transfer mode from the port.

It also simplifies QuickDraw on the bottleneck level since no separate bottleneck

routine exists for setting pixels. The downside is that setting a single pixel this way is

slow. To produce the transformation shown in Figure 1, the code takes 624 ticks or

about 10.4 seconds to run on a Macintosh IIfx.
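
The channel inversion itself is cheap, which is why the per-pixel overhead dominates. As a minimal sketch, assuming 16-bit color components as in QuickDraw's RGBColor, the work done between the two calls amounts to the following. Color16 and InvertRedGreen are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for RGBColor: three 16-bit components.
   Assumes unsigned short is 16 bits wide. */
typedef struct {
    unsigned short red;
    unsigned short green;
    unsigned short blue;
} Color16;

/* XOR with all ones flips every bit, mapping a component c to 0xFFFF - c.
   Blue is left untouched. */
static Color16 InvertRedGreen(Color16 c)
{
    assert(sizeof(unsigned short) == 2);
    c.red ^= 0xFFFF;
    c.green ^= 0xFFFF;
    return c;
}
```

Applying the function twice returns the original color, since XOR with the same mask is its own inverse.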

Faster. Now let's develop a custom routine that optimizes setting a single pixel. For

a first pass, we'll eliminate the majority of the overhead and set the pixel directly

rather than do line drawing. Given a GWorldPtr, an x and y position, and a 32-bit

value, our routine GWSet32PixelC sets the pixel at that position to that value. The

parallel call GWGet32PixelC is identical, except that where GWSet32PixelC sets the

value, GWGet32PixelC returns it.

void GWSet32PixelC(GWorldPtr src, short x, short y, long pixelValue)
{
    PixMapHandle    srcPixMap;
    unsigned short  srcRowBytes;
    long            srcBaseAddr;
    long            srcAddr;
    char            mmuMode;

    srcPixMap = GetGWorldPixMap(src);

    /* Get the address of the pixels. */
    srcBaseAddr = (long) GetPixBaseAddr(srcPixMap);

    /* Get the row increment, masking off the high flag bit. */
    srcRowBytes = (**srcPixMap).rowBytes & 0x7fff;

    /* Make coordinates pixel map relative. */
    x -= (**srcPixMap).bounds.left;
    y -= (**srcPixMap).bounds.top;

    mmuMode = true32b;
    SwapMMUMode(&mmuMode);      /* Set the MMU to 32-bit mode. */

    /* Calculate the address of the pixel:
       base + y*(row size in bytes) + x*(bytes/pixel). */
    srcAddr = srcBaseAddr + (long)y*srcRowBytes + ((long)x << 2);
    *((long *)srcAddr) = pixelValue;

    SwapMMUMode(&mmuMode);      /* Restore the previous MMU mode. */
}

Of interest in this code is the call to SwapMMUMode before drawing to the GWorld. This

is necessary since the GWorld could be cached on an accelerator card and require

32-bit addressing to access it. (See "QuickDraw's CopyBits Procedure" in develop

Issue 6 for a complete explanation.)
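
The address arithmetic in GWSet32PixelC can be tried out on ordinary memory. The following is a minimal sketch, not Toolbox code: MiniPixMap is a hypothetical, simplified stand-in for the PixMap fields the routine reads, and the MMU mode switch is omitted because it has no counterpart in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PixMap fields used here. */
typedef struct {
    unsigned char  *baseAddr;   /* first byte of pixel data */
    unsigned short  rowBytes;   /* row stride in bytes; high bit is a flag */
    short           top, left;  /* origin of the bounds rectangle */
    short           bottom, right;
} MiniPixMap;

/* base + (y - top) * rowBytes + (x - left) * 4, for 32-bit pixels.
   Assumes baseAddr and the row stride are 4-byte aligned. */
static uint32_t *PixelAddress32(const MiniPixMap *pm, short x, short y)
{
    size_t stride = (size_t)(pm->rowBytes & 0x7FFF);   /* strip the flag bit */
    size_t row, col;

    assert(x >= pm->left && x < pm->right);
    assert(y >= pm->top && y < pm->bottom);
    row = (size_t)(y - pm->top);
    col = (size_t)(x - pm->left);
    return (uint32_t *)(void *)(pm->baseAddr + row * stride + col * 4);
}

static void SetPixel32(const MiniPixMap *pm, short x, short y, uint32_t value)
{
    *PixelAddress32(pm, x, y) = value;
}

static uint32_t GetPixel32(const MiniPixMap *pm, short x, short y)
{
    return *PixelAddress32(pm, x, y);
}
```

Note that the bounds origin need not be (0,0), which is why both coordinates are made relative to it before the offset is computed.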

If we revise our sample code to use our new calls GWGet32PixelC and GWSet32PixelC,

the image shown in Figure 1 takes 398 ticks (or 6.8 seconds) to process. This is about

65% faster than QuickDraw, but is still much slower than it needs to be.

And faster. There are two major inefficiencies in our sample code: it makes four

trap calls and it's at the mercy of the C compiler. Both of these problems are easily

overcome, as the FastGWSet32Pixel routine demonstrates. Rather than take a

GWorldPtr, FastGWSet32Pixel takes a pixMap pointer and a base address.

Furthermore, the routine assumes it's being called in 32-bit mode. Note that the

variables bounds, top, rowBytes, and left are defined in QuickEquate.a.

void FastGWSet32Pixel(PixMap *srcPixMap, long *srcBaseAddr, short x,
                      short y, long pixelValue)
{
    asm {
        move.l  srcPixMap,a0        ;Must be 32-bit-clean pointer
        move.w  y,d1                ;Get y
        sub.w   bounds+top(a0),d1   ;Make y bounds 0 relative
        move.w  rowBytes(a0),d0     ;Get rowBytes
        and.w   #0x7FFF,d0          ;Strip bitmap/pixMap bit
        mulu.w  d0,d1               ;Calculate offset to start of this row
        move.l  srcBaseAddr,a1      ;Must be 32-bit base address
        adda.l  d1,a1               ;Calculate address of this row
        moveq   #0,d0               ;Clear d0 so x extends to a long
        move.w  x,d0
        sub.w   bounds+left(a0),d0  ;Make x bounds 0 relative
        lsl.w   #2,d0               ;Convert x to a byte offset (4 bytes/pixel)
        adda.l  d0,a1               ;Calculate pixel address
        move.l  pixelValue,(a1)     ;Store the pixel value
    }
}

You may wonder why this routine takes both a pixMap and a base address. Can't it just

get the base address from the pixMap directly? The answer is no, since the base

address of a GWorld can be a handle rather than a pointer and in the future might be

something different again. You must pass in a base address that's good in 32-bit

addressing mode. The GetPixBaseAddr routine called by GWSet32PixelC returns the

correct base address given a pixMap.
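
Splitting the work this way means the base address and row size are obtained once per image rather than once per pixel. A rough portable sketch of that structure follows. It assumes a packed 0x00RRGGBB pixel layout and works on a plain memory buffer, so InvertRedGreenRows and RED_GREEN_MASK are hypothetical names for illustration rather than part of the article's routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed pixel layout: 0x00RRGGBB, 8 bits per component. */
#define RED_GREEN_MASK 0x00FFFF00u

/* The caller resolves base and rowBytes once. The loops then touch
   only pixel memory, with no per-pixel setup. */
static void InvertRedGreenRows(uint32_t *base, size_t rowBytes,
                               int width, int height)
{
    int x, y;

    assert(rowBytes % sizeof(uint32_t) == 0);
    assert((size_t)width * sizeof(uint32_t) <= rowBytes);

    for (y = 0; y < height; y++) {
        /* Start of row y: base + y * rowBytes. */
        uint32_t *row = (uint32_t *)(void *)
            ((unsigned char *)base + (size_t)y * rowBytes);

        for (x = 0; x < width; x++)
            row[x] ^= RED_GREEN_MASK;   /* flip red and green, keep blue */
    }
}
```

Padding at the end of each row is never touched, because the inner loop runs over width pixels rather than the full row stride.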

Revising our sample code isn't as trivial as it was before because of the additional

assumptions made by these fast get and set pixel routines. Here's the new version of

the code:

/* Get pixMap's 32-bit base address. */

srcBaseAddr = (long *) GetPixBaseAddr(myPixMapHandle);

myPixMapPtr = *myPixMapHandle;

/* WARNING: The pixMapHandle is dereferenced throughout these next

loops. The code makes sure memory will not move. In particular,

it's important that the segment containing the FastGWGet32Pixel
