May 92 - DRAWING IN GWORLDS FOR SPEED AND VERSATILITY
KONSTANTIN OTHMER AND MIKE REED
32-Bit QuickDraw brought system support for off-screen drawing worlds to the
Macintosh, and Color QuickDraw continues this support in System 7. Using custom
drawing routines in off-screen worlds can increase a program's speed and image-
processing versatility. This article describes custom drawing routines that do just
that.
It's a basic rule of Macintosh programming never to write a drawing routine that
draws directly to the screen. There are two good reasons for this rule. First, multiple
clients share the screen, and custom routines that draw directly to the screen may
violate cooperation rules (new ones are being invented all the time). Second, support
for new types of displays may be added to QuickDraw (as was the case with 32-Bit
QuickDraw), and custom routines that draw directly to the screen certainly won't
work when new display types are introduced.
So if your application has a drawing need that QuickDraw cannot fulfill, off-screen
drawing is the only way to go. Your application draws to an off-screen copy of the
application window, and the off-screen image is transferred to the screen with
QuickDraw's CopyBits procedure. In the off-screen environment your application is
the sole proprietor, and support for new displays will not affect how the off-screen
environment behaves. In addition, using CopyBits to transfer an off-screen image onto
the screen enables fast and smooth updating.
There are a couple of different ways to create an off-screen drawing environment. The
old-fashioned way is to create it by hand, an arduous task that results in all the
structures being kept in main memory. The new, improved way is to create it with the
NewGWorld call first made available by 32-Bit QuickDraw and now supported by
Color QuickDraw in System 7. When this method is used, a copy of the GWorld can be
cached on an accelerator card, thus enabling improved performance by minimizing
NuBus™ traffic during drawing operations. (For a full comparison of drawing
operations with and without the use of GWorlds, see "Macintosh Display Card 8•24 GC"
in develop Issue 3.)
Given that you must certainly see the wisdom of using GWorlds in applications, we'll
now move on to the good stuff--how to increase performance and create some
interesting special effects with custom drawing routines. You should know the basics of
creating and disposing of GWorlds to get the most from this article. If you need a
review of these basics, read "Braving Offscreen Worlds" in develop Issue 1 or see
Chapter 21 of Inside Macintosh Volume VI.
CUSTOM DRAWING ROUTINES TO INCREASE SPEED
Sometimes QuickDraw works too slowly for some of us. Whereas QuickDraw often
trades performance for flexibility, there are times we'd just as soon trade flexibility
for performance. In those cases, we can achieve tremendous gains in speed by writing
custom routines to draw to off-screen worlds. Before writing such a routine, though,
we need to understand what slows QuickDraw down.
WHY IS QUICKDRAW OFTEN SO SLOW?
Let's examine EraseRect to help us understand the considerable overhead QuickDraw
has to deal with just to perform a simple operation. An EraseRect call is issued via a
trap, so right off the bat we incur the overhead of the trap dispatcher. For a complex
operation, this overhead is relatively small, but for a simple operation performed
repetitively, this overhead can be significant. In the latter case, the trap dispatcher
overhead can be avoided by calling GetTrapAddress and then calling the routine
directly. (Note that with high-level routines, some traps take a selector.)
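The idea of paying for the lookup once, rather than on every call, can be sketched in portable C. This is only an analogy and not Toolbox code: the dispatch table, lookup_routine, and erase_cell below are hypothetical stand-ins for the trap table, GetTrapAddress, and a simple drawing routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*RoutinePtr)(unsigned char *cell);

/* Hypothetical stand-in for a simple drawing routine. */
static void erase_cell(unsigned char *cell) { *cell = 0; }

/* Hypothetical dispatch table, standing in for the trap table. */
struct Entry { const char *name; RoutinePtr routine; };
static const struct Entry table[] = { { "erase_cell", erase_cell } };

/* Hypothetical lookup, standing in for GetTrapAddress. */
static RoutinePtr lookup_routine(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].routine;
    return NULL;
}

/* Slow path: the routine is looked up again on every call. */
static void erase_all_dispatched(unsigned char *cells, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        RoutinePtr r = lookup_routine("erase_cell");
        if (r != NULL)
            r(&cells[i]);
    }
}

/* Fast path: look the routine up once, then call through the pointer. */
static void erase_all_direct(unsigned char *cells, size_t n)
{
    RoutinePtr r = lookup_routine("erase_cell");
    size_t i;
    assert(r != NULL);
    for (i = 0; i < n; i++)
        r(&cells[i]);
}
```

Both loops produce the same result; the second simply moves the lookup cost out of the loop, which is where the saving comes from when the routine itself does very little work.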
After we've called the routine, QuickDraw must do the following setup:
1. Check for a bottleneck procedure in the current port.
2. Check whether picture recording is enabled.
3. Calculate the intersection of the clipRgn and the visRgn and see if the
drawing will be clipped out.
4. Check whether drawing is to the screen, and if so shield the cursor if the
drawing intersects the cursor location.
5. Walk the device list and draw to each monitor that the clipped rectangle
intersects.
Then the drawing takes place, consisting of these steps:
1. If the pixel map requires 32-bit addressing, enter 32-bit mode.
2. Determine the transfer mode to draw with.
3. Convert the pattern to the correct depth and alignment for this drawing.
4. Determine how to color the pixel pattern using the colors from the port.
5. Blast the bits.
The teardown consists of two steps:
1. Exit 32-bit addressing mode, if appropriate.
2. Unshield the cursor.
Notice that this list doesn't include error checking. QuickDraw does do some error
checking, but rigorous checking slows performance further. While many of the items
on this list are a simple check, others require considerable processor time. There's
plenty of room here for reducing overhead by writing custom routines.
Custom routines can often skip everything except the actual drawing (the "Blast the
bits" step). For drawing operations that spend the majority of their time blasting
bits, custom routines can't offer big wins in performance. But for operations that
spend most of their time elsewhere, custom routines can achieve significant
performance gains.
For example, if you copy a large image with CopyBits and the source and destination
pixel depths are the same, the fgColor is black and the bkColor is white, the color
tables match, the clipping regions are rectangular, and the alignment is the same, the
operation is already very efficient since the majority of time is spent moving the bits
rather than doing overhead. In this situation, you can't hope to gain substantial speed
with a custom drawing routine. In contrast, for an operation such as setting a single
pixel, the overhead involved in setting up the drawing operation eclipses the time
actually spent drawing, so this is a candidate for a custom drawing routine.
OPTIMIZING A CUSTOM ROUTINE TO SET A SINGLE PIXEL
The simplest drawing to an off-screen world is setting a single pixel. Let's compare
how QuickDraw sets a single pixel with how a custom drawing routine might do it. For
our custom routine, we'll assume the off-screen world is 32 bits deep. This
assumption gives us significant gains in speed and reduces code size and complexity.
Our sample code inverts the red and green channels. Figure 1 illustrates the
transformation this accomplishes. Using QuickDraw, the code looks like this:
for (y = bounds.top; y < bounds.bottom; y++)
{
    for (x = bounds.left; x < bounds.right; x++)
    {
        GetCPixel(x, y, &myRGB);
        myRGB.red ^= 0xFFFF;    /* Invert the red and green channels. */
        myRGB.green ^= 0xFFFF;
        SetCPixel(x, y, &myRGB);
    }
}
Figure 1 A Couple of Crazy Guys, Before and After Red/Green Inversion
As shown here, we use the QuickDraw routines GetCPixel and SetCPixel to get and set
the color of a single pixel. SetCPixel is converted to a line-drawing command, because
setting a single pixel is actually a special case of drawing a line (a very short line!).
This way of implementing pixel setting is advantageous because line-drawing
operations are saved in pictures and use the pattern and transfer mode from the port.
It also simplifies QuickDraw on the bottleneck level since no separate bottleneck
routine exists for setting pixels. The downside is that setting a single pixel this way is
slow. To produce the transformation shown in Figure 1, the code takes 624 ticks or
about 10.4 seconds to run on a Macintosh IIfx.
Faster. Now let's develop a custom routine that optimizes setting a single pixel. For
a first pass, we'll eliminate the majority of the overhead and set the pixel directly
rather than do line drawing. Given a GWorldPtr, an x and y position, and a 32-bit
value, our routine GWSet32PixelC sets the pixel at that position to that value. The
parallel call GWGet32PixelC is identical, except that where GWSet32PixelC sets the
value, GWGet32PixelC returns it.
void GWSet32PixelC(GWorldPtr src, short x, short y, long pixelValue)
{
    PixMapHandle    srcPixMap;
    unsigned short  srcRowBytes;
    long            srcBaseAddr;
    long            srcAddr;
    char            mmuMode;

    srcPixMap = GetGWorldPixMap(src);
    /* Get the address of the pixels. */
    srcBaseAddr = (long) GetPixBaseAddr(srcPixMap);
    /* Get the row increment. */
    srcRowBytes = (**srcPixMap).rowBytes & 0x7fff;
    /* Make coordinates pixel map relative. */
    x -= (**srcPixMap).bounds.left;
    y -= (**srcPixMap).bounds.top;
    mmuMode = true32b;
    SwapMMUMode(&mmuMode);  /* Set the MMU to 32-bit mode. */
    /* Calculate the address of the pixel: base + y*(row size in
       bytes) + x*(bytes/pixel). Both terms are widened to long so
       the arithmetic can't overflow a 16-bit int. */
    srcAddr = srcBaseAddr + (long)y*srcRowBytes + ((long)x << 2);
    *((long *)srcAddr) = pixelValue;
    SwapMMUMode(&mmuMode);  /* Restore the previous MMU mode. */
}
Of interest in this code is the call to SwapMMUMode before drawing to the GWorld. This
is necessary since the GWorld could be cached on an accelerator card and require
32-bit addressing to access it. (See "QuickDraw's CopyBits Procedure" in develop
Issue 6 for a complete explanation.)
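The address arithmetic at the heart of this routine can be tried outside the Toolbox. The following is a minimal sketch in portable C, assuming a simplified, hypothetical SimplePixMap structure that carries only the fields the calculation needs; it leaves out the MMU mode switch and the GWorld calls entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the PixMap fields used here. */
typedef struct {
    uint8_t  *baseAddr;       /* first byte of pixel data */
    uint16_t  rowBytes;       /* row stride in bytes; high bit is a flag */
    int16_t   top, left;      /* bounds, top-left corner */
    int16_t   bottom, right;  /* bounds, bottom-right corner */
} SimplePixMap;

/* Byte offset of pixel (x, y) in a 32-bit-deep map:
   (y - top) * rowBytes + (x - left) * 4. */
static size_t pixel_offset32(const SimplePixMap *pm, int x, int y)
{
    size_t rowBytes = pm->rowBytes & 0x7FFF;  /* strip the flag bit */
    assert(x >= pm->left && x < pm->right);
    assert(y >= pm->top && y < pm->bottom);
    return (size_t)(y - pm->top) * rowBytes + (size_t)(x - pm->left) * 4;
}

/* Store a 32-bit value at (x, y); memcpy avoids alignment assumptions. */
static void set_pixel32(SimplePixMap *pm, int x, int y, uint32_t value)
{
    memcpy(pm->baseAddr + pixel_offset32(pm, x, y), &value, sizeof value);
}

/* Fetch the 32-bit value at (x, y). */
static uint32_t get_pixel32(const SimplePixMap *pm, int x, int y)
{
    uint32_t value;
    memcpy(&value, pm->baseAddr + pixel_offset32(pm, x, y), sizeof value);
    return value;
}
```

Note that the coordinates are made relative to the bounds before they are scaled, just as in GWSet32PixelC, so a pixel map whose bounds don't start at (0, 0) is still addressed correctly.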
If we revise our sample code to use our new calls GWGet32PixelC and GWSet32PixelC,
the image shown in Figure 1 takes 398 ticks (or 6.8 seconds) to process. This is about
65% faster than QuickDraw, but is still much slower than it needs to be.
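With direct access to 32-bit pixels, the red/green inversion itself reduces to a single exclusive-OR per pixel. The sketch below is one way the per-pixel work might look in portable C; it assumes each pixel is packed as an unused byte followed by 8-bit red, green, and blue components, and the names RED_GREEN_MASK, invert_red_green, and invert_red_green_rect are illustrative, not routines from this article.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed pixel layout: 0x00RRGGBB (unused byte, red, green, blue). */
#define RED_GREEN_MASK 0x00FFFF00u

/* Invert the red and green channels; blue and the unused byte are kept. */
static uint32_t invert_red_green(uint32_t pixel)
{
    return pixel ^ RED_GREEN_MASK;
}

/* Apply the inversion to a width-by-height block of pixels.
   rowLongs is the row stride measured in 32-bit pixels, which may
   be larger than width if rows are padded. */
static void invert_red_green_rect(uint32_t *pixels, size_t rowLongs,
                                  size_t width, size_t height)
{
    size_t x, y;
    for (y = 0; y < height; y++) {
        uint32_t *row = pixels + y * rowLongs;
        for (x = 0; x < width; x++)
            row[x] = invert_red_green(row[x]);
    }
}
```

Because exclusive-OR is its own inverse, running the routine twice restores the original image, which makes a convenient sanity check.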
And faster. There are two major inefficiencies in our sample code: it makes four
trap calls and it's at the mercy of the C compiler. Both of these problems are easily
overcome, as the FastGWSet32Pixel routine demonstrates. Rather than take a
GWorldPtr, FastGWSet32Pixel takes a pixMap pointer and a base address.
Furthermore, the routine assumes it's being called in 32-bit mode. Note that the
variables bounds, top, rowBytes, and left are defined in QuickEquate.a.
void FastGWSet32Pixel(PixMap *srcPixMap, long *srcBaseAddr, short x,
                      short y, long pixelValue)
{
    asm {
        move.l  srcPixMap,a0        ;Must be 32-bit-clean pointer
        move.w  y,d1                ;Get y
        sub.w   bounds+top(a0),d1   ;Make y bounds 0 relative
        move.w  rowBytes(a0),d0     ;Get rowBytes
        and.w   #0x7FFF,d0          ;Strip bitmap/pixMap bit
        mulu.w  d0,d1               ;Calculate offset to start of this row
        move.l  srcBaseAddr,a1      ;Must be 32-bit base address
        adda.l  d1,a1               ;Calculate address of this row
        moveq   #0,d0               ;Clear d0 so x is extended to a long
        move.w  x,d0
        sub.w   bounds+left(a0),d0  ;Make x bounds 0 relative
        lsl.w   #2,d0               ;Convert x to a byte offset (4 bytes/pixel)
        adda.l  d0,a1               ;Calculate pixel address
        move.l  pixelValue,(a1)     ;Set the pixel
    }
}
You may wonder why this routine takes both a pixMap and a base address. Can't it just
get the base address from the pixMap directly? The answer is no, since the base
address of a GWorld can be a handle rather than a pointer and in the future might be
something different again. You must pass in a base address that's good in 32-bit
addressing mode. The GetPixBaseAddr routine called by GWSet32PixelC returns the
correct base address given a pixMap.
Revising our sample code isn't as trivial as it was before because of the additional
assumptions made by these fast get and set pixel routines. Here's the new version of
the code:
/* Get pixMap's 32-bit base address. */
srcBaseAddr = (long *) GetPixBaseAddr(myPixMapHandle);
myPixMapPtr = *myPixMapHandle;
/* WARNING: The pixMapHandle is dereferenced throughout these next
loops. The code makes sure memory will not move. In particular,
it's important that the segment containing the FastGWGet32Pixel