All Databases MacTech Vol 15-1999

Fast Blit Strategies

Volume Number: 15

Issue Number: 6

Column Tag: Programming Techniques

Fast Blit Strategies: A Mac Programmer's Guide

by Kas Thomas

Getting better video performance out of the Mac

isn't hard to do - if you follow a few rules

Introduction

Ironically, the main performance bottleneck for game programmers today - as ten

years ago - is getting pixels up on the screen. With the advent of 100 MHz bus speeds,

built-in hardware support for 2D/3D graphics acceleration, megabyte-sized backside

caches, and superior floating-point performance, you'd think screen refresh rates

would no longer be an issue. But as CPU and bus speeds have increased, so have monitor

resolution and the pixel throughput it demands. Providing the user with cinematic animation at full

screen resolution remains a formidable challenge.

Because of human interface concerns, writing direct-to-screen has always been

treated as something of a taboo in the Mac world. QuickDraw was invented to save us

from having to resort to such low-level techniques. But there are still times when

writing directly to video memory makes sense, particularly in game programming,

where anything goes when it comes to user interface design. In this article, we won't

shy away from direct-device writing or treat it as a taboo subject; in fact, we'll

concentrate on it, with a view toward optimizing our code for the G3 (and soon, G4)

chip architecture. We'll talk about assembly language, cache issues, line-skip

blitting, and how to customize QuickDraw without patching any traps (among other

subjects). In order to keep the pace brisk, we'll assume that you already know what a

GWorld is, how to manipulate PixMaps, and the basics of display modes. If you need to

brush up on these items, a good crash course can be found in Dave Mark's Mac

Programming FAQs book (IDG Books, 1996).

Snappy Screen Drawing

First, let's summarize the basics. (If any of the following sounds unfamiliar, you

should probably read up on video device fundamentals.) It should go without saying that

maximizing screen drawing performance usually means taking advantage of one or

more - or possibly all - of the following techniques:

• Use 8-bit color instead of 32-bit (which cuts bus traffic by 75%).

• Cache and redraw dirty rects only (so you don't repaint more territory

than necessary). In games where most of the screen's pixels don't change from

frame to frame, it pays to just keep track of the regions that need redrawing,

and only redraw those regions.

• Use pixel-skip draw techniques. This means implementing your

sprite-drawing in such a way as to draw only the non-empty pixels in a

sprite, skipping over "underlay" areas. But instead of inspecting values in a

mask, you can get extra performance by implementing a "run-length"

approach wherein runs of visible sprite bytes are packed together. The idea is

to inspect the run-length byte (like the first byte of a Pascal string) and draw

that many bytes; then inspect the skip-length byte of the next (empty) run,

and skip over that many bytes; and so on. If you can just inspect length bytes

rather than mask bytes, you can save cycles.

• Use line-skip draw routines. Simply put, this means drawing every other

line of the image, the way an interlaced NTSC television picture is drawn. By

simply omitting half the drawn data, you cut the redraw time in half. (The

user sees an interlaced-looking image.) If the blit area is small enough, you may be able

to write directly to the screen (without tearing or flashing) at vertical

retrace time, instead of writing to a back buffer. (When you write to a back

buffer, of course, you're writing everything twice: once to the buffer, once to

the screen.)

• Draw 64 bits at a time - or however many bits the architecture will

support. Someday there will doubtless be a 128-bit "long double" or "double

double," the way there is now a 64-bit "long long." (If you don't know about

long longs, consult your compiler documentation.) Until then, for best

performance, you should always copy data to the screen as 64-bit doubles -

never as anything shorter. All PPC chips have thirty-two floating-point

registers and all can load a 64-bit double in one CPU cycle, so it makes sense

to take advantage of the throughput potential that the architecture offers.

Anything less represents wasted cycles.

• Observe proper data boundary alignment. (Read from and write to addresses

that are evenly divisible by 4, 8, or 16 - whatever is appropriate to the

architecture and the drawing mode.) Also try to make all window and sprite

dimensions a multiple of 16 or 32. Most graphics accelerator boards are

designed to deliver their best performance when this is the case.

• Access data linearly (by incrementing pointers); avoid pointer

arithmetic involving multiplications. Some applications even go so far as to

maintain tables of line-start addresses, so that pointer addresses can be

accessed via table lookup instead of calculated on the fly. (Depending on the

chip architecture and cache performance, this tactic will either work like a

charm or generate pipeline stalls.)

• Use wide, shallow graphic elements in preference to tall, narrow ones.

(There are more raster lines, and therefore more pointer arithmetic, in tall

graphics.)

• Implement your own custom drawing routines where appropriate,

including, possibly, a replacement for CopyBits().
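
To make the dirty-rect idea concrete, here is a minimal sketch in portable C. It is not taken from any shipping game: the DirtyRect type and both function names are hypothetical, the pixels are assumed to be 8-bit, and the back buffer and screen are assumed to share the same rowBytes. It also simplifies matters by folding every dirty rect into a single bounding box, where a real game would more likely keep a list of rects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rectangle type; right and bottom are exclusive. */
typedef struct { int left, top, right, bottom; } DirtyRect;

static int RectIsEmpty(const DirtyRect *r)
{
    return r->left >= r->right || r->top >= r->bottom;
}

/* Grow *acc so that it also covers *r. */
void AddDirtyRect(DirtyRect *acc, const DirtyRect *r)
{
    assert(acc != NULL && r != NULL);
    if (RectIsEmpty(r)) return;
    if (RectIsEmpty(acc)) { *acc = *r; return; }
    if (r->left   < acc->left)   acc->left   = r->left;
    if (r->top    < acc->top)    acc->top    = r->top;
    if (r->right  > acc->right)  acc->right  = r->right;
    if (r->bottom > acc->bottom) acc->bottom = r->bottom;
}

/* Copy only the accumulated dirty area from the back buffer to the
 * screen (8-bit pixels assumed), then reset the accumulator. */
void FlushDirtyRect(const unsigned char *back, unsigned char *screen,
                    size_t rowBytes, DirtyRect *acc)
{
    if (!RectIsEmpty(acc)) {
        size_t offset, width;
        const unsigned char *s;
        unsigned char *d;
        int y;

        assert(acc->left >= 0 && acc->top >= 0);
        offset = (size_t)acc->top * rowBytes + (size_t)acc->left;
        width = (size_t)(acc->right - acc->left);
        s = back + offset;
        d = screen + offset;
        for (y = acc->top; y < acc->bottom; y++) {
            memcpy(d, s, width);
            if (y + 1 < acc->bottom) {   /* step by pointer, no multiply */
                s += rowBytes;
                d += rowBytes;
            }
        }
    }
    acc->left = acc->top = acc->right = acc->bottom = 0;
}
```

Call AddDirtyRect() each time a sprite moves or changes, and FlushDirtyRect() once per frame. The single bounding box is cheap to maintain but repaints too much when two small sprites sit in opposite corners of the screen, which is the usual argument for keeping a list of rects instead.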
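
The run-length flavor of pixel-skip drawing can be sketched as follows. The packed layout used here is an assumption made up for illustration (the list above describes only the general idea of alternating draw-length and skip-length bytes), and DrawPackedRow() and DrawPackedSprite() are hypothetical names. Pixels are assumed to be 8-bit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packed format (an assumption, not a standard), per row:
 *   [draw count][that many pixel bytes][skip count][draw count]...
 * repeated until the counts add up to the sprite width. A row that
 * starts with transparent pixels begins with a draw count of 0. */
static const unsigned char *DrawPackedRow(const unsigned char *src,
                                          unsigned char *dst, size_t width)
{
    size_t x = 0;
    while (x < width) {
        size_t drawLen = *src++;            /* length byte of visible run */
        assert(x + drawLen <= width);
        memcpy(dst + x, src, drawLen);      /* draw that many bytes */
        src += drawLen;
        x += drawLen;
        if (x < width) {
            size_t skipLen = *src++;        /* length byte of empty run */
            assert(drawLen + skipLen > 0);  /* guard against a stuck row */
            assert(x + skipLen <= width);
            x += skipLen;                   /* skip: destination untouched */
        }
    }
    return src;                             /* start of the next row's data */
}

/* Draw a whole packed sprite into an 8-bit destination buffer. */
void DrawPackedSprite(const unsigned char *src, unsigned char *dst,
                      size_t dstRowBytes, size_t width, size_t height)
{
    size_t y;
    for (y = 0; y < height; y++) {
        src = DrawPackedRow(src, dst, width);
        if (y + 1 < height)
            dst += dstRowBytes;
    }
}
```

Note that no mask is consulted anywhere: the inner loop looks only at length bytes, and the transparent pixels cost one add apiece per run rather than one test per pixel.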
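
Line-skip drawing and wide copies combine naturally, as the following sketch suggests. One caveat up front: to stay portable, it moves each 8-byte chunk through a uint64_t with memcpy() rather than through a double in a floating-point register, so it is a stand-in for the 64-bit double technique described above, not an implementation of it. LineSkipBlit() is a hypothetical name, and the width is assumed to be a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy every other row (0, 2, 4, ...) of an image, 8 bytes per step.
 * Rows 1, 3, 5, ... of the destination are left untouched. */
void LineSkipBlit(const unsigned char *src, size_t srcRowBytes,
                  unsigned char *dst, size_t dstRowBytes,
                  size_t widthBytes, size_t height)
{
    size_t srcStep = 2 * srcRowBytes;   /* computed once, outside the loop */
    size_t dstStep = 2 * dstRowBytes;
    size_t rows = (height + 1) / 2;     /* number of rows actually drawn */
    size_t x;

    assert(widthBytes % 8 == 0);        /* assumption: whole 8-byte chunks */

    while (rows-- > 0) {
        for (x = 0; x < widthBytes; x += 8) {
            uint64_t chunk;
            memcpy(&chunk, src + x, 8); /* one 64-bit load  */
            memcpy(dst + x, &chunk, 8); /* one 64-bit store */
        }
        if (rows > 0) {                 /* advance by pointer increment */
            src += srcStep;
            dst += dstStep;
        }
    }
}
```

Most compilers reduce each fixed-size memcpy() pair to a single load and store. Whether that beats a hand-written loop of double loads and stores on a given PowerPC is something to measure rather than assume.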

Getting the Most out of CopyBits

The Mac's main general-purpose blit utility is, of course, QuickDraw's venerable

CopyBits() routine. Because so many OS and user processes rely so heavily on it, and
