Fast Blit Strategies
Volume Number: 15
Issue Number: 6
Column Tag: Programming Techniques
Fast Blit Strategies: A Mac Programmer's Guide
by Kas Thomas
Getting better video performance out of the Mac
isn't hard to do - if you follow a few rules
Ironically, the main performance bottleneck for game programmers today - as ten
years ago - is getting pixels up on the screen. With the advent of 100 MHz bus speeds,
built-in hardware support for 2D/3D graphics acceleration, megabyte-sized backside
caches, and superior floating-point performance, you'd think screen refresh rates
would no longer be an issue. But as CPU and bus speeds have increased, so has monitor
resolution - and pixel throughput. Providing the user with cinematic animation at full
screen resolution remains a formidable challenge.
Because of human interface concerns, writing direct-to-screen has always been
treated as something of a taboo in the Mac world. QuickDraw was invented to save us
from having to resort to such low-level techniques. But there are still times when
writing directly to video memory makes sense, particularly in game programming,
where anything goes when it comes to user interface design. In this article, we won't
shy away from direct-device writing or treat it as a taboo subject; in fact, we'll
concentrate on it, with a view toward optimizing our code for the G3 (and soon, G4)
chip architecture. We'll talk about assembly language, cache issues, line-skip
blitting, and how to customize QuickDraw without patching any traps (among other
subjects). In order to keep the pace brisk, we'll assume that you already know what a
GWorld is, how to manipulate PixMaps, and the basics of display modes. If you need to
brush up on these items, a good crash course can be found in Dave Mark's Mac Programming FAQs book (IDG Books, 1996).
Snappy Screen Drawing
First, let's summarize the basics. (If any of the following sounds unfamiliar, you
should probably read up on video device fundamentals.) Maximizing screen drawing
performance usually means taking advantage of one or more (possibly all) of the
following techniques:
• Use 8-bit color instead of 32-bit (which cuts bus traffic by 75%).
• Cache and redraw dirty rects only (so you don't repaint more territory
than necessary). In games where most of the screen's pixels don't change from
frame to frame, it pays to just keep track of the regions that need redrawing,
and only redraw those regions.
• Use pixel-skip draw techniques. This means implementing your
sprite-drawing in such a way as to draw only the non-empty pixels in a
sprite, skipping over "underlay" areas. But instead of inspecting values in a
mask, you can get extra performance by implementing a "run-length"
approach wherein runs of visible sprite bytes are packed together. The idea is
to inspect the run-length byte (like the first byte of a Pascal string) and draw
that many bytes; then inspect the skip-length byte of the next (empty) run,
and skip over that many bytes; and so on. If you can just inspect length bytes
rather than mask bytes, you can save cycles.
• Use line-skip draw routines. Simply put, this means drawing every other
line of the image, the way an interlaced NTSC television picture is drawn. By
simply omitting half the drawn data, you cut the redraw time roughly in half.
(The user sees a striped, interlaced-looking image.) If the blit area is small
enough, you may be able
to write directly to the screen (without tearing or flashing) at vertical
retrace time, instead of writing to a back buffer. (When you write to a back
buffer, of course, you're writing everything twice: once to the buffer, once to
the screen.)
• Draw 64 bits at a time - or however many bits the architecture will
support. Someday there will doubtless be a 128-bit "long double" or "double
double," the way there is now a 64-bit "long long." (If you don't know about
long longs, consult your compiler documentation.) Until then, for best
performance, you should always copy data to the screen as 64-bit doubles -
never as anything shorter. All PPC chips have thirty-two floating-point
registers and all can load a 64-bit double in one CPU cycle, so it makes sense
to take advantage of the throughput potential that the architecture offers.
Anything less represents wasted cycles.
• Observe proper data boundary alignment. (Read from and write to addresses
that are evenly divisible by 4, 8, or 16 - whatever is appropriate to the
architecture and the drawing mode.) Also try to make all window and sprite
dimensions a multiple of 16 or 32. Most graphics accelerator boards are
designed to deliver their best performance when this is the case.
• Access data linearly (by incrementing pointers); avoid pointer
arithmetic involving multiplications. Some applications even go so far as to
maintain tables of line-start addresses, so that pointer addresses can be
accessed via table lookup instead of calculated on the fly. (Depending on the
chip architecture and cache performance, this tactic will either work like a
charm or generate pipeline stalls.)
• Use wide, shallow graphic elements in preference to tall, narrow ones.
(There are more raster lines, and therefore more pointer arithmetic, in tall
graphics.)
• Implement your own custom drawing routines where appropriate,
including, possibly, a replacement for CopyBits().
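As a rough illustration of two of the techniques above (pixel-skip drawing with
run lengths, and line-skip blitting), here is a minimal sketch in portable C. It
makes no QuickDraw calls and does not touch video memory; it simply writes into
an 8-bit pixel buffer whose row stride you supply. The packed sprite layout
([skip][run][pixel bytes], repeated until the row is full), the function names,
and the parameters are all assumptions made for this example, not a format
defined by the Mac Toolbox.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed row format (an assumption for this sketch):
 *   [skip][run][run pixel bytes] ... repeated until `width` pixels are covered.
 * skip = transparent pixels to step over, run = opaque pixels to copy.
 * Returns a pointer to the start of the next packed row. */
const uint8_t *draw_rle_row(const uint8_t *src, uint8_t *dst, size_t width)
{
    size_t x = 0;
    while (x < width) {
        size_t skip = *src++;
        size_t run  = *src++;
        assert(skip + run > 0);          /* an empty pair would never finish the row */
        assert(x + skip + run <= width); /* malformed data would overrun the row */
        x += skip;                       /* leave the underlying pixels alone */
        memcpy(dst + x, src, run);       /* draw only the visible pixels */
        src += run;
        x += run;
    }
    return src;
}

/* Draw a whole run-length sprite. The destination pointer is advanced by
 * adding row_bytes each time, so no per-row multiplication is needed. */
void draw_rle_sprite(const uint8_t *src, uint8_t *dst,
                     size_t row_bytes, size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        src = draw_rle_row(src, dst, width);
        dst += row_bytes;
    }
}

/* Line-skip blit: copy rows 0, 2, 4, ... and leave the odd rows untouched. */
void blit_line_skip(const uint8_t *src, size_t src_row_bytes,
                    uint8_t *dst, size_t dst_row_bytes,
                    size_t width_bytes, size_t height)
{
    size_t rows = (height + 1) / 2;      /* number of even-numbered rows */
    while (rows-- > 0) {
        memcpy(dst, src, width_bytes);
        if (rows > 0) {                  /* do not step past the last copied row */
            src += 2 * src_row_bytes;
            dst += 2 * dst_row_bytes;
        }
    }
}
```

In a real blitter, dst would point into a GWorld's pixel data or the screen's
base address, and row_bytes would come from the corresponding PixMap. The inner
memcpy() is also where a hand-tuned 64-bit copy loop could be substituted.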
Getting the Most out of CopyBits
The Mac's main general-purpose blit utility is, of course, QuickDraw's venerable
CopyBits() routine. Because so many OS and user processes rely so heavily on it, and