Hertzfeld
Volume Number: 4
Issue Number: 6
Column Tag: Programmer's Spotlight
Andy Hertzfeld on QuickerDraw
By Chester Peterson Jr., Reporter-At-Large, Lindsborg, Kansas
The Story of QuickerDraw
QuickDraw, the imaging program used on the Mac, doesn’t always live up to its
name when used on the Mac II.
Incredibly fast and wonderfully crisp at one bit per pixel, it bogs down to
something that could be more aptly described as SlowDraw at eight bits per pixel. The
normal subtle responsiveness of the Mac II suffers.
“Actually, in the eight bit per pixel mode it almost feels like you’re using your
Mac II under water,” is how Andy Hertzfeld describes the action--or lack thereof.
Hertzfeld is, of course, famous in Mac circles as the man responsible for much of
the Mac’s Operating System and the design of the Toolbox.
So, in late December he decided to satisfy his curiosity about the QuickDraw
graphics routines and how they were coded. This is really easy to do, he says.
Just get the Mac II to do the graphics operation in which you’re interested, and
then hit the interrupt button at random moments. Statistically, the interruption will
most often land where the machine spends most of its time--the inner loop.
What Hertzfeld discovered was that the inner loops weren’t optimally coded. His
initial strategy was to move all of QuickDraw into RAM. He wrote an INIT that
copied 60k of the ROM into RAM, where he could patch it.
And, although there were some problems with that, Hertzfeld got it working.
But, as he progressed, disassembling to the bottom of the system, he saw this wasn’t
really necessary.
The reason: Apple had the foresight to route all the inner loops through a
low-level jump table. All he had to do was replace addresses in that small in-memory
jump table to take over the inner loops in a clean way.
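The jump-table idea can be sketched in C as a table of function pointers. This is only an illustration of the general technique, not QuickDraw's actual table layout: the names InnerLoopProc, loop_table, patch_loop, and the fill routines are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for an inner-loop routine: fill `count` pixels. */
typedef void (*InnerLoopProc)(uint8_t *dst, size_t count, uint8_t color);

enum { kFillLoop = 0, kNumLoops = 1 };

/* Stand-in for the stock routine: one store per pixel. */
void stock_fill(uint8_t *dst, size_t count, uint8_t color) {
    for (size_t i = 0; i < count; i++) dst[i] = color;
}

/* The table that every caller dispatches through. */
InnerLoopProc loop_table[kNumLoops] = { stock_fill };

/* Stand-in for a faster replacement; the counter is for the example only. */
int fast_calls = 0;
void fast_fill(uint8_t *dst, size_t count, uint8_t color) {
    fast_calls++;
    memset(dst, color, count);
}

/* Install a replacement and return the old entry so it can be restored. */
InnerLoopProc patch_loop(int slot, InnerLoopProc replacement) {
    InnerLoopProc old = loop_table[slot];
    loop_table[slot] = replacement;
    return old;
}

/* Callers never name a routine directly; they always go through the table. */
void fill_pixels(uint8_t *dst, size_t count, uint8_t color) {
    loop_table[kFillLoop](dst, count, color);
}
```

Because callers only ever see the table, swapping one pointer redirects every caller at once, and the saved old entry makes the patch easy to undo.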
“So, once I saw that, I thought, ‘Hey, this could be a project worth looking
into’,” Hertzfeld recalls.
“And, the more I got into it, the more I was able to find ways to increase the speed
of QuickDraw. I ended up improving the speed of some important operations by a factor
of three or so, ending up with QuickerDraw,” or, as Apple has called it in release 6.0
of the operating system, QuickerGraf.
Something that confuses people and which he thinks is important is that the
performance increases are anything but flat. Instead it’s a spiky curve, with some
things speeding up a whole lot and others not at all.
The explanation is that the speed-ups are both case dependent and data
dependent. Depending on exactly what you’re doing, you’ll get different results.
“My point is that the speed-ups aren’t uniform,” Hertzfeld points out. “Apple
has some of the code, such as when you say either EraseRect or PaintRect with black,
that are already fairly well optimized. I wasn’t able to improve them only because
they’re already about as good as they can be.
“But, if you take PaintRect with a color that isn’t black or white, then it goes to a
different loop that wasn’t well done. Here’s where I was able to improve speed by that
factor of three.”
Hertzfeld believes that the most important item in the graphical programmer’s
bag of tricks is special casing. In other words, certain instances of a particular
problem are easier to handle than are other instances.
So, he thinks that when speed isn’t important, a programmer should try to
fold cases together and write as little code as possible to handle the entire situation.
But, when speed is essential, as it is in the QuickDraw routines, the opposite
approach must be used, he says. This involves picking off all the different cases and
seeing if you can handle each case a little faster.
A compromise Apple made on its standard graphics card was that it has to support
one bit, two bits, four bits, and eight bits per pixel.
A lot of the QuickDraw routines were coded in such a way that they were common
to all four screen formats, according to Hertzfeld.
“I was able to special case the eight bit per pixel case, because that’s the only
one that’s really important from a performance point of view,” he says.
“While Apple used rather slow bit-field instructions, I used special case code to
take advantage of the faster addressing modes in the 68020 to do things faster.”
“I also saved some registers doing that, registers that the Apple code uses just
for maintaining which bits per pixel are to be used. Freeing up these for other things
allowed me to go faster.”
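As a rough illustration of why the eight bit per pixel case is worth picking off, compare a general pixel fetch that works at any of the four depths with one that assumes eight bits. This is a sketch in C rather than 68020 assembly; the function names are hypothetical, and the layout assumed is the packed one with the leftmost pixel in the high-order bits.

```c
#include <stdint.h>
#include <stddef.h>

/* General case: extract pixel `i` from a row packed at `depth` bits per
   pixel (1, 2, 4 or 8), leftmost pixel in the high-order bits. */
unsigned get_pixel_any_depth(const uint8_t *row, size_t i, unsigned depth) {
    size_t bit = i * depth;                            /* bit offset in row */
    unsigned shift = 8u - depth - (unsigned)(bit % 8); /* position in byte  */
    unsigned mask = (1u << depth) - 1u;
    return (row[bit / 8] >> shift) & mask;
}

/* Special case: at eight bits per pixel a pixel is exactly one byte, so
   the shift, the mask, and the depth variable all disappear. */
unsigned get_pixel_8bpp(const uint8_t *row, size_t i) {
    return row[i];
}

/* Pick off the important case first and fall back to the general code. */
unsigned get_pixel(const uint8_t *row, size_t i, unsigned depth) {
    if (depth == 8) return get_pixel_8bpp(row, i);
    return get_pixel_any_depth(row, i, depth);
}
```

The special case needs no depth, shift, or mask values at all, which is the C analogue of freeing up registers for other work.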
Hertzfeld also took advantage of the “principle of locality.” He defines this as
meaning that whatever you’re doing, it’s pretty likely you just did the same thing a
short time ago.
He exploited this in producing QuickerDraw through the use of caches. In the
computationally intensive parts of QuickDraw like the arithmetic transfer modes, he
put in caches that say, “Hey, this is just the same as what I saw before--I don’t have
to do all the work again, because I’ve already figured out the answer.”
Hertzfeld used this technique in the case of a CopyBits between two pixel
maps that have different color look-up tables, a common situation on the Mac II with
digitized images.
Each digitized image would have its own color look-up table that wouldn’t be
identical to the one on the screen.
When you do a CopyBits, it has to perform a mapping operation, taking each pixel and
looking it up in a table to find the correct pixel value for the destination pixel map.
Hertzfeld changed this so that long-word mappings are remembered, short-circuiting
the memory references involved in doing the look-up. He used similar techniques in
many places to gain significant speed-ups.
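One way to picture remembering a long-word mapping is a one-entry cache that holds the last source long word (four 8-bit pixels) and its translated result. This is a minimal sketch under that assumption, not a description of the actual QuickerDraw code; MapCache, map_long, copy_mapped, and the table_lookups counter are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* One-entry cache: the last source long word and its translation. */
typedef struct {
    int      valid;
    uint32_t src;
    uint32_t dst;
} MapCache;

unsigned long table_lookups = 0;  /* instrumentation for the example only */

/* Translate four packed 8-bit pixels through `table`, one look-up each. */
uint32_t map_long(uint32_t src, const uint8_t table[256]) {
    uint32_t out = 0;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out |= (uint32_t)table[(src >> shift) & 0xFF] << shift;
        table_lookups++;
    }
    return out;
}

/* Copy `n` long words, reusing the previous translation whenever the
   incoming long word is the same as the last one seen. */
void copy_mapped(uint32_t *dst, const uint32_t *src, size_t n,
                 const uint8_t table[256], MapCache *cache) {
    for (size_t i = 0; i < n; i++) {
        if (!cache->valid || cache->src != src[i]) {
            cache->dst = map_long(src[i], table);
            cache->src = src[i];
            cache->valid = 1;
        }
        dst[i] = cache->dst;
    }
}
```

The cache only pays off when neighboring long words repeat, which is the locality argument; it must also be invalidated whenever the mapping table changes.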
“You want to hit memory as little as possible,” he advises. “A lot of the Apple
loops were doing essentially one memory reference per pixel.
“My routines always do one memory reference per long word. Why? Because the
68020 is capable of pulling in 32 bits just as quickly as it can pull in eight bits at a
time.”
The Apple routines take the approach that is a little easier to code, accessing
memory just eight bits at a time, while Hertzfeld accesses memory 32 bits at a time,
spinning the data around in the registers and making it faster.
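The difference between one memory reference per pixel and one per long word can be sketched with a simple solid fill at eight bits per pixel. The routine names here are hypothetical, and the sketch is in portable C rather than 68020 assembly.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* One store per pixel. */
void fill_bytes(uint8_t *dst, size_t count, uint8_t color) {
    for (size_t i = 0; i < count; i++) dst[i] = color;
}

/* One store per four pixels, with a byte loop for the leftovers. */
void fill_longs(uint8_t *dst, size_t count, uint8_t color) {
    uint32_t four = color * 0x01010101u;  /* replicate color into 4 bytes */
    size_t i = 0;
    for (; i + 4 <= count; i += 4)
        memcpy(dst + i, &four, 4);  /* compilers typically emit one 32-bit store */
    for (; i < count; i++)
        dst[i] = color;             /* remaining 1 to 3 pixels */
}
```

Both routines produce the same pixels; the second simply touches memory about a quarter as often.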
“You just attempt to be as clever as possible when you’re trying to code,” he
says. “This is interesting code to write, because it has an unusual sort of design
criterion.”
With most code in normal circumstances you’re always balancing the twin
trade-offs between speed and space, or as Hertzfeld puts it, trying to serve two
masters while producing the nicest code possible.
But, the interesting thing about the QuickerDraw code he wrote is that space isn’t
a consideration. He says the system spends so much time in the QuickDraw inner loops
that he did everything to make them go faster. He used a different coding style that also
made it a little more interesting and fun.
“Like, for example, I did everything possible to avoid a subroutine call in the
inner loops,” Hertzfeld explains. “You copy 50 in-line instructions, because it’s
worth it in the context of the inner loop.”
Hertzfeld also devised another creative and interesting technique to speed up
QuickDraw, something he calls region counting.
“As I was speeding up QuickDraw, I was just a little bit disappointed that I wasn’t
getting as much speed-up as I would have liked when I was clipping to regions,” he
says.
“What I then realized is that the region mask doesn’t change much from scan line
to scan line.”
“The other thing to notice is an eight bit per pixel region mask is eight times as
long as it would be in one bit per pixel, or eight times as likely to be homogeneous,”
Hertzfeld observes.
If you pick up a long word of the region mask it’s extremely likely that it will be
all ones or all zeros. Hertzfeld started special casing the region mask.
He found that normally when masking you have to do something like a
seven-instruction sequence that involves three memory references to plot a long word
with a mask. But, if it turns out the mask is all zeros you don’t have to do anything,
because it’s all going to be masked out.
You don’t even have to hit memory at all, just skip over it. If the mask is all
ones, you can just use one store instead of having to read the destination back and
combine it with the source in order to accomplish the masking.
So, he began special casing that way. And, even though the tests cost him a little,
he still won enough to make it worthwhile, because the region mask does tend to be
homogeneous. The result: A 40 percent speed increase from that special casing of the
region mask.
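The three cases described above (mask all zeros, mask all ones, and everything else) might look like this in C. It is a sketch of the idea only; plot_masked is a hypothetical name, and a 1 bit in the mask is assumed to mean "draw here."

```c
#include <stdint.h>
#include <stddef.h>

/* Plot `n` source long words into `dst` through a region mask. */
void plot_masked(uint32_t *dst, const uint32_t *src,
                 const uint32_t *mask, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t m = mask[i];
        if (m == 0) {
            continue;                /* fully clipped: touch nothing */
        } else if (m == 0xFFFFFFFFu) {
            dst[i] = src[i];         /* fully visible: a plain store */
        } else {
            /* mixed: read the destination, merge, and write it back */
            dst[i] = (src[i] & m) | (dst[i] & ~m);
        }
    }
}
```

The two tests cost a little on every long word, so the approach only wins when the mask is usually all zeros or all ones.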
Then, as Hertzfeld was looking at the region mask as it went by, he began counting
up runs in it so the code could remember how many successive long words were all
zeros or how many successive long words were all ones.
“If the region mask doesn’t change from scan line to scan line, which it doesn’t
more than 90 percent of the time, I don’t have to fetch it. As a matter of fact, if it’s all
masked out at the beginning I can just skip over it,” he observes.
Where the Apple routines were pulling a long word from memory, then sticking
the same long word back, Hertzfeld just skipped over all that.
He’s proud of this original technique of region counting for obtaining a
tremendous speed increase when things are heavily clipped.
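Region counting can be sketched as summarizing one scan line of the mask into runs and then plotting from the runs, so that later scan lines with an unchanged mask never re-read it. This is one plausible reading of the technique, not the actual QuickerDraw implementation; Run, RunKind, count_runs, and plot_with_runs are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

typedef enum { RUN_SKIP, RUN_COPY, RUN_MIXED } RunKind;

typedef struct {
    RunKind kind;
    size_t  length;   /* number of consecutive long words in the run */
} Run;

RunKind classify(uint32_t m) {
    if (m == 0) return RUN_SKIP;
    if (m == 0xFFFFFFFFu) return RUN_COPY;
    return RUN_MIXED;
}

/* Summarize a mask scan line as runs; returns the number of runs written.
   `runs` must have room for `n` entries. */
size_t count_runs(const uint32_t *mask, size_t n, Run *runs) {
    size_t nruns = 0;
    for (size_t i = 0; i < n; i++) {
        RunKind k = classify(mask[i]);
        if (nruns > 0 && runs[nruns - 1].kind == k) {
            runs[nruns - 1].length++;
        } else {
            runs[nruns].kind = k;
            runs[nruns].length = 1;
            nruns++;
        }
    }
    return nruns;
}

/* Plot one scan line from the runs; the mask is read only in mixed runs. */
void plot_with_runs(uint32_t *dst, const uint32_t *src, const uint32_t *mask,
                    const Run *runs, size_t nruns) {
    size_t i = 0;
    for (size_t r = 0; r < nruns; r++) {
        size_t end = i + runs[r].length;
        switch (runs[r].kind) {
        case RUN_SKIP:
            i = end;                               /* skip the whole run */
            break;
        case RUN_COPY:
            for (; i < end; i++) dst[i] = src[i];  /* plain stores */
            break;
        case RUN_MIXED:
            for (; i < end; i++)
                dst[i] = (src[i] & mask[i]) | (dst[i] & ~mask[i]);
            break;
        }
    }
}
```

While the mask stays the same from scan line to scan line, the runs computed once can be reused for each line, and a heavily clipped line reduces to a few pointer advances.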
Contrary to a misconception about its size, the QuickerDraw memory resident
code is only approximately 10k. And, half of that is devoted to the arithmetic transfer
modes that aren’t used too often.
The QuickerDraw file is 27k, but that includes logo resources. The nice colored
picture that it comes up with is 12k alone.
Incidentally, the arithmetic transfer modes were introduced with the Mac II and
are only relevant to color. Most applications don’t use them yet.
Hertzfeld accomplished his QuickerDraw core work in a two-week period
between this last December 22 and January 7. It then became apparent that Apple was
interested in his acceleration of QuickDraw.
Hertzfeld realized that if he was truly producing a speed-up, then he’d also have
to address the arithmetic transfer modes. A second two-week burst of work got these
speed-ups implemented, too.
The bottom line: QuickerDraw involves no change in the architecture of
QuickDraw. Instead, view it as implementing a high performance tune-up of Apple’s
standard.
Hertzfeld signed a non-exclusive contract with Apple for QuickerDraw in
February, accepting less money so he could upload it to CompuServe and distribute it
on his own.
Apple will incorporate QuickerDraw in its next system release, 6.0, due out at the
end of May.
“Although there are a few cases that I didn’t handle, I do think I’m pretty close to
the optimal plotting speed of QuickDraw,” Hertzfeld comments. “I basically just
re-implemented the inner loops so they were more efficient.”
There will be no need to further refine QuickerDraw for the 68030. This is
because its instruction set is essentially identical to the 68020’s.
“The things that will make Apple change QuickDraw next are the architectural
issues such as scalable fonts and resolution independent display routines--basically
catching up with Display PostScript,” Hertzfeld thinks.
He’d like to see Apple offer both an enhanced QuickDraw and PostScript so
applications programmers could choose either one for both screens and printers.
“The Macintosh would be better off if it could have both. And, I also think it
would be a little less risky for Apple than to continue trying to develop on their own all
the things that PostScript does so well,” Hertzfeld says.
“In the meantime, my QuickerDraw ‘tune-up’ will make graphics production
easier and faster on the Mac II.”
Hertzfeld on Creativity
Is computer programming creative, creative in the same sense as producing a
masterpiece painting or writing a best-seller?
“Absolutely!” Hertzfeld states.
“There are two different types of programming creativity, though,” he advises,
“and both are equally important in a good programmer.”
The first sort of creativity is involved in initially picking the right area and then
the right problem on which to work. This involves thinking about what the users
really need that will help them the most.
Then there’s the actual writing of code and choosing of instructions, which can be as
individualistic as any painting or writing style, he says.