Code Mechanic
Volume Number: 13
Issue Number: 5
Column Tag: develop
Code Mechanic: Better Than Ever Stress Testing
by Dave Evans
There are few things more frustrating than losing access to your debugging tools due to
a freeze, because you can't fix what you can't diagnose. The best course is to stop
freezes before they start, so I'd like to share a common cause of freezes I've found. I'll
also discuss some of the stress-testing options that are available to help you catch
freeze-causing problems you might have missed, including an improved debugging
tool.
Veteran readers of develop may notice a new title for this column. The previous title,
"Balance of Power," was apt for its time, indicating a focus on PowerPC issues. But
now that all new MacOS computers are PowerPC-based, everybody's writing about
PowerPC, and my efforts in this area are complete. This new title reflects a focus on
the mechanics of code tuning, with tips for improving your application's performance
and stability, which I hope you'll find just as useful.
Protect Your Vectors
Even if you use a PowerPC-based MacOS computer, the first 256 bytes of memory are
dedicated to 680x0 exception vectors, which the 680x0 software emulator uses to
emulate 680x0 exceptions and interrupts. On a 680x0-based computer, these values
are read by the processor itself when handling an exception or servicing an interrupt.
Under System 7, these important vectors are not memory protected. Any program can
read from or write to them, possibly resulting in a serious failure. While not all of
the vectors are used, modifying some of them will cause an immediate freeze, leaving
you without access to your debugging tools. You probably don't address these vectors
intentionally, but accidental accesses are common when a nil pointer or empty handle is
de-referenced.
Unintentionally reading from these vectors will produce an unpredictable result. In most
cases the vectors are addresses of special system routines, but they can hold any
value, and they vary significantly from one computer model to another. As an example
of how easy it is to cause a problem in this area, take a look at the following C code,
similar to that found in some applications:
    front_window = FrontWindow();
    if (front_window->windowKind < 0)
        MyDeskAccessoryRoutine(front_window);
The developers didn't realize that FrontWindow can return nil when no windows are
open. In that case the application de-references the nil pointer and makes a logical
decision based on the sign of the half word at $6C in low memory, which is the high
half of the interrupt level 3 vector. On most Macintosh computers released before
1995, this vector pointed into ROM starting at address $40800000. Because of this,
the applications would test the high half word value of $4080, and they wouldn't run
the desk accessory routine. This was the right behavior, but for the wrong reason;
disaster was averted by luck.
On all PCI-based PowerPC computers, ROM starts at location
$FFC00000. During the development of these computers, we found that applications
with code like the above would crash: the high half word was now $FFC0, which is
negative as a signed value, so they ran unexpected code. We were able to work around their
problem by changing the interrupt level 3 vector to point to a routine in RAM. This
changed the high half word value to be a small positive number, and the applications
behaved as expected. Still, it would have been better to avoid the problem in the
first place. The following code shows a safer, crash-free approach:
    front_window = FrontWindow();
    if (front_window && front_window->windowKind < 0)
        MyDeskAccessoryRoutine(front_window);
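To see why the unchecked test behaved differently on the two ROM layouts, it may help to work through the arithmetic. The following is a minimal sketch in portable C, not Toolbox code. The helper names are hypothetical, and the two ROM base addresses stand in for the actual vector values, which point somewhere into ROM and so share the same high half word.

```c
#include <stdint.h>

/* Hypothetical helper: take a 32-bit vector value and return its high
   half word interpreted as a signed 16-bit quantity, which is how the
   windowKind comparison sees it when the window pointer is nil. */
static int HighHalfAsSigned(uint32_t vector)
{
    uint16_t half = (uint16_t)(vector >> 16);

    /* Sign-extend by hand so the result does not depend on how the
       compiler converts out-of-range unsigned values. */
    return (half & 0x8000u) ? (int)half - 0x10000 : (int)half;
}

/* Nonzero if the unchecked test "windowKind < 0" would succeed. */
static int LooksLikeDeskAccessory(uint32_t vector)
{
    return HighHalfAsSigned(vector) < 0;
}
```

With a vector in the older ROM range ($40800000) the half word is $4080, a positive number, so the desk accessory branch is skipped. With a vector in the PCI-era ROM range ($FFC00000) the half word is $FFC0, which is -64 as a signed value, so the branch is taken with a nil window pointer.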
Checking for nil pointers or handles is one way you can avoid these crashes in the first
place. Checking for empty handles is another necessary step, since unlocked
relocatable blocks that are marked purgeable may disappear any time memory can
move.
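The two checks described here, nil handle and empty handle, can be folded into one small test. This is only a sketch: SketchHandle is a locally declared stand-in for the Toolbox Handle type so that the example compiles without the Mac OS headers, and HandleIsUsable is a hypothetical helper name, not a Toolbox routine.

```c
#include <stddef.h>

/* Stand-in for the Toolbox Handle type: a pointer to a master pointer.
   Declared locally as an assumption so the sketch is self-contained. */
typedef char **SketchHandle;

/* Hypothetical helper: nonzero only if the handle itself is non-nil
   and its master pointer is non-nil, meaning the block it refers to
   has not been purged. */
static int HandleIsUsable(SketchHandle h)
{
    return h != NULL && *h != NULL;
}
```

In real code a purgeable block can still be purged after the check if memory moves, so a test like this would need to be repeated, or the block made unpurgeable or locked, before each use.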
To detect problems with purgeable blocks, you'll need tools to stress test your
application. Utilities that display heap zones, allowing you to compact and purge a heap
on demand, are a good start. For serious testing, however, you'll need a stress tool that
operates all the time. One good tool for this is MemHell, which will compact and purge
your heap whenever a Memory Manager routine that might move or purge memory is
called. This slows down execution of your tests, but it will flush out problems with
purgeable blocks.
So, while accidentally reading from low memory can cause unexpected results,
accidentally writing to low memory can be fatal, and this is one of the most common
causes of freezes that I've noticed. You may think this could never happen in your code,
because none of your blocks are purgeable and you always check errors after allocating
pointers. Think again; there are plenty of other opportunities. Do you check for an
error after every GetResource call? Getting an unexpected error (from a corrupted
resource file, for example) is one way you can end up with a nil handle. Besides
diligent review of your code, you need to do stress testing to flush out possible errors,
or freezes are likely to result.
Are You Stressed Enough?
There are a number of tools to help add stress to your testing. I've already mentioned
MemHell for finding problems with purgeable blocks. You'll similarly need a tool to
find reads and writes to the exception vectors.
The simplest choice is the ubiquitous and venerable EvenBetterBusError, written by
Greg Marriott. This tool safeguards the first four bytes of memory, which are very
often accidentally written over or read from. To detect reads, it places in the first four
bytes of memory a value which when de-referenced will cause a crash. If you use a nil
pointer or empty handle, the illegal value is likely to be used as data or de-referenced,
leading to a crash. To detect writes, it checks periodically to see if the value that it
placed has been overwritten; if so, you'll be notified with a DebugStr message.
EvenBetterBusError is included as a dcmd in MacsBug beginning with version 6.5.4.
I've extended EvenBetterBusError to be more aggressive. The new version,
YetEvenBetterBusError, fills the first 256 bytes of memory with a value that
will cause a crash into your debugger when de-referenced. It also checks periodically
for writes to these locations, but more frequently than EvenBetterBusError does. Like
EvenBetterBusError, upon noticing a write to these locations it will notify you with a
DebugStr message. YetEvenBetterBusError can be found at http://www.mactech.com.
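The write-detection idea shared by both tools can be illustrated with a small simulation. This is not the source of either tool: it runs against an ordinary array rather than real low memory, the function names are hypothetical, and the fill value is an arbitrary placeholder, since the column does not state the value the tools actually use.

```c
#include <stddef.h>
#include <stdint.h>

enum { kGuardWords = 64 };    /* 256 bytes at 4 bytes per long word */

/* Arbitrary placeholder fill value; an assumption for this sketch. */
static const uint32_t kGuardFill = 0xDEADBEEFu;

/* Fill the simulated vector area with the guard value. */
static void GuardInstall(uint32_t *area)
{
    size_t i;

    for (i = 0; i < kGuardWords; i++)
        area[i] = kGuardFill;
}

/* Return the byte offset of the first overwritten long word, or -1 if
   the area is intact. A real tool would run a check like this
   periodically and report any damage through DebugStr. */
static long GuardFirstDamage(const uint32_t *area)
{
    size_t i;

    for (i = 0; i < kGuardWords; i++)
        if (area[i] != kGuardFill)
            return (long)(i * sizeof(uint32_t));
    return -1;
}
```

A periodic check of this kind only reports a stray write after the fact, which is why checking more frequently narrows down where the offending code ran.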
To implement YetEvenBetterBusError, I had to sacrifice some compatibility with
existing applications. Any application code that assumes the exception vectors start at
address 0 will no longer function correctly. Most applications don't use the exception
vectors directly, but some copy protection schemes do modify the vectors.
The correct way to determine the location of the exception vectors is by using the
680x0 instruction MOVEC, which must always be executed in supervisor mode. The
location of the first vector is stored in the 680x0 VBR (Vector Base Register). To read
the address, you would write the following assembly code:
    _EnterSupervisorMode        ; old sr result in d0
    movec   vbr,a0              ; get the vbr
    move.w  d0,sr               ; restore the old sr
Always use the VBR to find these vectors. Although early versions of the MacOS always
placed them at location 0, they're now often elsewhere. When virtual memory is
turned on, for example, the vectors will actually reside in the system heap, and the
VBR will point to them. To maintain compatibility, however, if virtual memory doesn't
handle an exception it calls through to the original vector table at location 0. This is
why even with virtual memory on, writing over the low-memory exception vectors
can still cause a freeze.
YetEvenBetterBusError is able to overwrite and then monitor the first 256 bytes of
memory by moving the exception vector table entirely. So, even when virtual memory
is on, with YetEvenBetterBusError installed the original low-memory vectors are
never called. This is why some existing applications may be incompatible with
YetEvenBetterBusError.
A Cure for Test Anxiety
It's true that fully testing your code to reflect all possible configurations and user
actions can be a near-impossible task. But the perceived stability of both your
application and the computer depends on how well we all write and test our software.
To do the best possible job, use the stress-testing tools mentioned in this column or in
the article "Squashing Memory Leaks with TidyHeap" in this issue. Do the right thing:
stress test, then relax!
Thanks to Pete Gontier, Chris Jalbert, Bo3b Johnson, Dave Lyons, Quinn "The
Eskimo!", and Keith Stattenfield for reviewing this column.