MacsBug Revisited
Volume Number: 15
Issue Number: 6
Column Tag: Tools Of The Trade
by Daniel Jalkut
This is not your father's low-level debugger
MacsBug is an extremely powerful low-level debugger that Apple maintains and
distributes free of charge to developers of Mac OS software. From its humble beginnings as
the Motorola Advanced Computer Systems Debugger in 1981, it has evolved into the
debugger that many developers both inside and outside of Apple depend upon to get their
jobs done.
The aim of this article is not so much to introduce MacsBug as it is to reintroduce it.
Enough basic introductions to MacsBug have been written that I don't believe a
repetition is necessary. If you are interested in a detailed description of MacsBug's
basic features, I suggest you consult one of the publications mentioned at the end of this
article. I suspect that most of you have heard of MacsBug, and a majority of you have
probably used it to varying extents. I hope to provide something that each of you can
put to work in the battle against buggy code.
In recent years, MacsBug has received less attention from Mac OS technical
publications than it did in the past. Many developers have switched to using
source-level debuggers like the one included with Metrowerks CodeWarrior. Those
debuggers, while very convenient for the majority of bug diagnoses, have their limits.
MacsBug continues to prosper because of the elegance with which it
allows developers to overcome limitations of higher level debuggers. This article is
divided into three sections, each of which discusses a different aspect of this elegance.
The first section is a high level discussion of the thinking that goes on when diagnosing
a bug, and a description of some basic approaches you can take to work out the
diagnosis with MacsBug. The second section extols the virtues of MacsBug's
extensibility, a feature that should appeal to those of you with highly specialized or
unpredictable debugging needs. Finally, the third section divulges some of the exciting
features that MacsBug has gained in the past couple of years, which should convince even the
biggest MacsBug skeptic that something here deserves a second look.
Before you experiment with any of the functionality described in this article, I highly
recommend downloading the most recent version of MacsBug from Apple's MacsBug
web site: <http://developer.apple.com/tools/debuggers/MacsBug/index.html>.
Part One: The Debugging Mindset
I don't know anybody who has mastered MacsBug. Like Hermann Hesse's Siddhartha and
his pursuit of nirvana, the mastery of MacsBug is a lifelong journey for which there
is no apparent end. This humbling reality, while disheartening, must not prevent you
from using what you do know to battle any bugs you encounter along the way! Like
chess, the basic tools can be easily taught, but the secrets of winning are learned only
by developing a mindset which allows you to apply simple techniques in new and
exciting combinations. In this section, I will describe a basic approach to debugging in
MacsBug, which you can expand upon as you develop your own debugging style.
MacsBug is typically entered through one of two mechanisms: intentional or
unintentional CPU exceptions. An exception is just what it sounds like: a deviation
from the normal course of operations. Intentional exceptions occur when you, the
programmer, premeditate an event that causes MacsBug to interrupt the execution of
code and display the state of the computer on the screen. Unintentional exceptions
occur when somebody's code crashes. The mentality for dealing with each case is
different.
When you break into MacsBug intentionally, either by a breakpoint ("tvb", "atb",
"br", "brp") or a programmed exception in your code (Debugger or DebugStr), you
typically want to examine the state of things at a particular point in your code's
execution. Typical reactions to an intentional MacsBug entry are to check that the
parameters of a function look correct (e.g. "dm r3"), that the heap is not corrupt
("hc"), or to step through ("s", "t") code one instruction at a time and ensure that
your code does what you think it should. MacsBug is perfectly suited to debugging in
these circumstances, but since a source-level debugger handles this kind of debugging
especially well, I will focus only on unintentional interruptions (crashes).
When MacsBug is encountered as the result of a crash, you don't have the same luxury
of understanding the circumstances as you do in an intentional interruption. Three
very important questions to ask are: "What caused the crash?", "Where am I?", and
finally "How did I get here?" The answers to these questions will greatly increase the
odds of pinpointing the problem. Let's discuss some basic strategies for answering each
question in turn.
What Caused the Crash?
MacsBug interrupts the execution of code when that code violates the rules of the CPU.
If you are lucky, MacsBug's explanation of the crash will provide some insight into the
problem. MacsBug displays this explanation as soon as it takes control, and you can
redisplay it with the "how" command (one caveat: MacsBug
forgets this information as soon as you step or trace). Usually, the information
provided is broadly useful, but doesn't specifically pin down the cause of the crash.
Among the most common causes of crashes are memory exceptions and illegal
instructions.
Memory exceptions occur when a piece of code tries to read from or write to memory that
either doesn't exist (is not part of the address space) or which exists but is off limits
to the crashing code. When this is the cause of a crash, the basic approach is to look at
the instruction that is causing the crash, and try to figure out where the bad address
came from. If you're lucky, the address was returned by the last subroutine called,
and you know the culprit immediately. If you're not so lucky, finding the origin of the
address might involve tracing back through hundreds of lines of MacsBug disassembly,
across several levels of subroutine calls.
Illegal instructions are sequences of bits that make no sense to the CPU. The CPU
depends on receiving a stream of instructions that corresponds exactly to actions it
knows how to perform. For instance, the hex value 0x38600000 means to the
PowerPC processor "put the value zero in register r3". When the PowerPC processor
receives a hex value like 0x11111111, it means nothing to the CPU so it causes an
illegal instruction exception. When you reach MacsBug because of an illegal
instruction, you are usually facing a bug where the program counter is pointing to data
instead of code. There is a small chance that your compiled code contains illegal
instructions, but this is unlikely unless there is a bug in the compiler, or you
compiled for a specific CPU (Motorola 68030 for instance) and tried to run on a
different CPU (Motorola 68000 for instance). Since the overwhelming majority of
times you hit an illegal instruction will be because you are executing data instead of
code, the solution to this problem requires answering the questions "Where am I?" and
"How did I get here?".
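To see why 0x38600000 means something to the PowerPC while 0x11111111 does not, it helps to split an instruction word into its fields. The C sketch below is a minimal illustration, not part of MacsBug: the helper names decode_dform and is_load_immediate are hypothetical, and it recognizes exactly one instruction, addi with rA = 0 (the "li" idiom, primary opcode 14). A false result only means "not this instruction"; a real disassembler checks every opcode before declaring a word illegal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a PowerPC D-form instruction such as addi. */
typedef struct {
    unsigned opcode; /* primary opcode: top 6 bits */
    unsigned rD;     /* destination register: next 5 bits */
    unsigned rA;     /* source register: next 5 bits */
    int      simm;   /* signed 16-bit immediate: low 16 bits */
} DForm;

/* Hypothetical helper: split a 32-bit instruction word into D-form fields. */
static DForm decode_dform(uint32_t insn)
{
    DForm f;
    f.opcode = (insn >> 26) & 0x3F;
    f.rD     = (insn >> 21) & 0x1F;
    f.rA     = (insn >> 16) & 0x1F;
    f.simm   = (int)(insn & 0xFFFF);
    if (f.simm & 0x8000)        /* sign-extend the 16-bit immediate */
        f.simm -= 0x10000;
    return f;
}

/* Hypothetical helper: recognize "addi rD,0,simm" (opcode 14 with rA == 0),
   which loads simm into rD. Returns false for any other instruction word. */
static bool is_load_immediate(uint32_t insn, unsigned *reg, int *value)
{
    assert(reg != NULL && value != NULL);
    DForm f = decode_dform(insn);
    if (f.opcode != 14 || f.rA != 0)
        return false;
    *reg = f.rD;
    *value = f.simm;
    return true;
}
```

Fed 0x38600000, is_load_immediate reports register 3 and value 0, which is the "put the value zero in register r3" reading above. Fed 0x11111111, the top six bits give opcode 4, so the sketch does not recognize it as addi.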
Where Am I?
You know why the computer crashed, but this information is useless if you don't know
the culprit code that caused it. There are three types of crashing code in this world:
your code, other people's code, and data.
If you are building with symbols and traceback tables enabled, as you should be for
pre-release builds, MacsBug will make it immediately clear whether you are
executing your own code or somebody else's. If MacsBug prefaces the disassembly in
the program counter display with a symbol, and you recognize the symbol as one of
your routine names: congratulations, your code crashed!
It Came from Beyond...
If there are no symbols, and the code doesn't scream out to you what it does (immediate
recognition of assembly code's purpose is an uncommon but real skill), you are going
to have to dig deeper to find out exactly whose code it is. An immensely useful MacsBug
command is the "wh" (short for "where") command, which identifies, as best it can, the
characteristics of a location in memory. Commonly, this command is issued without a
parameter, which causes it to give information about the current location of the
program counter. The output of the "wh" command ranges from "very useful" to "not so
useful, but I'll take it" to "pretty darn useless."
In the "very useful" category, when the code you are executing is part of a PowerPC
code fragment, the name of the fragment is displayed along with the offset of the
current instruction within the fragment. The name of a PowerPC code fragment can be
either an application name, a library name, or the logical name given to any piece of
PowerPC code that is stored in a fragment. Usually, the name will implicate your
application, another application, or a piece of System Software code.
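The "fragment name plus offset" answer can be pictured as a range lookup over the code sections of the loaded fragments. The C sketch below is purely illustrative: the Fragment table, the where_is function, and the fragment names in the usage note are hypothetical, and it makes no claim about how MacsBug itself obtains fragment information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record describing one loaded fragment's code section. */
typedef struct {
    const char *name;   /* fragment name: application, library, etc. */
    uint32_t    start;  /* first address of the code section */
    uint32_t    length; /* size of the code section in bytes */
} Fragment;

/* Return the fragment whose code section contains addr and store the offset
   of addr within that section; return NULL if no listed fragment contains it. */
static const Fragment *where_is(const Fragment *frags, size_t count,
                                uint32_t addr, uint32_t *offset)
{
    assert(offset != NULL && (frags != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        /* Subtracting first avoids overflow at the top of the address space. */
        if (addr >= frags[i].start && addr - frags[i].start < frags[i].length) {
            *offset = addr - frags[i].start;
            return &frags[i];
        }
    }
    return NULL;
}
```

With a hypothetical table listing "MyApp" at 0x01000000 (length 0x10000), an address of 0x01000124 comes back as MyApp plus offset 0x124. An address outside every entry returns NULL, the analogue of the less helpful answers described next.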
If the code is 68K code or is not part of a CFM library, the result of the command may
be simply information about the Mac OS memory manager block in which the code
resides. This is less immediately useful than a library name, but you can sometimes
figure out whose code it is by examining the memory block for clues. In particular, if
the memory block is being tracked by the Resource Manager, then additional
information about the resource, including the file it came from, is included in the
output of the command. A good second stab is to look at the first several bytes of
the memory block, which sometimes contain symbols or identifying strings (e.g. 'PLUG'
or "Apple Menu Options") that can bring you closer to the identity of the
code's owner.
In the worst case, the code you crashed in is not even code that the Mac OS
memory manager keeps track of, and all MacsBug can muster is that the address is "in
RAM but not in a known heap." Aside from displaying memory around the program
counter and hoping for a lucky find, you are pretty much lost. This doesn't mean you
aren't going to find the bug, it just means you have to do so without the benefit of
knowing what the crashing code is.
Executing Data
Since data and code coexist in the same address space on the Macintosh, and since
the Mac OS provides only limited memory protection (for file-mapped PowerPC
libraries), it is entirely possible that the CPU will be handed a chunk of data and be
asked to execute it as code. If you are at all familiar with the assembly language for the
CPU you are debugging, it will usually be immediately clear whether you are in data or
code memory. When you look at the code at and around the program counter ("ip"),
does it look like assembly language or does it look like MacsBug trying to translate
monkey talk to a rhinoceros? A common giveaway is a disassembly full of "ORI.B"
instructions (68K) or "dc.l" entries (PowerPC). Zeroed-out memory, which data regions
usually contain plenty of, disassembles exactly this way: on the 68K the opcode word
0x0000 decodes as ORI.B, while on the PowerPC the word 0x00000000 is not a valid
instruction, so MacsBug displays it as a "dc.l" constant.
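The zeroed-memory giveaway can be turned into a crude numeric test: count how many 32-bit words in a region are zero. The C sketch below is a hypothetical heuristic, not something MacsBug does; the function names and the threshold are assumptions you would tune by eye, and plenty of legitimate code regions would fool it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fraction of the 32-bit words in a region that are zero (0.0 for empty). */
static double zero_word_fraction(const uint32_t *words, size_t count)
{
    assert(words != NULL || count == 0);
    if (count == 0)
        return 0.0;
    size_t zeros = 0;
    for (size_t i = 0; i < count; i++)
        if (words[i] == 0)
            zeros++;
    return (double)zeros / (double)count;
}

/* Hypothetical rule of thumb: call a region "probably data" when at least
   `threshold` (between 0 and 1) of its words are zero. */
static bool looks_like_data(const uint32_t *words, size_t count, double threshold)
{
    return zero_word_fraction(words, count) >= threshold;
}
```

A region holding one "li r3,0" (0x38600000) followed by three zero words scores 0.75 and is flagged at a 0.5 threshold, while a "li r3,0; blr" pair (0x38600000, 0x4E800020) scores 0.0.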
If the memory looks like data, you are one step closer to solving your bug. Usually
the data contains patterns or text which are characteristic of the code that owns it.
Find the beginning of the memory block with the "wh" command, and display memory
from the beginning of the block until you find something that looks incriminating.
Whether or not you can pin down the owner, it is time to move on to the next stage.
How Did I Get Here?
If you have been lucky, you now know the fundamental cause of the crash, and you know
whose code it is. That knowledge is mostly useful for providing clues to answer the
ultimate question: "how did my cute fuzzy bunny code get coerced off its path into the
cruel, crashing world?" Unless the nature of the crash or the location of the code was
telling enough to reveal an obvious solution, you must now attempt to turn back the
clock and study the sequence of events that led up to the crash. Fortunately, a time
machine is built into most calling conventions in the form of stack-based return
addresses.
When a subroutine is called, the caller needs to communicate to the subroutine how it
will return control to the caller. Very often this return address ends up on the stack,
either pushed there by the call instruction itself (68K) or saved there by the called
routine (PowerPC, where it first arrives in the link register). As each subroutine in
turn calls its own subroutines, a breadcrumb trail is left that snakes all the way back
up the stack to the original caller.
Examining this trail is referred to as performing a "stack crawl." If you were really
hard up, you could display memory from the stack pointer for several hundred bytes,
reason out the math implied by the values in the various positions, and determine for
yourself which parts of the stack refer to return addresses. Fortunately, MacsBug does
a fairly good job of doing this with its "sc" command.
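For PowerPC code, the arithmetic behind a stack crawl can be sketched by following the back chain. This sketch assumes the Mac OS PowerPC linkage-area layout: each frame begins with a back-chain word holding the caller's stack pointer, and a routine saves its link register at offset 8 in its caller's frame. Memory is modeled as a toy array of words, and the names Memory, read_word, and crawl_back_chain are hypothetical. It handles only PowerPC frames, whereas "sc" also recognizes 68K calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A toy view of memory: `count` 32-bit words starting at address `base`. */
typedef struct {
    uint32_t        base;
    const uint32_t *words;
    size_t          count;
} Memory;

/* Read the aligned word at addr; returns 0 if addr is outside the toy memory. */
static uint32_t read_word(const Memory *m, uint32_t addr)
{
    if (addr < m->base || (addr - m->base) % 4 != 0)
        return 0;
    size_t index = (addr - m->base) / 4;
    return index < m->count ? m->words[index] : 0;
}

/* Hypothetical back-chain crawl. Assumes 0(SP) holds the caller's SP and
   8(caller's SP) holds the return address saved by the routine that owns
   the frame at SP. Stores up to `max` return addresses, most recent first,
   and returns how many were found. */
static size_t crawl_back_chain(const Memory *m, uint32_t sp,
                               uint32_t *out, size_t max)
{
    assert(m != NULL && (out != NULL || max == 0));
    size_t n = 0;
    while (n < max) {
        uint32_t caller_sp = read_word(m, sp);
        /* Stacks grow downward, so a sane chain always moves to higher
           addresses; a zero or backward link ends the crawl. */
        if (caller_sp == 0 || caller_sp <= sp)
            break;
        out[n++] = read_word(m, caller_sp + 8);
        sp = caller_sp;
    }
    return n;
}
```

Each address collected this way is what you would then hand to "wh" or "ip" in MacsBug. Note that a leaf routine may never save its link register at all, so the most recent return address can still be sitting in the LR rather than on the stack.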
The "sc" command was originally only able to examine the stack for 68K calling
conventions, but today it is capable of looking for both 68K and PowerPC subroutine
calls, and listing them both in the same stack crawl listing. In the output, each line
represents the location of an instruction that caused a subroutine to be called. The last
entry displayed is the last subroutine call MacsBug could decipher from the stack. If
you look at the list from bottom to top, you are examining the subroutine history,
from most recent to oldest, that led to this point. Generally, you are looking for a piece
of code that looks like it is yours. If you find it, disassemble the code at that address
("ip" followed by the address) to see if you can figure out exactly which of your code it
is. Once you determine this, your strategy shifts from unintentional to intentional
debugging; move on to a source-level debugger if you so desire. You can now set
breakpoints at the crashing subroutine call and reproduce the crash on your terms.
There's No Stack Crawl!
Sometimes, as you should expect by now, things do not work out quite so peachy. The
stack and link registers are sometimes in a state such that MacsBug is of no help in
examining the stack crawl. Situations such as these call for desperate actions. Even if
MacsBug refuses to produce a useful stack crawl, you can search for clues by looking
at the link register contents yourself. Remember the "wh" command? Try "wh lr" to
get information about the address in the link register, and "ipp lr" to disassemble the
code surrounding the address (and hopefully the code that got you where you are).
MacsBug also provides a second stack crawl command, "sc7", for situations where the
first does not pan out. Its output is identical in structure to that of "sc", but it is much
less finicky about what constitutes a "return address" on the stack. The "sc" command
actually iterates backwards up the stack, examining the locations which
algorithmically (based on the well-defined calling conventions) should point to code
which called a subroutine in the call chain. The "sc7" command, on the other hand, will
examine every address on the stack, and as long as the code an address points to appears
to immediately follow an A-trap or subroutine call, it will list the address as a
"possible return address." Thus, the results of the "sc7" command need to be taken
with great skepticism, but as I said, this is a command for desperate situations.
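The "sc7" idea can be sketched as a brute-force scan: treat every word on the stack as a candidate and keep it if the instruction just before the address it points to is a call. The C sketch below is PowerPC-only and hypothetical: the names and the toy memory model are assumptions, it counts only bl/bla (primary opcode 18 with the LK bit set) and the unconditional forms of bctrl and blrl as calls, and it ignores the A-traps and 68K subroutine calls that "sc7" also looks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A toy view of memory: `count` 32-bit words starting at address `base`. */
typedef struct {
    uint32_t        base;
    const uint32_t *words;
    size_t          count;
} Memory;

/* Read the aligned word at addr; returns 0 if addr is outside the toy memory. */
static uint32_t read_word(const Memory *m, uint32_t addr)
{
    if (addr < m->base || (addr - m->base) % 4 != 0)
        return 0;
    size_t index = (addr - m->base) / 4;
    return index < m->count ? m->words[index] : 0;
}

/* True for PowerPC branches that set the link register: bl/bla (opcode 18,
   LK = 1) and the unconditional forms of bctrl and blrl. */
static bool is_call(uint32_t insn)
{
    if ((insn & 0xFC000001u) == 0x48000001u)
        return true;
    return insn == 0x4E800421u   /* bctrl */
        || insn == 0x4E800021u;  /* blrl  */
}

/* Hypothetical sc7-style scan: report each stack word whose preceding
   instruction in code memory is a call. Expect false positives. */
static size_t scan_possible_returns(const Memory *stack, const Memory *code,
                                    uint32_t *out, size_t max)
{
    assert(stack != NULL && code != NULL && (out != NULL || max == 0));
    size_t n = 0;
    for (size_t i = 0; i < stack->count && n < max; i++) {
        uint32_t candidate = stack->words[i];
        if (candidate % 4 != 0)      /* PowerPC instructions are word aligned */
            continue;
        if (is_call(read_word(code, candidate - 4)))
            out[n++] = candidate;
    }
    return n;
}
```

Because any stale value that happens to follow a call passes the test, the output has the same weakness as "sc7" itself: every hit is only a "possible return address."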
I Mean There's No Stack Crawl!
If the stack crawl is not panning out in any way, shape, or form, there is another
technique that may be of aid. For whatever reason, the code you crashed in is just not
conducive to the usual stack crawl analysis. If, however, the crash you are debugging
can be temporarily avoided, it may pay to circumvent the crash, trace until