Jul 96 Factory Floor
Volume Number: 12
Issue Number: 7
Column Tag: From The Factory Floor
A Little CodeWarrior History
By Dave Mark
This month, we’re going to talk with John McEnerney, one of the compiler writers
at Metrowerks.
Dave: How did you first hook up with Metrowerks?
John: I first met Greg Galanos when I was the development manager at
Symantec’s Language Products Group. Greg was trying to get me interested in
doing some sort of deal with the fledgling Metrowerks, and I mostly ignored him
because they were trying to compete aggressively with my first product, THINK
Pascal. I would never have guessed that a few years later he would offer me the
best opportunity of my career.
Dave: When did you leave Symantec?
John: I left Symantec in October ’92, taking about 6 months off to figure out
what I wanted to do next. I didn’t have any real plans, but I figured I’d find some
way to do PowerPC work. I didn’t relish the thought of trying to write an entire
C++ compiler, so I considered doing a Pascal product on my own.
Around this time, Greg had heard from Rich Siegel (of BBEdit fame) that I was no
longer at Symantec, and he called me right away. The first thing he said to me
was, “describe your dream job,” and I told him I wanted to write a PowerPC code
generator for the upcoming Power Macintoshes. I flew to Montreal to meet him
and his partner, Jean Belanger. We had some Italian food, drank some wine, and
they told me a little about their Pascal and Modula products. I was really hot to
write a PowerPC backend, but I was not that impressed with their technology.
We talked about various contracts, but I didn’t have a really solid feeling yet.
Dave: What finally convinced you to go with Metrowerks?
John: In February ’93, Greg asked me to meet with him in Palo Alto to get a look
at a C compiler that they had just acquired; a guy named Andreas Hommel in
Hamburg had been writing it as a hobby. It ran on the Macintosh, had a simple
but nice IDE reminiscent of early versions of THINK C, and it was fast. I spent
about an hour looking through the source code: it was well organized, the
compiler front-end and back-end were cleanly separated, the code was easy to
follow, and in addition to being a full ANSI C compiler, it had a lot of the C++
language implemented already.
It was clear that Greg had found a diamond in the rough, the perfect platform for a
native Power Macintosh product. A few hours later we had a contract - I had
about 6-8 months to write a PowerPC back-end and linker. Andreas would finish
the C++ language implementation, and a few guys in Montreal (Berardino
Baratta, Marcel Achim) would work on the IDE and a new Pascal front-end. We
immediately hired Greg Dow, who had written the THINK Class Library for
Symantec when I was there, to write a new application framework: PowerPlant.
We must have hooked up with Jordan Mattson from Apple around this time,
because a week or so later he sent me one of their RS/6000s to help me get
started. Between him and Alan Lillich, who I had met at all the early PowerPC
meetings that Apple had been holding for their key developers, I got pretty much
everything I needed from Apple.
So, I now had a contract to do the most interesting work I could imagine; all I had
to do was figure out where to start.
Dave: What was it like working with Andreas’ compiler?
John: Andreas’ compiler was pretty traditional in its organization. The
front-end made a single pass over the source code, performing lexical analysis as
it went, and generated an intermediate representation (IR) that consisted of
expression trees, labels, and branches. It took about a week to totally remove the
68K code generator from the rest of the compiler, and put in stub routines where
the front-end and the back-end connected so that everything would still link. If I
could fill in all the stub routines in exactly the right way, we’d have a PowerPC
compiler.
The first thing I did was write a routine that dumped the IR in human-readable
form - I don’t know how Andreas got his 68K code generator to work without
that; I guess he can keep more in his head than I can. Looking at the expression
trees on the screen allowed me to visualize how the code generator would proceed.
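To make the idea concrete, here is a minimal sketch of what such a dump routine might look like. The node layout (`Node` with `op`, `kids`, and `value`) and the indented output format are assumptions made for illustration; the interview does not describe the actual IR data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical IR expression-tree node (not the real CodeWarrior layout)."""
    op: str                                   # e.g. "ASSIGN", "ADD", "MUL", "VAR", "CONST"
    kids: List["Node"] = field(default_factory=list)
    value: Optional[object] = None            # variable name or constant, for leaf nodes


def dump(node: Node, depth: int = 0) -> str:
    """Return an indented, human-readable rendering of an expression tree."""
    label = node.op if node.value is None else f"{node.op} {node.value}"
    lines = ["  " * depth + label]
    for kid in node.kids:
        lines.append(dump(kid, depth + 1))
    return "\n".join(lines)


# The statement  x = a * b + 1  as a tree.
tree = Node("ASSIGN", [
    Node("VAR", value="x"),
    Node("ADD", [
        Node("MUL", [Node("VAR", value="a"), Node("VAR", value="b")]),
        Node("CONST", value=1),
    ]),
])
print(dump(tree))
```

Each level of indentation corresponds to one level of the tree, so the shape a code generator has to walk is visible at a glance.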
Most CISC compilers spend a lot of time working on the IR trees themselves.
Traditional global optimizations like loop-invariant code motion or common
subexpression elimination are performed by rewriting the IR trees into
optimized IR trees. The code generator gathers information about the shape of the
trees, deciding how many registers will be needed, which addressing modes will
be used, etc. After instructions are generated they are largely ignored except for
small “peephole” optimizations. (A notable exception to this is the gcc compiler,
which transforms the expressions into a simple algebraic representation called
RTL and uses repeated “peephole” optimizations derived from a machine
description to coalesce these RTL expressions into complex instructions and
addressing modes.)
Most of the RISC compilers that I’d read about in the compiler literature used a
different approach: immediately transform the IR trees into a low-level
representation that was similar or identical to the actual RISC instructions of the
target machine, and perform all optimizations at the machine instruction level. I
decided to use this technique in my PowerPC code generator.
Dave: What was your basic approach to code generation?
John: Strange as it may seem, the first part of the code generator I actually
wrote was the instruction scheduler - the phase that reorders instructions to
minimize latencies caused by load delays, and to permit floating-point and
integer instructions to execute in parallel. I needed to know if my low-level
representation - I called it a “pcode” (no relation to the UCSD Pascal pcode) -
had enough information for all the phases I would eventually write, and since the
scheduler needed a lot of information, it would serve to prove the design of the
pcode. Of course, I had to rewrite the scheduler twice more: the first time was to
fix the original one, which had some design flaws, and the second time was to
make it more general to support 601, 603 and 604 CPUs.
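The kind of reordering a scheduler performs can be sketched with a generic greedy list scheduler. This is not John's scheduler: the instruction tuples, the single-issue machine model, and the two-cycle load latency are all assumptions chosen to keep the example small. It also tracks only register dependences, which is adequate here because every result gets its own virtual register and the example contains no stores.

```python
def schedule(instrs, latency):
    """Greedy list scheduling for a hypothetical single-issue machine.

    instrs:  list of (opcode, dest, sources) in program order.
    latency: dict opcode -> cycles until the result is usable (default 1).
    Returns the same instructions in their new order.
    """
    producer = {}                 # register -> index of the instruction defining it
    deps = []                     # deps[i] = indices instruction i must wait for
    for i, (_, dest, srcs) in enumerate(instrs):
        deps.append([producer[s] for s in srcs if s in producer])
        producer[dest] = i

    ready_at = {}                 # index -> cycle at which its result is available

    def operands_ready(i):
        return max((ready_at[d] for d in deps[i]), default=0)

    order, cycle = [], 0
    remaining = list(range(len(instrs)))
    while remaining:
        # Candidates: instructions whose producers have all been issued.
        candidates = [i for i in remaining if all(d in ready_at for d in deps[i])]
        # Prefer the one that can start earliest; break ties by program order.
        pick = min(candidates, key=lambda i: (max(operands_ready(i), cycle), i))
        cycle = max(operands_ready(pick), cycle)      # stall only if unavoidable
        ready_at[pick] = cycle + latency.get(instrs[pick][0], 1)
        order.append(instrs[pick])
        remaining.remove(pick)
        cycle += 1
    return order


program = [
    ("lwz",  "v1", []),        # load a
    ("addi", "v2", ["v1"]),    # needs v1 immediately: would stall on the load
    ("lwz",  "v3", []),        # load b
    ("addi", "v4", ["v3"]),
]
scheduled = schedule(program, {"lwz": 2})   # illustrative latency, not a measured one
for ins in scheduled:
    print(ins)
```

In this example the second load is hoisted above the first add, so each add finds its operand already available instead of waiting on the load delay.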
Once I finished the scheduler, I had my data structures organized and all the
support routines in place, so I started writing the instruction selection phase -
the “guts” of the code generator. This phase visits the IR tree and generates
pcode. It does try to recognize certain tree patterns, like opportunities for
FMADD and FMSUB instructions, but since there are no complex addressing modes
and very few complex instructions, it is mostly a straightforward translation to
PowerPC instructions.
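A minimal sketch of this kind of tree walk follows. The tuple-based tree format, the `v1`, `v2`, ... virtual register names, and the simplified `lfd` operand are assumptions for illustration; only the idea of spotting a multiply feeding an add or subtract and emitting a single `fmadd` or `fmsub` comes from the text.

```python
from itertools import count


def select(tree):
    """Walk a floating-point expression tree and emit PowerPC-like pcode.

    Trees are tuples: ("var", name), ("mul", l, r), ("add", l, r), ("sub", l, r).
    Returns (list of instruction tuples, virtual register holding the result).
    """
    code = []
    fresh = count(1)                       # unlimited supply of virtual registers

    def new_reg():
        return f"v{next(fresh)}"

    def walk(t):
        op = t[0]
        if op == "var":
            r = new_reg()
            code.append(("lfd", r, t[1]))  # simplified: load a double by variable name
            return r
        if op in ("add", "sub") and t[1][0] == "mul":
            # Pattern: (a * b) + c -> fmadd,  (a * b) - c -> fmsub.
            # Only a multiply on the left-hand side is recognized in this sketch.
            a = walk(t[1][1])
            b = walk(t[1][2])
            c = walk(t[2])
            r = new_reg()
            code.append(("fmadd" if op == "add" else "fmsub", r, a, b, c))
            return r
        # Everything else is a one-to-one translation.
        left = walk(t[1])
        right = walk(t[2])
        r = new_reg()
        code.append(({"add": "fadd", "sub": "fsub", "mul": "fmul"}[op], r, left, right))
        return r

    result = walk(tree)
    return code, result


# x * y + z becomes three loads and one fused multiply-add.
code, result = select(("add", ("mul", ("var", "x"), ("var", "y")), ("var", "z")))
for ins in code:
    print(ins)
```

Without the pattern check, the same expression would need a separate `fmul` and `fadd`, so the special case saves an instruction.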
To get short-term results, I wrote a quick-and-dirty register allocator, and
some code to display the generated pcode instructions, and was able to get most of
the code generation debugged this way. I decided to use a proprietary object code
format, derived from the one we were already using in our 68K compiler and
linker, since I could get this working faster than trying to write an XCOFF linker.
I spent a few weeks getting the linker working, finished the part of the code
generator that wrote the object file, and I could actually compile and link small
programs.
Dave: How did the debugger fit into all this?
John: Around August ’93 the project was falling into place, but we still didn’t
have a source-level debugger. In a most serendipitous event, Dan Podwall, a
friend of mine from Symantec, called and asked whether there were any
opportunities at Metrowerks. Greg Galanos called him right away, hired him on
the phone, and 4 weeks later he had written a debugger - in PowerPlant, no less
- that could single-step and set breakpoints. This would be the first commercial
PowerPlant program - in fact, the first PowerPlant program of any kind aside
from Greg Dow’s demos.
Dave: How did you build the compiler?
John: By September ’93 we had some prototype PowerPC hardware, and I had a
working code generator and linker which ran on the 68K Macintosh and generated
PEF executables that ran on the prototypes. Since this compiler was already
built using our own 68K compiler, it was pretty easy to rehost it on the Power
Macintosh: we made the changes for the Universal Headers and routine
descriptors and such, then compiled it with itself on the 68K machine, which
gave us (after some debugging!) a working PowerPC-hosted PowerPC compiler.
With a little bit of trickery, mandated by differences between PowerPC
floating-point hardware and the 68K SANE software floating-point architecture,
we were able to rehost the 68K compiler on the Power Macintosh as well. We
now had the fastest compilers on the Macintosh.
Dave: What next?
John: I still had a lot of work to do on the PowerPC code generator. The biggest
task was to replace the quick-and-dirty register allocator with a graph
coloring-based allocator. This is one of the great algorithms in the history of
compilers. For years people had been trying to come up with an accurate way to
represent the lifetimes of variables, so that variables or temporaries that did not
overlap could share a register. A lot of ad hoc techniques were developed, but this
guy from IBM Watson Research Center named Greg Chaitin discovered a formal
approach that solved the problem better than anything that had been previously
attempted: build an “interference graph” which has an edge between any two
variables whose values may be live at the same time, and then try to color this
graph with N colors where N is the number of available registers.
So my code generator assumes it has an infinite number of “virtual” registers,
and generates the most efficient code it can under that assumption; for example,
it assumes that all local variables, arguments, and TOC pointers can be assigned
to a register. After the code is all generated, the register allocator tries to
rewrite the virtual registers using real PowerPC registers, and generates extra
code to “spill” values that couldn’t get a real register. In most cases, everything
gets a register since there are so many on the PowerPC. The smarter register
allocator probably makes the overall largest contribution to code quality.
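The allocation scheme described in the last two paragraphs can be sketched as follows. This is a simplified, Chaitin-style simplify-and-select coloring, not the CodeWarrior allocator: the `live_sets` input format is an assumption, the spill choice is a plain highest-degree heuristic rather than a cost estimate, and no spill code is generated or re-colored afterwards.

```python
def build_interference(live_sets):
    """live_sets: iterable of sets of virtual registers live at the same point.

    Returns dict register -> set of registers it interferes with.
    """
    graph = {}
    for live in live_sets:
        for r in live:
            graph.setdefault(r, set()).update(live - {r})
    return graph


def color(graph, k):
    """Try to color the interference graph with k colors (physical registers).

    Returns (assignment, spilled): assignment maps register -> color 0..k-1,
    spilled lists the registers that received no color.
    """
    work = {r: set(ns) for r, ns in graph.items()}
    stack, spilled = [], []
    while work:
        # Simplify: a node with fewer than k neighbours can always be colored later.
        easy = [r for r in sorted(work) if len(work[r]) < k]
        if easy:
            r = easy[0]
            stack.append(r)
        else:
            # Stuck: give up on the most constrained node and mark it as spilled.
            r = max(sorted(work), key=lambda n: len(work[n]))
            spilled.append(r)
        for n in work.pop(r):
            work[n].discard(r)

    # Select: pop nodes in reverse order, giving each the lowest free color.
    assignment = {}
    while stack:
        r = stack.pop()
        used = {assignment[n] for n in graph[r] if n in assignment}
        assignment[r] = min(c for c in range(k) if c not in used)
    return assignment, spilled


# a and b overlap, then b and c, then c and d: two registers are enough.
chain = build_interference([{"a", "b"}, {"b", "c"}, {"c", "d"}])
print(color(chain, 2))

# x, y and z are all live at once: with two registers, one must be spilled.
triangle = build_interference([{"x", "y", "z"}])
print(color(triangle, 2))
```

In the first case `a` and `c` end up sharing one register and `b` and `d` the other, which is exactly the kind of sharing between non-overlapping values that the interference graph makes possible.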
The algorithm has one drawback: it has O(N^2) complexity, where N here is the
number of values competing for registers, not the number of registers. There are
actually programs which have so many intermediate expressions that the
interference graph gets too large and it takes several minutes to color it. So I
had to keep
around the quick-and-dirty allocator as well, which is why you’ll sometimes get
an annoying message that says the code generator ran out of registers if you’re
compiling without global optimizations.
Dave: And so, CodeWarrior was born!
John: By December I had pretty much everything working. After a last-minute
dash to get C++ language support working on the PowerPC, we were able to burn
our first public release, DR/1, starting a long Metrowerks tradition of getting
things in under the wire and never missing a ship date. We introduced the
product at the San Francisco Macworld Expo with our huge 8-page MacWeek
advertisement, and CodeWarrior™ was born.
There were plenty of things to be cleaned up between DR/1 and DR/3, which was
our real “1.0” release. But by shipping DR/1 and DR/2 when we did, and by
working closely with a lot of the major Macintosh software vendors, we were
able to help a lot of companies get their software ported to the PowerMac that
otherwise might not have.
For me, I had accomplished what I had wanted to when I was back at Symantec:
building the PowerPC compiler that most users would use to port their code to
the new Power Macintoshes. And Greg had kept his promise and given me my
dream job.