MacTech Vol 12-1996

Jul 96 Factory Floor

Volume Number: 12

Issue Number: 7

Column Tag: From The Factory Floor

A Little CodeWarrior History

By Dave Mark

This month, we’re going to talk with John McEnerney, one of the compiler writers

at Metrowerks.

Dave: How did you first hook up with Metrowerks?

John: I first met Greg Galanos when I was the development manager at

Symantec’s Language Products Group. Greg was trying to get me interested in

doing some sort of deal with the fledgling Metrowerks, and I mostly ignored him

because they were trying to compete aggressively with my first product, THINK

Pascal. I would have never guessed that a few years later he would offer me the

best opportunity of my career.

Dave: When did you leave Symantec?

John: I left Symantec in October ’92, taking about 6 months off to figure out

what I wanted to do next. I didn’t have any real plans, but I figured I’d find some

way to do PowerPC work. I didn’t relish the thought of trying to write an entire

C++ compiler, so I considered doing a Pascal product on my own.

Around this time, Greg had heard from Rich Siegel (of BBEdit fame) that I was no

longer at Symantec, and he called me right away. The first thing he said to me

was, “describe your dream job,” and I told him I wanted to write a PowerPC code

generator for the upcoming Power Macintoshes. I flew to Montreal to meet him

and his partner, Jean Belanger. We had some Italian food, drank some wine, and

they told me a little about their Pascal and Modula products. I was really hot to

write a PowerPC backend, but I was not that impressed with their technology.

We talked about various contracts, but I didn’t have a really solid feeling yet.

Dave: What finally convinced you to go with Metrowerks?

John: In February ’93, Greg asked me to meet with him in Palo Alto to get a look

at a C compiler that they had just acquired; a guy named Andreas Hommel in

Hamburg had been writing it as a hobby. It ran on the Macintosh, had a simple

but nice IDE reminiscent of early versions of THINK C, and it was fast. I spent

about an hour looking through the source code: it was well organized, the

compiler front-end and back-end were cleanly separated, the code was easy to

follow, and in addition to being a full ANSI C compiler, it had a lot of the C++

language implemented already.

It was clear that Greg had found a diamond in the rough, the perfect platform for a

native Power Macintosh product. A few hours later we had a contract - I had

about 6-8 months to write a PowerPC back-end and linker. Andreas would finish

the C++ language implementation, and a few guys in Montreal (Berardino

Baratta, Marcel Achim) would work on the IDE and a new Pascal front-end. We

immediately hired Greg Dow, who had written the THINK Class Library for

Symantec when I was there, to write a new application framework: PowerPlant.

We must have hooked up with Jordan Mattson from Apple around this time,

because a week or so later he sent me one of their RS/6000s to help me get

started. Between him and Alan Lillich, who I had met at all the early PowerPC

meetings that Apple had been holding for their key developers, I got pretty much

everything I needed from Apple.

So, I now had a contract to do the most interesting work I could imagine; all I had

to do was figure out where to start.

Dave: What was it like working with Andreas’ compiler?

John: Andreas’ compiler was pretty traditional in its organization. The

front-end made a single pass over the source code, performing lexical analysis as

it went, and generated an intermediate representation (IR) that consisted of

expression trees, labels, and branches. It took about a week to totally remove the

68K code generator from the rest of the compiler, and put in stub routines where

the front-end and the back-end connected so that everything would still link. If I

could fill in all the stub routines in exactly the right way, we’d have a PowerPC

compiler.

The first thing I did was write a routine that dumped the IR in human-readable

form - I don’t know how Andreas got his 68K code generator to work without

that, I guess he can keep more in his head than I can. Looking at the expression

trees on the screen allowed me to visualize how the code generator would proceed.
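
As an illustration of the kind of IR dump described above, here is a minimal sketch in Python. The `Node` class, its operator names, and the indented layout are assumptions made for this example; the interview does not describe the real data structures or output format.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of a hypothetical expression-tree IR."""
    op: str                      # e.g. "ASSIGN", "ADD", "MUL", "VAR", "CONST"
    value: object = None         # variable name or constant, for leaf nodes
    kids: list = field(default_factory=list)

def dump(node, depth=0):
    """Return the tree as a list of lines, one node per line, indented by depth."""
    label = node.op if node.value is None else f"{node.op} {node.value}"
    lines = ["  " * depth + label]
    for kid in node.kids:
        lines.extend(dump(kid, depth + 1))
    return lines

# Hypothetical tree for the statement: a = b * c + 1
tree = Node("ASSIGN", kids=[
    Node("VAR", "a"),
    Node("ADD", kids=[
        Node("MUL", kids=[Node("VAR", "b"), Node("VAR", "c")]),
        Node("CONST", 1),
    ]),
])

print("\n".join(dump(tree)))
```

Printing one node per line, with children indented under their parent, makes the shape of each expression visible at a glance, which is the property the dump was wanted for.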

Most CISC compilers spend a lot of time working on the IR trees themselves.

Traditional global optimizations like loop-invariant code motion or common

subexpression elimination are performed by rewriting the IR trees into

optimized IR trees. The code generator gathers information about the shape of the

trees, deciding how many registers will be needed, which addressing modes will

be used, etc. After instructions are generated they are largely ignored except for

small “peephole” optimizations. (A notable exception to this is the gcc compiler,

which transforms the expressions into a simple algebraic representation called

RTL and uses repeated “peephole” optimizations derived from a machine

description to coalesce these RTL expressions into complex instructions and

addressing modes.)

Most of the RISC compilers that I’d read about in the compiler literature used a

different approach: immediately transform the IR trees into a low-level

representation that was similar or identical to the actual RISC instructions of the

target machine, and perform all optimizations at the machine instruction level. I

decided to use this technique in my PowerPC code generator.

Dave: What was your basic approach to code generation?

John: Strange as it may seem, the first part of the code generator I actually

wrote was the instruction scheduler - the phase that reorders instructions to

minimize latencies caused by load delays, and to permit floating-point and

integer instructions to execute in parallel. I needed to know if my low-level

representation - I called it a “pcode” (no relation to the UCSD Pascal pcode) -

had enough information for all the phases I would eventually write, and since the

scheduler needed a lot of information, it would serve to prove the design of the

pcode. Of course, I had to rewrite the scheduler twice more: the first time was to

fix the original one, which had some design flaws, and the second time was to

make it more general to support 601, 603 and 604 CPUs.
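
To make the idea of instruction scheduling concrete, here is a small list-scheduling sketch in Python. It is not the CodeWarrior scheduler: the `Instr` record, the two-cycle load latency, and the single-issue machine model are all assumptions chosen to keep the example short. It also assumes each register is written at most once in the block and is never read before it is written, so only true data dependences need tracking.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str       # printable form, for illustration only
    dest: str       # register written (written once per block in this sketch)
    srcs: tuple     # registers read
    latency: int    # cycles until dest is usable (assumed values)

def schedule(block):
    """Greedy list scheduling for a single-issue machine.

    Returns (order, cycles): the reordered instructions and the number of
    cycles needed to issue all of them, counting stall cycles.
    """
    producer = {ins.dest: i for i, ins in enumerate(block)}
    # deps[i] = indices of the instructions whose results instruction i reads
    deps = [[producer[s] for s in ins.srcs if s in producer]
            for ins in block]
    issued_at = {}
    order = []
    cycle = 0
    while len(order) < len(block):
        ready = [i for i in range(len(block))
                 if i not in issued_at
                 and all(d in issued_at
                         and issued_at[d] + block[d].latency <= cycle
                         for d in deps[i])]
        if ready:
            # Prefer long-latency instructions so their delay overlaps other
            # work; break ties by original program order.
            pick = max(ready, key=lambda i: (block[i].latency, -i))
            issued_at[pick] = cycle
            order.append(block[pick])
        cycle += 1   # advances even when nothing is ready (a stall cycle)
    return order, cycle

# Two independent load/add pairs; r9 is assumed to be live on entry.
block = [
    Instr("lwz  r1, 0(r9)", "r1", ("r9",), 2),
    Instr("addi r2, r1, 1", "r2", ("r1",), 1),
    Instr("lwz  r3, 4(r9)", "r3", ("r9",), 2),
    Instr("addi r4, r3, 1", "r4", ("r3",), 1),
]

order, cycles = schedule(block)
for ins in order:
    print(ins.text)
```

In this example the second load is hoisted above the first add, so each add issues only after its load has completed and no stall cycles remain. A scheduler for a real 601, 603 or 604 would also have to model multiple execution units, which this sketch leaves out.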

Once I finished the scheduler, I had my data structures organized and all the

support routines in place, so I started writing the instruction selection phase -

the “guts” of the code generator. This phase visits the IR tree and generates

pcode. It does try to recognize certain tree patterns, like opportunities for

FMADD and FMSUB instructions, but since there are no complex addressing modes

and very few complex instructions, it is mostly a straightforward translation to

PowerPC instructions.
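
The tree-walking translation described above, including recognition of multiply-add patterns, might look roughly like the following Python sketch. The tuple-based tree shape, the `v0`, `v1`, ... virtual register names, and the output tuples are assumptions for illustration; only the `fmadd`, `fmsub`, `fadd`, `fsub` and `fmul` mnemonics are real PowerPC instructions. Each output tuple is `(op, dest, a, b, c)`, meaning `dest = a * b + c` for `fmadd` and `dest = a * b - c` for `fmsub`.

```python
import itertools

def select(tree):
    """Translate a floating-point expression tree into pcode-like tuples.

    A tree is either a variable name (str) or a tuple (op, left, right)
    with op in {"add", "sub", "mul"}. Returns (instructions, result_register).
    """
    code = []
    fresh = (f"v{n}" for n in itertools.count())   # unlimited virtual registers
    plain = {"add": "fadd", "sub": "fsub", "mul": "fmul"}

    def is_mul(t):
        return isinstance(t, tuple) and t[0] == "mul"

    def walk(t):
        if isinstance(t, str):       # leaf: assume the variable lives in a register
            return t
        op, left, right = t
        if op in ("add", "sub") and is_mul(left):
            # (a * b) + c -> fmadd,  (a * b) - c -> fmsub
            a, b, c = walk(left[1]), walk(left[2]), walk(right)
            dest = next(fresh)
            code.append(("fmadd" if op == "add" else "fmsub", dest, a, b, c))
            return dest
        if op == "add" and is_mul(right):
            # c + (a * b) computes the same sum, so fmadd applies here too
            c, a, b = walk(left), walk(right[1]), walk(right[2])
            dest = next(fresh)
            code.append(("fmadd", dest, a, b, c))
            return dest
        # No pattern matched: straightforward one-to-one translation.
        l, r = walk(left), walk(right)
        dest = next(fresh)
        code.append((plain[op], dest, l, r))
        return dest

    result = walk(tree)
    return code, result

code, result = select(("add", ("mul", "x", "y"), "z"))
print(code, result)
```

Everything that does not match a pattern falls through to a one-instruction-per-node translation, which reflects the point made above that most of the work is mechanical on a RISC target.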

To get short-term results, I wrote a quick-and-dirty register allocator, and

some code to display the generated pcode instructions, and was able to get most of

the code generation debugged this way. I decided to use a proprietary object code

format, derived from the one we were already using in our 68K compiler and

linker, since I could get this working faster than trying to write an XCOFF linker.

I spent a few weeks getting the linker working, finished the part of the code

generator that wrote the object file, and I could actually compile and link small

programs.

Dave: How did the debugger fit into all this?

John: Around August ’93 the project was falling into place, but we still didn’t

have a source-level debugger. In a most serendipitous event, Dan Podwall, a

friend of mine from Symantec, called and asked whether there were any

opportunities at Metrowerks. Greg Galanos called him right away, hired him on

the phone, and 4 weeks later he had written a debugger - in PowerPlant, no less

- that could single-step and set breakpoints. This would be the first commercial

PowerPlant program - in fact, the first PowerPlant program of any kind aside

from Greg Dow’s demos.

Dave: How did you build the compiler?

John: By September ’93 we had some prototype PowerPC hardware, and I had a

working code generator and linker which ran on the 68K Macintosh and generated

PEF executables that ran on the prototypes. Since this compiler was already

built using our own 68K compiler, it was pretty easy to rehost it on the Power

Macintosh: we made the changes for the Universal Headers and routine

descriptors and such, then compiled it with itself on the 68K machine, which

gave us (after some debugging!) a working PowerPC-hosted PowerPC compiler.

With a little bit of trickery, mandated by differences between PowerPC

floating-point hardware and the 68K SANE software floating-point architecture,

we were able to rehost the 68K compiler on the Power Macintosh as well. We

now had the fastest compilers on the Macintosh.

Dave: What next?

John: I still had a lot of work to do on the PowerPC code generator. The biggest

task was to replace the quick-and-dirty register allocator with a graph

coloring-based allocator. This is one of the great algorithms in the history of

compilers. For years people had been trying to come up with an accurate way to

represent the lifetimes of variables, so that variables or temporaries that did not

overlap could share a register. A lot of ad hoc techniques were developed, but this

guy from IBM Watson Research Center named Greg Chaitin discovered a formal

approach that solved the problem better than anything that had been previously

attempted: build an “interference graph” which has an edge between any two

variables whose values may be live at the same time, and then try to color this

graph with N colors where N is the number of available registers.
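
The interference-graph idea can be sketched in a few lines of Python. This is a simplified illustration, not the allocator that shipped: it handles only straight-line code, the `(dest, srcs)` instruction format is an assumption, and variables that cannot be colored are merely reported as spill candidates instead of triggering spill code and a rebuild of the graph.

```python
def interference_graph(code, live_out=()):
    """Build the graph for straight-line code given as (dest, srcs) pairs."""
    graph = {}

    def edge(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    live = set(live_out)
    for v in live:
        graph.setdefault(v, set())
    for dest, srcs in reversed(code):          # walk backward, tracking liveness
        graph.setdefault(dest, set())
        for other in live - {dest}:
            edge(dest, other)                  # dest is defined while other is live
        live.discard(dest)
        live.update(srcs)
        for s in srcs:
            graph.setdefault(s, set())
    # Whatever is still live was live on entry; those values all overlap.
    entry = sorted(live)
    for i, a in enumerate(entry):
        for b in entry[i + 1:]:
            edge(a, b)
    return graph

def color(graph, n_regs):
    """Simplify-and-select coloring. Returns (assignment, spill_candidates)."""
    work = {v: set(adj) for v, adj in graph.items()}
    stack, spilled = [], []
    while work:
        # A node with fewer than n_regs neighbours can always be colored later.
        easy = [v for v in work if len(work[v]) < n_regs]
        if easy:
            v = min(easy)
            stack.append(v)
        else:
            v = max(work, key=lambda u: (len(work[u]), u))   # highest degree
            spilled.append(v)
        for u in work.pop(v):
            work[u].discard(v)
    assignment = {}
    while stack:
        v = stack.pop()
        used = {assignment[u] for u in graph[v] if u in assignment}
        assignment[v] = next(r for r in range(n_regs) if r not in used)
    return assignment, spilled

# t1 = a + b;  t2 = t1 * c;  t3 = t2 + a;  t3 is live afterwards
code = [("t1", ("a", "b")), ("t2", ("t1", "c")), ("t3", ("t2", "a"))]
graph = interference_graph(code, live_out=("t3",))
assignment, spilled = color(graph, n_regs=3)
print(assignment, spilled)
```

With three registers every variable in this example receives a color, because temporaries such as `t1` can reuse the register of a variable whose lifetime has ended. With only two registers the three values live on entry cannot all fit, and one of them is reported as a spill candidate.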

So my code generator assumes it has an infinite number of “virtual” registers,

and generates the most efficient code it can under that assumption; for example,

it assumes that all local variables, arguments, and TOC pointers can be assigned

to a register. After the code is all generated, the register allocator tries to

rewrite the virtual registers using real PowerPC registers, and generates extra

code to “spill” values that couldn’t get a real register. In most cases, everything

gets a register since there are so many on the PowerPC. The smarter register

allocator probably makes the overall largest contribution to code quality.

The algorithm has one drawback: it has O(N^2) complexity, where N here is the number of values competing for registers. There are actually

programs which have so many intermediate expressions that the interference

graph gets too large and it takes several minutes to color it. So I had to keep

around the quick-and-dirty allocator as well, which is why you’ll sometimes get

an annoying message that says the code generator ran out of registers if you’re

compiling without global optimizations.

Dave: And so, CodeWarrior was born!

John: By December I had pretty much everything working. After a last-minute

dash to get C++ language support working on the PowerPC, we were able to burn

our first public release, DR/1, starting a long Metrowerks tradition of getting

things in under the wire and never missing a ship date. We introduced the

product at the San Francisco Macworld Expo with our huge 8-page MacWeek

advertisement, and CodeWarrior™ was born.

There were plenty of things to be cleaned up between DR/1 and DR/3, which was

our real “1.0” release. But by shipping DR/1 and DR/2 when we did, and by

working closely with a lot of the major Macintosh software vendors, we were

able to help a lot of companies get their software ported to the PowerMac that

otherwise might not have.

For me, I had accomplished what I had wanted to when I was back at Symantec:

building the PowerPC compiler that most users would use to port their code to

the new Power Macintoshes. And Greg had kept his promise and given me my

dream job.
