All Databases MacTech Vol 05-1989

Compiler Compare

Volume Number: 5

Issue Number: 8

Column Tag: Jörg's Folder

Compiler Comparison

By Jörg Langowski, MacTutor Editorial Board

Note: Source code files accompanying article are located on MacTech CD-ROM or

source code disks.

Code optimization

You will have noticed the change in the column’s title: a recent reader survey has

shown that Forth and Basic are the two languages that our readers would most like to

see less of in MacTutor. That’s a shame, but we’re responsive (at least we try)

So, from now on my monthly column will have a wider scope. As you might have

seen, I have very often used Forth as a vehicle to explain general concepts of Macintosh

programming. Since many subscribers don’t seem to be happy with Forth - in fact,

people often have asked about things that had been explained in a Forth column, but

they just hadn’t read. The column about the Notification Manager in V5#6 is a good

example: at the bottom of page 42, one of the ‘ideas to be written’ was explained as: “

An example that places a Notification Manager request in low System memory, and

starts a Timer routine”; this is just the example that was in the Forth Forum that

covered the Notification Manager. Anyway, I’ll try to use other vehicles to convey the

message from now on. Such as assembly, or maybe even C. Mach2 Forth still is a very

good assembly-language development system because of its interactivity. Of course,

you cannot create complicated macros, or structures, and have to resort to Forth code

for those purposes. I’ll still inform you about interesting things I come across on the

Forth scene, but this won’t be an exclusive Forth column anymore. Emphasis will be

on two things: basic system-level things such as drivers, trap patches, INITs,

network, new managers as they come up; and - on the other side of the spectrum -

object-oriented programming in C++.

Apple will ‘Real Soon Now’ release a C++ under MPW, and we’ll hopefully have a

pre-release by the time you read this. C++ is a very interesting language, much more

than a simple extension of C; reading Stroustrup’s book I got this feeling of ‘yes, that’s

how one should have done it in the first place’ that I had 15 years ago when all I knew

was Algol 60, and came across the description of Algol 68. Sadly enough, Algol 68

never really caught on; hopefully, C++ will. The C++ column will start with the next

issue; including program examples if we get the pre-release soon enough, just ‘dry

swimming’ if not.

This month, we’ll talk once more about one of my favorite subjects, number

crunching, speed (or the lack of it), and intelligence in compilers (or the lack of it).

A matrix multiplication routine

Since I am doing more and more everyday computation (mostly Fortran) on the

MacII, I’m obviously interested in a good optimizing compiler. Now, a standard trick

that every decent compiler should have in its repertoire is the elimination of constant

expressions from loops, or assignment of array elements that are not dependent on the

loop index to intermediate variables.

Imagine my surprise when I found out that (in Language Systems Fortran 1.2.1)

I could speed up a loop that looked like this:

do k=1,n3

do i=1,n1

c(i,k) = 0x0

do j=1,n2

c(i,k) = c(i,k)+a(i,j)*b(j,k)

end do

by simply writing:

do k=1,n3
do i=1,n1
sum = 0x0
do j=1,n2
sum = sum+a(i,j)*b(j,k)
end do
c(i,k) = sum
end do
end do

Now, in undergraduate programming classes, years ago, we were actually taught

to look for constant arithmetical or index expressions in loops and put them outside if

possible. Today, almost everybody assumes that the compiler is smart enough to take

care of that; incorrectly, as you see. To see how good the compilers available under

MPW can do, I wrote a Fortran program (listing 1) that calls several versions of this

matrix multiplication loop, written in Fortran (Lang. Sys. 1.2.1), Pascal (Apple 3.0

beta), and C (Apple 3.0 beta). Surprise: none of the compilers was good enough to move

the indexing outside of the loop. The following table gives the results (Mac IIx):

Pascal, hand-optimized: 2.7667 seconds

C, register variables, hand-opt.: 3.4667 seconds

Pascal: 4.0333 seconds

Fortran, const. dimensions, opt=3: 4.5000 seconds

Fortran, hand-optimized, opt=3: 4.6500 seconds

C: 4.7167 seconds

Fortran, opt=3: 6.5167 seconds

Fortran, opt=0: 6.6167 seconds

A difference of more than a factor of 2 between the slowest Fortran and the fastest

Pascal code. Apple Pascal lived up to its good reputation here, but even that could be

improved a lot by eliminating the constant index expression.

Surprised, I ran the Fortran benchmark on a Microvax II, and found that even

there some speed could be gained by ‘hand-optimizing’ the code:

Fortran, plain: 3.6333 seconds

Fortran, hand-opt.: 3.2833 seconds

Fortran, const. dimensions: 3.1000 seconds

However, the difference between the machine-optimized and the hand-optimized

version is not quite as big as for the MPW languages (15% for the VAX vs. 27-30%

for MPW). If you compile the VAX code without optimization, you get a bigger

difference (23%):

Fortran, plain: 6.2500 seconds

Fortran, hand-opt.: 4.8833 seconds

Fortran, const. dimensions: 4.8000 seconds

Therefore, take-home lesson one: don’t take compiler optimizations for granted.

The machine code behind it

Benchmarks have been run on lots of different machines, using lots of different

compilers. I was interested in how the code generated by the MPW compilers actually

differed. A job for Nosy, and the results are shown in the last listing. I’ve only printed

the innermost loops. Don’t be overwhelmed by the pile of assembly code, just note

some important details.

First, for the loop optimization examples discussed here, there seems to be no

tradeoff between code length and speed. On the contrary, the fastest code is also the

shortest. On the other hand, there are some obvious pieces of code which are clearly

redundant. The most blatant example is the Fortran-generated code at the end of the

listing, where an index expression is recalculated that was actually in register A1 all

the time! 14 extra lines of machine code on each pass through the loop will add up to

quite some extra time lost. Another point is that Language System obviously has no

great trust in the quality of the 68000/20/30, otherwise how can one explain that

they repeat the EXT.L D2 instruction each time it occurs? To make sure it works at

least once?

Language Systems Fortran makes other funny assumptions about the machine, for

instance it seems to think there are only two floating point registers in the 68881,

FP0 and FP7. I have looked at some code which had great potential for optimization by

using enough FP registers. Language Systems is, however, known for its

responsiveness towards customers, so I hope we won’t have to wait too long until a

well-optimized Fortran shows up.

Both Pascal and C like juggling floating point registers. Why generate (like

Apple’s C):

FMOVE FP7,FP1

FADD FP0,FP1

FMOVE FP1,FP7

when a simple FADD FP0,FP7 would suffice? Eliminates two floating point

instructions per loop. Pascal does

FADD FP7,FP0

FMOVE FP0,FP7

when a simple inversion of the operands

FADD FP0,FP7

would give the same result. One floating point instruction per loop eliminated. The

timing difference between the Pascal and C routines is partly because of the one extra

floating point instruction.

Last remark: I haven’t seen the Absoft MPW 3.0 Fortran yet. If anyone from

Absoft is reading this, I’d like an evaluation copy to run the same analysis (since you

claim in your ads you have such a great optimizer). If I get enough other languages

collected together, we’ll have a follow-up on this article.

Next month

The MacHack is over (thanks, Aimée, Carol, and all the others, for organizing

such a good meeting), and I’ll tell you some of my impressions in the next column.

Otherwise, we’ll start with an introduction to C++; I hope the compiler will arrive

here in time.

Referenced by (3):