Benchmarks 2

Volume Number: 3

Issue Number: 9

Column Tag: Mac Cad

Benchmarks Re-visited

By Paul Zarchan, Cambridge, Mass

With the emergence of the Mac 2 and the growing base of useful, easy-to-use

scientific software, the field of desktop engineering will surely grow this year. The

purpose of this article is to compare, from an engineering user point of view, the new

Macs (using a Prodigy 4 as the equivalent of a Mac 2) with their counterparts in the

IBM micro world, DEC mini world and IBM mainframe world. First the issue of

compilation and linking will be addressed and then standardized benchmarks will be

used to compare various machines from both a cost and performance point of view.

Most of the non-Mac results were provided to me by A. Tetewsky and D. Feenberg.

These results will soon be published in Ref. 1.

Compiling and Linking

When using a compiled language for programming, such as FORTRAN, the issue of

compile and link times is extremely important. In engineering applications, excessive

compile and link times may make it worthwhile to develop engineering software in an

interpretive language such as BASIC, and then port it to a compiled language after

initial debugging and algorithm development have been completed. If switching

languages is not practical, it may be worthwhile to stay in FORTRAN but develop

the engineering software on a computer with faster compilation times. After program

development the source code can easily be ported to the computer of interest for final

compilation.

Let’s consider an example in finding complex roots of real polynomials. The

144 lines of program source code for this example can be found in Ref. 2. This

example, like the Butterworth example in Ref. 3, uses single precision

arithmetic but unlike the Butterworth example has virtually no input/output code. In

this root finding example, a solution is found for a 30th order, well-behaved

polynomial. The compile and link times for the 144 lines of code, using MS FORTRAN

(in both the Apple and non-Apple worlds), are indicated in Table 1 for a variety of

micros.
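
For readers who want to try a comparable workload, here is a minimal sketch of complex root finding for a real polynomial. It is not the 144-line FORTRAN routine of Ref. 2: it uses the Durand-Kerner iteration, chosen here only because it is short, and it runs in double precision rather than single. The names poly_eval and find_roots, the starting guesses and the tolerance are all assumptions made for illustration.

```python
# Sketch only: Durand-Kerner iteration, NOT the routine from Ref. 2.

def poly_eval(coeffs, z):
    """Evaluate a polynomial at z by Horner's rule; coeffs[0] is the leading coefficient."""
    result = 0j
    for c in coeffs:
        result = result * z + c
    return result


def find_roots(coeffs, max_iter=500, tol=1e-12):
    """Return approximations to all complex roots of a real polynomial."""
    lead = coeffs[0]
    monic = [c / lead for c in coeffs]  # the iteration assumes a leading coefficient of 1
    degree = len(monic) - 1
    # Distinct starting guesses: powers of an arbitrary complex number (an assumption).
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]
    for _ in range(max_iter):
        largest_step = 0.0
        for i in range(degree):
            # Product of differences between this estimate and all the others.
            denom = 1 + 0j
            for j in range(degree):
                if j != i:
                    denom *= roots[i] - roots[j]
            step = poly_eval(monic, roots[i]) / denom
            roots[i] -= step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return roots


# x^3 - 1 has one real root and a complex-conjugate pair.
cubic = [1.0, 0.0, 0.0, -1.0]
cubic_roots = find_roots(cubic)
for r in cubic_roots:
    # Print each root with its residual |p(r)| as a sanity check.
    print(r, abs(poly_eval(cubic, r)))
```

Like the example in the text, the sketch does almost no input/output, so its run time is dominated by floating point arithmetic.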

In this example, compilation and linking were done using a hard disk for the IBM

AT and Compaq 386, while in the Macintosh world, compilation and linking were done

in RAM. In the IBM world, compiling in RAM is not significantly faster than compiling

from the hard disk. This will always be the case since the operating system software,

DOS, is written for 64k segmented 8086/8088 processors. Although an operating

system which is developed for the 80386 or OS/2 should be better and improve

compilation times, it will not be available for at least one year. If history is any

guide, the wait time may be significantly longer. In addition, due to memory

segmentation and the lack of a FORTRAN editor (a word processor must be used), it

may be difficult to fit all necessary engineering tools into RAM. In the Macintosh

world, memory is linear and easily expandable with third party upgrades. For

example a 512K Mac can be upgraded to 2 Megs for about $500. This permits the

creation of a 1.5 Meg recoverable RAM disk which is large enough to fit FORTRAN and

many other useful tools into RAM. Therefore, compiling in RAM with a Mac is much

faster than compiling from a hard disk.

In addition, in the IBM world one must compile and link before the code can be

executed. The user must nurse the computer through the compiling, linking and

execution process. In the Macintosh world, linking is dynamic and therefore automatic

from a user point of view. The user simply double clicks on “compile and execute” and

the source code compiles, links and runs.

The execution time for this complex root finding example for a variety of micros

appears in Table 2. In this example all the micros with the exception of the Mac Plus

had math coprocessors.

Table 2 shows that, for this example, the Prodigy 4 is about 10 times faster

than a Mac Plus, more than 5 times faster than an IBM AT and 2.5 times faster than a

Compaq 386. In the IBM world, with the exception of the PC, the math coprocessor

never seems to run at the same clock rate as the CPU. That is why for this example, an

AT and PC (where the math coprocessor is matched to the CPU at 4.77 MHz) have

similar execution times. The Compaq 386 is only twice as fast as the AT even though

the Compaq has 32 bits rather than 16 bits and runs at 16 MHz rather than 6 MHz. In

principle, when the IBM operating system software is written and a 16 MHz Intel

80387 math coprocessor becomes available, the Compaq 386 should be in the same speed class as the

Prodigy 4. Interestingly enough, the Compaq 386 is rated at 3.5 MIPs while the

Prodigy 4 is only rated at 2.0 MIPs. We can see that in numerical applications, MIP

ratings may not tell the whole story (see Ref. 4 for example).

Often the user may only be interested in the turnaround time, which is the sum

of the compile, link and execution times. For this example we can see by comparing

Tables 1 and 2 that the turnaround times are significantly better in the Macintosh

world. Table 3 summarizes the results for the complex root example.
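
The turnaround calculation is simple enough to sketch in a few lines. The timings below are hypothetical placeholders, not the measured values of Tables 1 to 3, and the names turnaround_seconds, machine_a and machine_b are invented for illustration.

```python
def turnaround_seconds(compile_s, link_s, execute_s):
    """Turnaround time is the sum of the compile, link and execution times."""
    return compile_s + link_s + execute_s


# HYPOTHETICAL timings in seconds (compile, link, execute); not data from the article.
timings = {
    "machine_a": (40.0, 25.0, 12.0),  # faster execution, slower compile and link
    "machine_b": (15.0, 0.0, 30.0),   # slower execution, no separate link step
}

turnaround = {name: turnaround_seconds(*t) for name, t in timings.items()}
fastest = min(turnaround, key=turnaround.get)

for name, total in turnaround.items():
    print(f"{name}: {total:.1f} s")
print("best turnaround:", fastest)
```

With these made-up numbers the machine with the slower execution time still has the better turnaround, which is why compile and link times matter during program development.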

The sample problem only had 144 lines of FORTRAN code. If we consider a

“traveling salesman” program using 1500 lines of FORTRAN code, the comparison of

compile and link times is even more dramatic. Table 4 shows that the Macintosh

and Prodigy 4 are considerably faster for larger programs than either the IBM AT or

Compaq 386.

Whetstone Benchmarking

The Whetstone benchmark, devised in England by Curnow and Wichmann and published in the

Feb. 1976 issue of the Computer Journal, is an attempt to cover a typical mix of all

floating point operations. This benchmark contains linear arrays, and add, subtract,

multiply, divide and transcendental operations. Whetstones were originally written in

ALGOL, but later translated to FORTRAN in 1979 by D. Frank. Since that time, many

computer manufacturers have rated their machines in terms of thousands of

Whetstones per second or kw/sec. Higher Whetstone ratings mean more powerful

machines. Table 5 presents single and double precision Whetstone ratings for a variety of

micro, mini and mainframe computers. In addition, ratios referenced to Prodigy 4

speed are indicated in the Table. A ratio of 1.7 means that the computer is 1.7 times

faster than the Prodigy 4. All computers, with the exception of the Mac Plus, have

math coprocessors or floating point accelerators. The poor double precision Whetstone

rating of the Mac Plus, relative to the IBM PC, may be one of the reasons there has

been a scarcity of scientific software for the Mac. Of course, we can see from this

Table that the Prodigy 4, and hence the new Mac 2, change all that.
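
The ratio column is just each machine's rating divided by the rating of the reference machine. A minimal sketch follows, using hypothetical kw/sec ratings rather than the Table 5 values; the function name speed_ratios and the machine names are assumptions.

```python
def speed_ratios(ratings_kw_per_sec, reference):
    """Divide every Whetstone rating by the reference machine's rating."""
    base = ratings_kw_per_sec[reference]
    return {name: rating / base for name, rating in ratings_kw_per_sec.items()}


# HYPOTHETICAL ratings in thousands of Whetstones per second; not Table 5 data.
ratings = {
    "reference_machine": 850.0,
    "machine_x": 1445.0,
    "machine_y": 85.0,
}

ratios = speed_ratios(ratings, "reference_machine")
for name, ratio in ratios.items():
    # A ratio above 1 means faster than the reference, below 1 means slower.
    print(f"{name}: {ratio:.2f}")
```

The reference machine always comes out at 1.0, so the ratios can be read directly as "times faster" or "times slower".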

The Whetstone results of Table 5 (with no I/O) can be compared to the

Butterworth simulation results (with considerable I/O and more representative of a

realistic engineering application) of Ref. 3. Figure 1 shows that all the benchmarks,

whether they be Whetstones or Butterworth simulations, yield about the same relative

machine performance. Only the Mac Plus seems to yield results which are

significantly benchmark dependent. It yields worse performance on the Whetstones

because of its lack of a math coprocessor.

Figure 1 - Relative Machine Performance is Approximately Independent of Benchmark

The performance comparison of Fig. 1 can be placed into proper perspective

when the cost of the host computer is considered. For simplicity, computer cost can be

considered to be the machine's purchase price only. This neglects the cost of the small

army of technicians required to operate the larger machines and the cost of software

leasing agreements. We can see from Fig. 2 that generally higher cost computers yield

faster performance. However, the cost is not always commensurate with the

performance. For example, a VAX 11/780 is only 1.5 times as fast as a Prodigy 4 and

yet is 40 times more expensive. An IBM 3084Q is 11.7 times faster than a Prodigy 4

and is 500 times more expensive. On the micro side an IBM RT is 2.5 times slower

than a Prodigy 4 and yet costs twice as much.

Figure 2 - Micros are More Cost Effective Than Larger Machines

If we normalize the computer performance as measured by double precision

Whetstones per second to the computer purchase price, we can generate “bang for the

buck” information. More “bang for the buck” means that the computer yields a higher

double precision Whetstone rating for less cost. Figure 3 presents this cost

effectiveness information and shows that the Compaq 386, Prodigy 4 and Micro Vax 2

are very cost effective, with the Prodigy 4 yielding the most “bang for the buck”. The

curve also indicates that if a micro can do the job, it is more cost effective from a

performance point of view than a mainframe.
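
The normalization can be checked by hand from the speed and cost ratios quoted earlier in the discussion of Fig. 2. The sketch below assumes those ratios refer to the same performance measure, and reads "40 times more expensive" as a cost ratio of 40 and "costs twice as much" as a cost ratio of 2. The function name relative_cost_effectiveness is an assumption, and the Prodigy 4 is 1.0 by definition.

```python
def relative_cost_effectiveness(speed_ratio, cost_ratio):
    """Performance per dollar relative to the reference machine (reference = 1.0)."""
    return speed_ratio / cost_ratio


# (speed relative to Prodigy 4, cost relative to Prodigy 4), as quoted in the text.
machines = {
    "VAX 11/780": (1.5, 40.0),     # 1.5 times as fast, 40 times the cost
    "IBM 3084Q": (11.7, 500.0),    # 11.7 times faster, 500 times the cost
    "IBM RT": (1.0 / 2.5, 2.0),    # 2.5 times slower, twice the cost
}

bang_for_buck = {
    name: relative_cost_effectiveness(speed, cost)
    for name, (speed, cost) in machines.items()
}

for name, value in bang_for_buck.items():
    print(f"{name}: {value:.4f} of the Prodigy 4's performance per dollar")
```

Under that reading, the VAX 11/780 delivers about 4 percent of the Prodigy 4's performance per dollar, the IBM 3084Q about 2 percent and the IBM RT 20 percent.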

Figure 3 - Prodigy 4 Outperforms Every Other Computer

Summary

The intent of this article was to show that FORTRAN runs very efficiently on the

Prodigy 4 (and hence Mac 2) when compared to non-Apple micros. When compilation

and linking times are taken into account, the comparison is even more dramatic. A

relative performance curve is presented quantifying “bang for the buck” information

for a variety of micros, minis and mainframes. As expected, the new Mac 2 appears to

outperform every other computer on this measure.

Acknowledgements

I wish to thank Micro/Systems, Av Tetewsky and Dan Feenberg for permitting me

to extract from Ref. 1 the benchmark timings on all the non-Apple machines and for

providing the technical explanation for the “features” of the various DOS machines.

In addition, I would like to thank Owen Deutsch for providing me with the “traveling

salesman” FORTRAN code.

References

1) Tetewsky, A. and Feenberg, D. “A Survey of 6 FORTRAN Compilers” to appear in

Sept. 1987 edition of Micro/Systems Journal.

2) Press, W. H. et al., “Numerical Recipes: The Art of Scientific Computing”,

Cambridge University Press, 1986.

3) Zarchan, P. “New Mac Workstation Potential”, MacTutor, Vol. 3, No. 3, March

1987, pp 15-21.

4) Boston Computer Society IBM PC Report, “PC Technical Report: MIPs, MFlops,

Benchmarks and Other Half-Truths”, May-June 1987.

5) Marshall, T., Jones, C., and Kluger, S. “Definicon 68020 Coprocessor”, BYTE,

July 1986, pp 120-144.