Benchmarks 2
Volume Number: 3
Issue Number: 9
Column Tag: Mac Cad
Benchmarks Re-visited
By Paul Zarchan, Cambridge, Mass
With the emergence of the Mac 2 and the growing base of useful, easy-to-use
scientific software, the field of desktop engineering will surely grow this year. The
purpose of this article is to compare, from an engineering user’s point of view, the new
Macs (using a Prodigy 4 as the equivalent of a Mac 2) with their counterparts in the
IBM micro world, the DEC mini world and the IBM mainframe world. First the issue of
compilation and linking will be addressed, and then standardized benchmarks will be
used to compare various machines from both a cost and a performance point of view.
Most of the non-Mac results were provided to me by A. Tetewsky and D. Feenberg.
These results will soon be published in Ref. 1.
Compiling and Linking
When using a compiled language for programming, such as FORTRAN, the issue of
compile and link times is extremely important. In engineering applications, excessive
compile and link times may make it worthwhile to develop engineering software in an
interpretive language such as BASIC, and then port it to a compiled language after
initial debugging and algorithm development have been completed. If switching
languages is not practical, it may be worthwhile to stay in FORTRAN but develop
the engineering software on a computer with faster compilation times. After program
development the source code can easily be ported to the computer of interest for final
compilation.
Let’s consider an example of finding complex roots of real polynomials. The
144 lines of program source code for this example can be found in Ref. 2. This
example, like the Butterworth example in Ref. 3, uses single precision
arithmetic but, unlike the Butterworth example, has virtually no input/output code. In
this root-finding example, a solution is found for a 30th order, well-behaved
polynomial. The compile and link times for the 144 lines of code, using MS FORTRAN
(both in the Apple and non Apple world), are indicated in Table 1 for a variety of
micros.
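Ref. 2 is the source of the 144-line root finder. As we understand it, its polynomial root routine is built on Laguerre’s method with deflation and optional polishing. The sketch below illustrates that technique only: it is not the benchmark code, it is written in Python rather than FORTRAN, the names `laguerre` and `poly_roots` are our own, and it omits the limit-cycle safeguards a production routine would include.

```python
import cmath


def laguerre(coeffs, x, max_iter=80, eps=1e-14):
    """Refine one root of a polynomial by Laguerre's method.

    coeffs: coefficients, highest degree first (leading coefficient nonzero).
    x: complex starting guess.
    """
    n = len(coeffs) - 1
    for it in range(1, max_iter + 1):
        # Horner's scheme: p(x), p'(x) and p''(x)/2 in one pass
        p, dp, half_d2p = complex(coeffs[0]), 0j, 0j
        for c in coeffs[1:]:
            half_d2p = half_d2p * x + dp
            dp = dp * x + p
            p = p * x + c
        if p == 0:
            return x  # landed exactly on a root
        g = dp / p
        h = g * g - 2 * half_d2p / p  # G^2 - p''/p
        sq = cmath.sqrt((n - 1) * (n * h - g * g))
        # Choose the sign that gives the larger denominator (the smaller step)
        denom = g + sq if abs(g + sq) >= abs(g - sq) else g - sq
        if denom != 0:
            step = n / denom
        else:
            # Degenerate case: take an arbitrary step to escape the stationary point
            step = (1 + abs(x)) * cmath.exp(1j * it)
        x -= step
        if abs(step) <= eps * (1 + abs(x)):
            return x
    return x  # best estimate after max_iter iterations


def poly_roots(coeffs, polish=True):
    """All complex roots of a polynomial by Laguerre's method plus deflation."""
    work = [complex(c) for c in coeffs]
    roots = []
    while len(work) > 1:
        r = laguerre(work, 0j)
        roots.append(r)
        # Deflate: synthetic division of work by (x - r); the remainder is dropped
        quotient = [work[0]]
        for c in work[1:-1]:
            quotient.append(c + quotient[-1] * r)
        work = quotient
    if polish:
        # Polish each root against the original, undeflated polynomial
        roots = [laguerre(coeffs, r) for r in roots]
    return roots
```

For example, `poly_roots([1, -6, 11, -6])` should return values close to 1, 2 and 3, the roots of (x-1)(x-2)(x-3). Polishing against the undeflated polynomial limits the error that accumulates when roots are removed one at a time, which matters for a high-order case like the 30th order polynomial above.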
In this example, compilation and linking were done using a hard disk for the IBM
AT and Compaq 386, while in the Macintosh world, compilation and linking were done
in RAM. In the IBM world, compiling in RAM is not significantly faster than compiling
from the hard disk. This will remain the case as long as the operating system software,
DOS, is written for the 8086/8088 processors and their 64K memory segments. Although an
operating system developed for the 80386, or OS/2, should be better and improve
compilation times, it will not be available for at least one year. If history is any
guide, the wait may be significantly longer. In addition, due to memory
segmentation and the lack of a FORTRAN editor (a word processor must be used), it
may be difficult to fit all necessary engineering tools into RAM. In the Macintosh
world, memory is linear and easily expandable with third-party upgrades. For
example, a 512K Mac can be upgraded to 2 Megs for about $500. This permits the
creation of a 1.5 Meg recoverable RAM disk, which is large enough to hold FORTRAN and
many other useful tools. Therefore, compiling in RAM on a Mac is much
faster than compiling from a hard disk.
In addition, in the IBM world one must compile and link before the code can be
executed. The user must nurse the computer through the compiling, linking and
execution process. In the Macintosh world, linking is dynamic and therefore automatic
from the user’s point of view. The user simply double-clicks on “compile and execute” and
the source code compiles, links and runs.
The execution time for this complex root finding example for a variety of micros
appears in Table 2. In this example all the micros with the exception of the Mac Plus
had math coprocessors.
The table shows that, for this example, the Prodigy 4 is about 10 times faster
than a Mac Plus, more than 5 times faster than an IBM AT and 2.5 times faster than a
Compaq 386. In the IBM world, with the exception of the PC, the math coprocessor
never seems to run at the same clock rate as the CPU. That is why, for this example, the
AT and the PC (where the math coprocessor is matched to the CPU at 4.77 MHz) have
similar execution times. The Compaq 386 is only twice as fast as the AT even though
the Compaq has 32 bits rather than 16 bits and runs at 16 MHz rather than 6 MHz. In
principle, when the IBM operating system software is written for the 80386 and a 16 MHz
Intel 80387 math coprocessor becomes available, it should be in the same speed class as the
Prodigy 4. Interestingly enough, the Compaq 386 is rated at 3.5 MIPS while the
Prodigy 4 is rated at only 2.0 MIPS. We can see that in numerical applications, MIPS
ratings may not tell the whole story (see Ref. 4, for example).
Often the user is interested only in the turnaround time, which is the sum
of the compile, link and execution times. For this example we can see by comparing
Tables 1 and 2 that the turnaround times are significantly better in the Macintosh
world. Table 3 summarizes the results for the complex root example.
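Turnaround is simple arithmetic, but writing it down shows why execution speed alone can mislead: the compile and link overhead is paid on every edit-and-run cycle. The sketch below assumes per-machine timings in seconds taken from tables like Tables 1 and 2 (those values are not reproduced here); the names `RunTimes` and `turnaround_speedup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunTimes:
    """Hypothetical record of timings, in seconds, for one program on one machine."""
    compile_s: float
    link_s: float  # may be 0.0 where linking is automatic and folded into compile time
    run_s: float

    @property
    def turnaround_s(self) -> float:
        # Turnaround = compile + link + execution time
        return self.compile_s + self.link_s + self.run_s


def turnaround_speedup(baseline: RunTimes, candidate: RunTimes) -> float:
    """How many times faster the candidate turns the program around than the baseline."""
    return baseline.turnaround_s / candidate.turnaround_s
```

Build one `RunTimes` per machine and pass the slower machine as `baseline`; a result of 3.0 would mean the candidate completes the whole compile, link and run cycle three times as fast.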
The sample problem had only 144 lines of FORTRAN code. If we consider a
“traveling salesman” program using 1500 lines of FORTRAN code, the comparison of
compile and link times is even more dramatic. Table 4 shows that the Macintosh
and Prodigy 4 are considerably faster for larger programs than either the IBM AT or
Compaq 386.
Whetstone Benchmarking
The Whetstone benchmark, devised in England by Curnow and Wichmann and published in
the Feb. 1976 issue of the Computer Journal, is an attempt to cover a typical mix of
floating point operations. This benchmark contains linear arrays, and add, subtract,
multiply, divide and transcendental operations. Whetstones were originally written in
ALGOL, but were later translated to FORTRAN in 1979 by D. Frank. Since that time, many
computer manufacturers have rated their machines in thousands of
Whetstones per second (kw/sec). Higher Whetstone ratings mean more powerful
machines. Table 5 presents single and double precision Whetstone ratings for a variety of
micro, mini and mainframe computers. In addition, ratios referenced to Prodigy 4
speed are indicated in the table. A ratio of 1.7 means that the computer is 1.7 times
faster than the Prodigy 4. All computers, with the exception of the Mac Plus, have
math coprocessors or floating point accelerators. The poor double precision Whetstone
rating of the Mac Plus relative to the IBM PC may be one of the reasons there has
been a scarcity of scientific software for the Mac. Of course, we can see from this
table that the Prodigy 4, and hence the new Mac 2, changes all that.
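The Whetstone idea, a fixed synthetic mix of floating point work that is timed and reported as a rate, can be sketched in a few lines. The kernel below is a hypothetical toy, not the Curnow and Wichmann benchmark: its array updates, constants and loop structure are our own, and its rating is simply thousands of loop iterations per second rather than true kw/sec. `relative_rating` mirrors how Table 5 expresses each machine as a ratio to the Prodigy 4.

```python
import math
import time


def float_mix(n: int) -> float:
    """Hypothetical toy mix: array add/subtract/multiply, a divide, transcendentals."""
    e = [1.0, -1.0, -1.0, -1.0]  # small linear array
    x = 0.5
    for _ in range(n):
        for i in range(4):
            j = (i + 1) % 4
            # (a + b) * 0.5 and (a - b) * 0.5 never exceed the array's largest magnitude
            if i % 2 == 0:
                e[i] = (e[i] + e[j]) * 0.5
            else:
                e[i] = (e[i] - e[j]) * 0.5
        # Divide plus transcendentals; the denominator 2 + cos(x) is always >= 1
        x = math.atan(math.sin(x) * math.exp(-abs(x))) / (2.0 + math.cos(x))
    return sum(e) + x  # checksum the caller can inspect


def k_iterations_per_second(n: int) -> float:
    """Time float_mix(n) and report thousands of toy iterations per second."""
    start = time.perf_counter()
    float_mix(n)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return n / elapsed / 1000.0


def relative_rating(rating: float, reference_rating: float) -> float:
    """Ratio to a reference machine: 1.7 means 1.7 times faster than the reference."""
    return rating / reference_rating
```

In Python the interpreter’s overhead dominates, so this rate says more about the interpreter than about the floating point hardware; a comparison like Table 5 requires the same compiled FORTRAN benchmark run on each machine.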
The Whetstone results of Table 5 (with no I/O) can be compared to the
Butterworth simulation results (with considerable I/O and more representative of a
realistic engineering application) of Ref. 3. Figure 1 shows that all the benchmarks,
whether they be Whetstones or Butterworth simulations, yield about the same relative
machine performance. Only the Mac Plus seems to yield results which are
significantly benchmark dependent. It yields worse performance on the Whetstones
because of its lack of a math coprocessor.
Figure 1 - Relative Machine Performance is Approximately Independent of Benchmark
The performance comparison of Fig. 1 can be placed into proper perspective
when the cost of the host computer is considered. For simplicity, computer cost is
taken to be the machine’s purchase price only. This neglects the cost of the small
army of technicians required to operate the larger machines and the cost of software
leasing agreements. We can see from Fig. 2 that higher cost computers generally yield
faster performance. However, the cost is not always commensurate with the
performance. For example, a VAX 11/780 is only 1.5 times as fast as a Prodigy 4 and
yet is 40 times more expensive. An IBM 3084Q is 11.7 times faster than a Prodigy 4
and is 500 times more expensive. On the micro side, an IBM RT is 2.5 times slower
than a Prodigy 4 and yet costs twice as much.
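Dividing each speed ratio by the corresponding cost ratio gives a rough relative cost effectiveness E (speed per dollar, with the Prodigy 4 at E = 1). The symbol E is our own shorthand, and the worked numbers below use only the ratios quoted in the paragraph above.

```latex
\begin{aligned}
E &= \frac{S}{C}, \qquad S = \text{speed relative to Prodigy 4}, \quad C = \text{price relative to Prodigy 4} \\
E_{\text{VAX 11/780}} &= \frac{1.5}{40} = 0.0375 \\
E_{\text{IBM 3084Q}} &= \frac{11.7}{500} \approx 0.023 \\
E_{\text{IBM RT}} &= \frac{1/2.5}{2} = 0.2
\end{aligned}
```

In other words, per dollar of purchase price, the Prodigy 4 delivers roughly 27, 43 and 5 times the speed of these three machines, respectively.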
Figure 2 - Micros are More Cost Effective Than Larger Machines
If we normalize the computer performance, as measured by double precision
Whetstones per second, to the computer purchase price, we can generate “bang for the
buck” information. More “bang for the buck” means that the computer yields a higher
double precision Whetstone rating for less cost. Figure 3 presents this cost
effectiveness information and shows that the Compaq 386, Prodigy 4 and Micro Vax 2
are very cost effective, with the Prodigy 4 yielding the most “bang for the buck”. The
curve also indicates that if a micro can do the job, it is more cost effective from a
performance point of view than a mainframe.
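The normalization behind Fig. 3 is a single division followed by a sort. The sketch below assumes each machine is described by its double precision rating in kw/sec and its purchase price in dollars; the `Machine` type and the function names are hypothetical, the per-$1000 scaling is our choice for readable numbers, and no values from Fig. 3 are reproduced.

```python
from typing import Iterable, List, NamedTuple


class Machine(NamedTuple):
    name: str
    dp_kwhet_per_s: float  # double precision thousands of Whetstones per second
    price_usd: float       # purchase price only, as in the text


def bang_for_buck(m: Machine) -> float:
    """Double precision kw/sec per $1000 of purchase price."""
    return m.dp_kwhet_per_s / (m.price_usd / 1000.0)


def rank_by_cost_effectiveness(machines: Iterable[Machine]) -> List[Machine]:
    """Most 'bang for the buck' first."""
    return sorted(machines, key=bang_for_buck, reverse=True)
```

Because only purchase price enters the denominator, this measure shares the text’s caveat: operating staff and software leasing costs, which weigh most heavily on the larger machines, are left out.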
Figure 3 - Prodigy 4 Outperforms Every Other Computer
Summary
The intent of this article was to show that FORTRAN runs very efficiently on the
Prodigy 4 (and hence the Mac 2) when compared to non-Apple micros. When compilation
and linking times are taken into account, the comparison is even more dramatic. A
relative performance curve is presented quantifying “bang for the buck” information
for a variety of micros, minis and mainframes. As expected, the new Mac 2 appears to
outperform every other computer.
Acknowledgements
I wish to thank Micro/Systems, Av Tetewsky and Dan Feenberg for permitting me
to extract from Ref. 1 the benchmark timings on all the non-Apple machines and for
providing the technical explanation for the “features” of the various DOS machines.
In addition, I would like to thank Owen Deutsch for providing me with the “traveling
salesman” FORTRAN code.
References
1) Tetewsky, A. and Feenberg, D., “A Survey of 6 FORTRAN Compilers”, to appear in
the Sept. 1987 edition of Micro/Systems Journal.
2) Press, W. H. et al., “Numerical Recipes: The Art of Scientific Computing”,
Cambridge University Press, 1986.
3) Zarchan, P., “New Mac Workstation Potential”, MacTutor, Vol. 3, No. 3, March
1987, pp. 15-21.
4) Boston Computer Society IBM PC Report, “PC Technical Report: MIPs, MFlops,
Benchmarks and Other Half-Truths”, May-June 1987.
5) Marshall, T., Jones, C., and Kluger, S., “Definicon 68020 Coprocessor”, BYTE,
July 1986, pp. 120-144.