August 92 - THE NETWORK PROJECT: DISTRIBUTED COMPUTING ON THE MACINTOSH
GÜNTHER SAWITZKI
Distributed computing is the wave of the future, soon to come rolling onto the shores
of programming. Programmers should be prepared for the possibilities and challenges
that distributed computing will offer. The NetWork model proposes a design strategy
and provides a testbed implementation that enables you to explore and experiment with
distributed computing on the Macintosh. While this article may not help you write a
better application today, it will help familiarize you with the idea of distributed
computing so that when system support for it comes along, you'll be ready to take
advantage of it.
As computing evolves, we're rapidly moving from a reliance on discrete personal
computers and workstations to a new type of computing infrastructure--a computing
environment. In a computing environment, applications will make massive use of
many partially coordinated or uncoordinated autonomous computing devices. That is,
one device won't necessarily know which application subtask any other device is
working on or when and how any other device is completing its particular subtask.
These autonomous devices will be connected by multiple threads of communication.
What's more, the computing environment of tomorrow will be continually changing,
with portable devices moving in and out and with new capabilities added dynamically.
Devices will change in time and will have varying availability. In short, distributed
computing in an environment with no guaranteed stability will become the order of the
day.
Visions like Apple's Personal Digital Assistant and the TRON Project give some idea of
what we'll see. The Personal Digital Assistant will be a small intelligent device that
will help you with some aspect of living and working; for example, it might be a smart
map leading you around in a town you're visiting, or a dietary assistant helping you
plan a week's meals, or a TV viewer helping you trace back a thread of interesting
news you've just become aware of. TRON will work the other way, making your
environment smart on its own; for example, the washing machine itself will place
orders for more detergent and will tell the warm water supply to diminish for a
moment because there will be hot wastewater that will feed a heat exchanger. Both
these visions will soon become reality in a distributed computing environment. What
distributed computing will mean for users is that they'll have access to the
considerable computing power that's typically left unused in today's computing setup.
Implementing a system for distributed computing is easy if you reduce or restrict the
availability of personal workstations to their users. The challenge addressed by the
NetWork Project is to make access to idle workstations possible while still
guaranteeing users immediate access to their personal workstations. NetWork is a
minimal communication and management model designed to operate in this
environment. By handling communication and managing computing resources, it frees
the programmer to think about how to split up a task so that it can be done by multiple
workstations working on small pieces in an uncoordinated and asynchronous way.
NetWork is available on the current Developer CD Series disc and via the Internet for those
who want to try it out. This article describes the NetWork Project itself, considers the
types of applications that are most amenable to a distributed computing approach,
thoroughly examines the NetWork model, and then suggests how to implement a
NetWork program on the Macintosh. Because I'm a statistician I've included some
discussion of statistical underpinnings. I've presented this discussion separately,
though, so that if you don't find mathematics fascinating, you can skip it.
HISTORY OF THE NETWORK PROJECT
NetWork is a project of StatLab, the statistical laboratory at the University of
Heidelberg. StatLab was founded in 1984 to complement the existing mathematical
statistics research group by studying practical applications of advanced statistical
methods. We took a look at what was available as the hardware base for our work and
chose the Macintosh, but since no Macintosh was on the German market at that time,
we bought a Lisa. We've been developing our statistical software on Lisa and Macintosh
ever since. This eventually brought us into contact with Larry Taylor, representing
Apple's Advanced Technology Group in Europe.
During a November 1988 meeting, we discussed future perspectives in computing
with Larry. We tried to identify current gaps and obvious next steps. One thing we
could point to was the discrepancy between the amount of computing power we had
installed and the return it gave us. At that time, we were running an installation of
Macintosh Plus and Macintosh II computers, and the usual turnaround time for a
statistical simulation was one night. This was better than the turnaround time for the
same job on the IBM mainframe time-sharing system (about a week), but still it was
frustrating to have to wait so long while other computer resources lay idle. Just the
same, given the Macintosh's character as an absolutely devoted servant of one master,
how in the world could we find a way to share its computing power while still
guaranteeing reliable and efficient service for the Macintosh owner?
In December 1988 we had a visit from Bill Eddy, then head of the statistics
department at Carnegie Mellon University. In a lecture he mentioned that the CMU
people were annoyed at the discrepancy between installed computing power and the
return it gave them and were doing research on executing iterations asynchronously
(in an uncontrolled way) to make use of aggregated computing power. Until then, I'd
been thinking of the solution only in terms of distributed computing in a controlled
environment. Bill emphasized that in the computing environment of the future,
computing time per se won't be expensive. In fact, in a network consisting of thousands
of CPUs, computing power will be free--if you can access it. This started me thinking
about how we could possibly make a distributed system work under these
circumstances--that is, in a large heterogeneous environment.
When we next met with Larry Taylor in February 1989, I claimed that we could build
a system for distributed computing based on the Macintosh philosophy of the absolute
priority of the user and at the same time able to cope with a large environment. Larry
agreed to support the project, and we formed a team consisting initially of Larry, me,
Reimer Kühn and Leo van Hemmen of the Heidelberg Neural Network Research Group,
and Joachim Lindenberg, then a computer science student at Karlsruhe University.
The project started in May 1989. We called it the NetWork Project, a reference to the
fact that in the future the only measure of performance that will matter will be the net
work done per unit of time, not cumulative computing time or other measures of
resource utilization. We gave ourselves six months to decide on the specifications and
build a working prototype of a distributed system that would fit a Macintosh
environment and be scalable up to some thousands of CPUs. Although Macintosh was the
original development target, we did make sure that the system would run in any other
decent environment (DEC™, UNIX®, what have you). We finished our first release
one week late in November 1989. As they say, the rest is history.
Worth mentioning is the fact that with NetWork's accelerated development schedule,
we didn't spend a lot of time on planning and administration. That's the nature of
progress sometimes. Fortunately, Apple's Advanced Technology External Research
Group had resources available to allocate to the project on the spot. Without this kind
of flexible support, the NetWork Project could not have succeeded.
CANDIDATES FOR DISTRIBUTED COMPUTING
Distributed computing will be a great boon to applications where computing power is
critical and where the computing task can be split into discrete subtasks. Such
applications include the following:
• compiling a new product using a superoptimizing compiler
• solving an optimization problem like placing chips on a board
• generating computer graphics, especially ray tracing
• performing optical character recognition
In these cases, processing may take too long on one particular machine, but if the
application can tap into the computing power available by sending out subtasks, the
processing can be completed in a much more timely manner.
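To make the idea of splitting concrete, here's a minimal sketch (in Python, for brevity; the names Subtask-style tuples and split_scanlines are illustrative, not part of NetWork's actual API) of how a ray-tracing job might be carved into independent subtasks that idle machines could render in any order:

```python
def split_scanlines(height, chunk):
    """Partition image rows 0..height-1 into independent (first, last) subtasks.

    Each half-open row range can be rendered on any machine, in any order;
    the final image is assembled from whichever results come back.
    """
    return [(start, min(start + chunk, height))
            for start in range(0, height, chunk)]

# A 480-row image split into chunks of 64 rows yields 8 subtasks,
# the last one covering the remaining 32 rows.
tasks = split_scanlines(480, 64)
```

Because no subtask depends on another's result, losing a worker costs only that worker's rows, which can simply be reissued.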
Many applications that involve working on large data sets can benefit from additional
computing power, even in an environment where completion of a subtask is not
guaranteed. Such tasks include sorting with some appropriate merge/sort algorithm:
the global sort can benefit if a subset has already been sorted by another machine but
need not be affected if the result of the presorting is not available. The same applies to
searching and practically all major accounting tasks. Any statistical analysis based on
exponential families, like normal (Gaussian) distributions, can also benefit from
distributed computing: in these analyses you can calculate global sufficient statistics
from those of partial data sets, if available. Problems of this type are completely
splittable into subtasks and clearly are fine candidates for distributed computing.
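The sufficient-statistics idea can be sketched as follows (hypothetical helper names; NetWork itself defined no such API). For a normal model, each worker reports (n, Σx, Σx²) for its slice of the data; partials that never arrive are simply skipped, and the combined estimate is still valid for the data that did arrive:

```python
def partial_stats(xs):
    """Sufficient statistics for a normal model: (count, sum, sum of squares)."""
    return (len(xs), sum(xs), sum(x * x for x in xs))

def combine(partials):
    """Merge whichever partial statistics are available (None = never arrived)."""
    avail = [p for p in partials if p is not None]
    n = sum(p[0] for p in avail)
    s = sum(p[1] for p in avail)
    ss = sum(p[2] for p in avail)
    mean = s / n
    var = ss / n - mean * mean   # population variance of the available data
    return n, mean, var
```

For example, combining partials for [1, 2, 3] and [4, 5] while a third partial is missing yields exactly the mean and variance of the five values that arrived, which is why such analyses tolerate a nonguaranteed environment so gracefully.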
But what about problems that have a stronger internal structure than those that are
completely splittable? What about iterative and recursive problems, or problems that
lead to pipeline processing or networks of data flow? We can't automatically assume
that these can take advantage of additional computing power in a distributed
environment where the completion of a subtask isn't guaranteed. Still, mathematical
theory can help us identify problems of this type that are good candidates for
distributed computing.
A SPECIAL CLASS: ASYNCHRONOUS ITERATIONS
As an example of problems with a stronger internal structure than those that are
completely splittable, we'll focus on
iterative algorithms. The trouble with running an iterative algorithm in a
nonguaranteed distributed environment is this: the outcome of iterations in one part of
the problem might critically depend on results from iterations in other parts, and the
result of a previous iteration may or may not be available for the next round. Even if
the original iteration converges to a correct result, we don't know whether the same
will hold true if the iterations are done asynchronously.
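A small sketch (mine, not from the NetWork sources) shows the favorable case: for a contraction mapping x → Ax + b with small A, the iteration still reaches the fixed point even when some component updates are arbitrarily skipped each round, as if their results never arrived. Whether convergence survives asynchrony in general is exactly the question at issue.

```python
# A 3-dimensional contraction mapping x -> A x + b (row sums of A are 0.3,
# so the fixed point is (10/7, 10/7, 10/7)).
A = [[0.0, 0.2, 0.1],
     [0.1, 0.0, 0.2],
     [0.2, 0.1, 0.0]]
b = [1.0, 1.0, 1.0]

def step(x, i):
    """One componentwise update, using whatever values are currently in x."""
    return b[i] + sum(A[i][j] * x[j] for j in range(3))

x = [0.0, 0.0, 0.0]
for round_no in range(200):
    for i in range(3):
        if (round_no + i) % 3 != 0:   # arbitrarily skip ~1/3 of all updates
            x[i] = step(x, i)
# Despite the missed updates, x converges to the fixed point of the mapping.
```

The contraction property is what rescues us here; a mapping without it might cycle or diverge under the same skipping pattern.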
Suppose, for instance, we have a mapping to be iterated that operates on some
high-dimensional vector or matrix. To prepare for a distributed version, we restrict
the mapping to a subset by providing the full input but allowing the mapping to operate