NSort
Volume Number: 9
Issue Number: 5
Column Tag: Visual programming
NSort
An application of parallelism to increase algorithm speed
By Mark Kauffman, Chico, California
The Future of Programming
I'll never forget my first Christmas after I started back to school as a Computer
Science major. I was visiting my in-laws. They had the Orkin Man over to spray for
termites. He and I started talking and I told him that I was working on my Bachelors
degree in Computer Science.
"Oh, really," he said, "I got my degree in Computer Science in 1962. I used to
program for IBM using punched cards.
I was shocked that anyone one with a Computer Science degree would become the
"Orkin Man" and swore that it would never happen to me. Things change fast in the
technical world. You have to keep your eyes open and stay on your feet if you don't
want to be left in the dust. If you are a programmer, make sure that you don't become
the "Orkin Man" by staying on top of the latest advances in programming.
Look into the future and imagine what kind of computers we are going to be
programming in the year 2000. We are approaching the physical limits of how small
and fast we can build a single processor. How can we build faster machines when we
reach the speed limits, imposed by physics, of our current hardware? One method of
building a computer system that is faster than a system with an individual processor is
to build a system with multiple processors and run them in parallel. Parallel
processing systems are not wishful thinking. In their book Highly Parallel Computing,
Almasi and Gottlieb list over thirty existing different parallel processing
architectures (2). Are you ready to start programming these parallel systems? How
many languages do you know that can take advantage of a system with more than one
processor? We have languages that do just that, available on the Macintosh. One of
those languages is Occam, a textual non-object-oriented language that requires the
programmer to specify which functions may operate con currently. Lisp and C have
been modified to design parallel processing algorithms using the Paralation Model
(Snively 72). Another language, based on the dataflow model of computing and
available on the Macintosh, is TGS Systems' Prograph.
Dataflow Architectures & Dataflow Languages
Most of the articles, and even the advertising, I've read about Prograph
emphasize that it is a graphical object-oriented language. I was surprised to find that
Prograph is also a dataflow language. It is based on a parallel-processor dataflow
architecture. Prograph, as a dataflow language, differs from languages like C and
Pascal in that it supports concurrent instruction execution at the single program
instruction level. Pascal, C, LISP, FORTRAN, BASIC, and other common computer
languages are based on the von Neumann architecture: sequential instruction execution
modifying the data in memory. Only one instruction can happen at a time.
In the dataflow model of computing, data flows into and out of instructions
through input/output terminals. Each instruction may have one or more input and
output terminals. An instruction executes whenever all of the data required at its
input terminal(s) are available. After execution, the instruction places the result(s)
on its output terminal(s). Every instruction with an available processor and data at
its inputs will execute con currently in a dataflow system. Parallelism is implicit in
the design of the program. No extra record keeping is required by the programmer.
Prograph programs are graphic illustrations of algorithms using the dataflow
computing model. Though not yet running on a parallel-processor platform, Prograph
is positioned to take advantage of such a system. For the rest of this article let's look at
a common computing problem that can be solved much more quickly with an algorithm
designed to make use of a parallel-processor architecture, sorting.
Analyzing Algorithm Execution Time
First, let's examine how to determine how fast an algorithm is. Then we will be
able to compare NSort, a parallel-processor based sorting algorithm, with the sort
algorithms in common use today. To determine how fast an algorithm is, we first find
the critical steps in the algorithm. These are steps which are repeated a number of
times depending on the number of data to be dealt with, N. Then we count how many
times the critical steps must be performed for the algorithm to complete. For
example, if our sort algorithm had to compare each number to be sorted with every
other number to be sorted, the comparisons would be the critical steps. The number of
times the algorithm would have to make these comparisons to complete the sort would
be N * N. We would say that our sort ran in time N squared. Increasing the number of
data to be sorted from 10 to 1000, a factor of 100, would increase the sort time by a
factor of 10000. You've heard of quicksort? What's great about quicksort is that on
the average it will sort a large list in time N * Log(base 2)N. With quicksort then,
increasing the number of data from 8 to 1024, a factor of 128, would, on the average,
increase the sort time by a factor of under 2000. All other sorts that are based on
sequential processing also sort in some time which is a multiple of N. In his text,
Programming With Data Structures 2, Robert L. Kruse proves that "Any algorithm
that sorts a list by comparing keys must, in it's worst case on a list of length N, do at
least. . . N*lg(N) + O(N) comparisons of keys." In the next section I will describe an
algorithm, NSort, that will consistently sort in time N.
The NSort Algorithm
We humans have the capability to make more than one comparison at a time, so I
felt that a reasonable way to develop a sort algorithm that uses parallelism would be to
observe human parallelism at work. To develop NSort, I wrote a list of numbers on a
dry-erase board and sorted them as quickly as I could. As I sorted, I watched what I
did. Here is the list and the procedure I took:
Unsorted List
(7 2 3 9 5 4 6)
Take the first element and make a new list. Now you have two lists: the sorted
list, and the unsorted list:
Unsorted List Sorted List
(2 3 9 5 4 6) (7)
Take the first element from the unsorted list again. Where does it belong in the
sorted list? Put it there.
Unsorted List Sorted List
(3 9 5 4 6) (2 7)
Keep taking the first element from the unsorted list and placing it where it
belongs in the sorted list, until the unsorted list is empty.
Unsorted List Sorted List
(9 5 4 6) (2 3 7)
(5 4 6) (2 3 7 9)
(4 6) (2 3 5 7 9)
(6) (2 3 4 5 7 9)
() (2 3 4 5 6 7 9)
We've sorted the list in the same number of steps as there are elements to be
sorted, time N. The critical step, that you were able to perform as a parallel
operation, was "Where does it (the number detached from the unsorted list) belong in
the sorted list?" A computer system with processors running in parallel could do the
same thing. You would pass the processors a sorted list of numbers, one list element to
each processor. Also pass each processor the single number from the unsorted list to
compare with the sorted list. Ask each processor if its list element is less than the
number. The collection of processors would then return a list of booleans. Where the
booleans changed from TRUE to FALSE is where the number being compared to the
elements in the list should be inserted in the sorted list. Let me use the numbers from
the above example to demonstrate how this works:
Unsorted List Sorted List
(6) (2 3 4 5 7 9)
You want the processor array to find out where the six belongs in the sorted list.
Pass the sorted list to the processor array. Ask the processors in the array if their
list element is less than six. The processor array returns the following list:
(TRUE TRUE TRUE TRUE FALSE FALSE)
Now to figure out which element to insert the six in front of, pass this boolean
array to the parallel-processor array. Ask, "Who has a FALSE?" The processor array
returns the list of processor numbers that are holding FALSE:
(5 6)
From the list of processor numbers returned, only look at the value of the lowest
numbered processor. In this example the processor array finds that the six needs to be
inserted before the fifth element on the list.
There is one special case to consider. What happens if the number from the
unsorted list is bigger than all of the numbers in the sorted list? Say we had used a ten
instead of a six in this example. Then the boolean array created by the compare
operation would have looked like this:
(TRUE TRUE TRUE TRUE TRUE TRUE)
When you ask the processor array who is holding a false, an empty list is
returned. Then you know that the ten has to be inserted after the last item on the
sorted list.
Implementation of NSort
Following are the seven prographs (TGS Systems' term for algorithms written in
Prograph) of the NSort implementation. For those unfamiliar with Prograph syntax,
there is one function per window. Function names are given in the title bars of the
windows. As we look at each function, I'll explain its purpose and operation.
First, let's examine the main NSort routine.
This function asks the user to enter a list then tests the input to see that it is a
list. If the input is a list it sorts the list and shows the results. Otherwise, the
function terminates without showing any results.
The called functions, ask and show are Prograph primitives. They are built into
the language. Ask brings up a dialog box requesting input and show brings up a dialog
box that displays output. Both primitives' dialog boxes have OK buttons for the user to
press. The called function test is not a primitive but is a function supplied by TGS
with their Algorithm examples. It checks the data on its input terminal to see that it is
a list and that all of the elements of the list are of the same type. Notice the X in a box
with the bar above it next to test. This is the Prograph control "Terminate on
Failure." If test fails, this control terminates the operation of the function NSort.
When test succeeds, nSort sorts the list on its input terminal then passes the result to
show.
Now, let's look at nSort. Notice that it doesn't actually do all of the sorting but
sets up the sort and calls sort-em to complete the sort.
The nSort method consists of two cases. Case 1:2 (read 1 of 2) performs a
Prograph match operation on the input list. The match operation is the line above the
set of parenthesis and box with an X inside. This match operation is checking for an
empty list, (). If the list is not empty, the match fails and the next case is called. An
empty list is not sorted.
Case 2:2 performs the same first step we took when sorting. It takes the left
most element off the unsorted list and creates a one element 'sorted' list. These two
lists are passed to sort-em which takes items from the unsorted list and inserts them
in the right place in the sorted list until the unsorted list is empty. The looping
arrows show that both lists are being feed from the output terminals of sort-em back
into its input terminals. The looping arrows also represent repeated operations on
data by a method.
The first thing sort-em does is another match on the input list. If the match
succeeds, the list is empty and the looping around sort-em terminates. When the
unsorted list is not empty, sort-em detaches the left most item from the list, and
passes the item and the sorted list to find insert location. After determining where to
insert the item, it passes the item and the sorted list to the method insert into sorted
list to be, you guessed it, inserted into the sorted list.
Now, let's review the operation of find insert location.
This method, find insert location, figures out where to insert the item in the
sorted list. It returns a 0 if the item is to be inserted to the far right of the sorted list.
Otherwise it returns the location where the item is to be inserted.
This operation is the key method to sorting in time N. If Prograph were running
on a parallel processor system and there were enough processors to perform the list
operations simultaneously, NSort would complete in time N.
The only operations left to consider are the two cases of insert into sorted list.
If the insert location is zero, case 1:2 places the item to insert on the far right of
the sorted list. If the insert location is not zero, case 1:2 switches to case 2:2. Case
2:2 splits the sorted list in two at the place where the item is to be inserted, converts
the item to insert into a list, and then joins all three lists to create the new sorted list.
This completes the implementation of NSort with Prograph.
The Challenge
Our challenge as programmers is to learn what processes will benifit from
parallelism and how to implement them. Searching and finding the shortest path look
like good candidates. Neural network design, another hot topic in computer science
today, is another. Using a graphical, parallel language like Prograph should simplifly
network design. You would define a single class neuron. You could draw the network
topology using Prograph's existing graphical capabilities.
There is a challenge for you hardware gurus too. Build (inexpensive please)
NuBus boards with parallel processing capability. It would be nice if they would just
plug in and work with a graphical object-oriented language. I've seen that Occam is
available on the Macintosh with parallel processing hardware. (Occam is a
non-object-oriented parallel-processing textual language that requires the
programmer to explicitly state which functions can operate in parallel.) What would it
take to convert that hardware for use with Prograph?
If you implement one of these ideas or if you think of others, I'd like to hear about
it. Send me e-mail at mkaufman@hairball.ecst.csuchico.edu or write to me at: 254 E.
7th Ave. Chico, CA 95926. Better yet, write an article for MacTech.
References
Almasi, George and Gottlieb, Allan. Highly Parallel Computing. Redwood City:
Benjamin/Cummings Publishing Company, Inc., 1989.
Snively, Paul. "The Paralation Model." MacTutor. Nov./Dec. 1992: 72-79.