All Databases MacTech Vol 09-1993

NSort

Volume Number: 9

Issue Number: 5

Column Tag: Visual programming

NSort

An application of parallelism to increase algorithm speed

By Mark Kauffman, Chico, California

The Future of Programming

I'll never forget my first Christmas after I started back to school as a Computer

Science major. I was visiting my in-laws. They had the Orkin Man over to spray for

termites. He and I started talking and I told him that I was working on my Bachelors

degree in Computer Science.

"Oh, really," he said, "I got my degree in Computer Science in 1962. I used to

program for IBM using punched cards.

I was shocked that anyone one with a Computer Science degree would become the

"Orkin Man" and swore that it would never happen to me. Things change fast in the

technical world. You have to keep your eyes open and stay on your feet if you don't

want to be left in the dust. If you are a programmer, make sure that you don't become

the "Orkin Man" by staying on top of the latest advances in programming.

Look into the future and imagine what kind of computers we are going to be

programming in the year 2000. We are approaching the physical limits of how small

and fast we can build a single processor. How can we build faster machines when we

reach the speed limits, imposed by physics, of our current hardware? One method of

building a computer system that is faster than a system with an individual processor is

to build a system with multiple processors and run them in parallel. Parallel

processing systems are not wishful thinking. In their book Highly Parallel Computing,

Almasi and Gottlieb list over thirty existing different parallel processing

architectures (2). Are you ready to start programming these parallel systems? How

many languages do you know that can take advantage of a system with more than one

processor? We have languages that do just that, available on the Macintosh. One of

those languages is Occam, a textual non-object-oriented language that requires the

programmer to specify which functions may operate con currently. Lisp and C have

been modified to design parallel processing algorithms using the Paralation Model

(Snively 72). Another language, based on the dataflow model of computing and

available on the Macintosh, is TGS Systems' Prograph.

Dataflow Architectures & Dataflow Languages

Most of the articles, and even the advertising, I've read about Prograph

emphasize that it is a graphical object-oriented language. I was surprised to find that

Prograph is also a dataflow language. It is based on a parallel-processor dataflow

architecture. Prograph, as a dataflow language, differs from languages like C and

Pascal in that it supports concurrent instruction execution at the single program

instruction level. Pascal, C, LISP, FORTRAN, BASIC, and other common computer

languages are based on the von Neumann architecture: sequential instruction execution

modifying the data in memory. Only one instruction can happen at a time.

In the dataflow model of computing, data flows into and out of instructions

through input/output terminals. Each instruction may have one or more input and

output terminals. An instruction executes whenever all of the data required at its

input terminal(s) are available. After execution, the instruction places the result(s)

on its output terminal(s). Every instruction with an available processor and data at

its inputs will execute con currently in a dataflow system. Parallelism is implicit in

the design of the program. No extra record keeping is required by the programmer.

Prograph programs are graphic illustrations of algorithms using the dataflow

computing model. Though not yet running on a parallel-processor platform, Prograph

is positioned to take advantage of such a system. For the rest of this article let's look at

a common computing problem that can be solved much more quickly with an algorithm

designed to make use of a parallel-processor architecture, sorting.

Analyzing Algorithm Execution Time

First, let's examine how to determine how fast an algorithm is. Then we will be

able to compare NSort, a parallel-processor based sorting algorithm, with the sort

algorithms in common use today. To determine how fast an algorithm is, we first find

the critical steps in the algorithm. These are steps which are repeated a number of

times depending on the number of data to be dealt with, N. Then we count how many

times the critical steps must be performed for the algorithm to complete. For

example, if our sort algorithm had to compare each number to be sorted with every

other number to be sorted, the comparisons would be the critical steps. The number of

times the algorithm would have to make these comparisons to complete the sort would

be N * N. We would say that our sort ran in time N squared. Increasing the number of

data to be sorted from 10 to 1000, a factor of 100, would increase the sort time by a

factor of 10000. You've heard of quicksort? What's great about quicksort is that on

the average it will sort a large list in time N * Log(base 2)N. With quicksort then,

increasing the number of data from 8 to 1024, a factor of 128, would, on the average,

increase the sort time by a factor of under 2000. All other sorts that are based on

sequential processing also sort in some time which is a multiple of N. In his text,

Programming With Data Structures 2, Robert L. Kruse proves that "Any algorithm

that sorts a list by comparing keys must, in it's worst case on a list of length N, do at

least. . . N*lg(N) + O(N) comparisons of keys." In the next section I will describe an

algorithm, NSort, that will consistently sort in time N.

The NSort Algorithm

We humans have the capability to make more than one comparison at a time, so I

felt that a reasonable way to develop a sort algorithm that uses parallelism would be to

observe human parallelism at work. To develop NSort, I wrote a list of numbers on a

dry-erase board and sorted them as quickly as I could. As I sorted, I watched what I

did. Here is the list and the procedure I took:

Unsorted List

(7 2 3 9 5 4 6)

Take the first element and make a new list. Now you have two lists: the sorted

list, and the unsorted list:

Unsorted List Sorted List

(2 3 9 5 4 6) (7)

Take the first element from the unsorted list again. Where does it belong in the

sorted list? Put it there.

Unsorted List Sorted List

(3 9 5 4 6) (2 7)

Keep taking the first element from the unsorted list and placing it where it

belongs in the sorted list, until the unsorted list is empty.

Unsorted List Sorted List

(9 5 4 6) (2 3 7)

(5 4 6) (2 3 7 9)

(4 6) (2 3 5 7 9)

(6) (2 3 4 5 7 9)

() (2 3 4 5 6 7 9)

We've sorted the list in the same number of steps as there are elements to be

sorted, time N. The critical step, that you were able to perform as a parallel

operation, was "Where does it (the number detached from the unsorted list) belong in

the sorted list?" A computer system with processors running in parallel could do the

same thing. You would pass the processors a sorted list of numbers, one list element to

each processor. Also pass each processor the single number from the unsorted list to

compare with the sorted list. Ask each processor if its list element is less than the

number. The collection of processors would then return a list of booleans. Where the

booleans changed from TRUE to FALSE is where the number being compared to the

elements in the list should be inserted in the sorted list. Let me use the numbers from

the above example to demonstrate how this works:

Unsorted List Sorted List

(6) (2 3 4 5 7 9)

You want the processor array to find out where the six belongs in the sorted list.

Pass the sorted list to the processor array. Ask the processors in the array if their

list element is less than six. The processor array returns the following list:

(TRUE TRUE TRUE TRUE FALSE FALSE)

Now to figure out which element to insert the six in front of, pass this boolean

array to the parallel-processor array. Ask, "Who has a FALSE?" The processor array

returns the list of processor numbers that are holding FALSE:

(5 6)

From the list of processor numbers returned, only look at the value of the lowest

numbered processor. In this example the processor array finds that the six needs to be

inserted before the fifth element on the list.

There is one special case to consider. What happens if the number from the

unsorted list is bigger than all of the numbers in the sorted list? Say we had used a ten

instead of a six in this example. Then the boolean array created by the compare

operation would have looked like this:

(TRUE TRUE TRUE TRUE TRUE TRUE)

When you ask the processor array who is holding a false, an empty list is

returned. Then you know that the ten has to be inserted after the last item on the

sorted list.

Implementation of NSort

Following are the seven prographs (TGS Systems' term for algorithms written in

Prograph) of the NSort implementation. For those unfamiliar with Prograph syntax,

there is one function per window. Function names are given in the title bars of the

windows. As we look at each function, I'll explain its purpose and operation.

First, let's examine the main NSort routine.

This function asks the user to enter a list then tests the input to see that it is a

list. If the input is a list it sorts the list and shows the results. Otherwise, the

function terminates without showing any results.

The called functions, ask and show are Prograph primitives. They are built into

the language. Ask brings up a dialog box requesting input and show brings up a dialog

box that displays output. Both primitives' dialog boxes have OK buttons for the user to

press. The called function test is not a primitive but is a function supplied by TGS

with their Algorithm examples. It checks the data on its input terminal to see that it is

a list and that all of the elements of the list are of the same type. Notice the X in a box

with the bar above it next to test. This is the Prograph control "Terminate on

Failure." If test fails, this control terminates the operation of the function NSort.

When test succeeds, nSort sorts the list on its input terminal then passes the result to

show.

Now, let's look at nSort. Notice that it doesn't actually do all of the sorting but

sets up the sort and calls sort-em to complete the sort.

The nSort method consists of two cases. Case 1:2 (read 1 of 2) performs a

Prograph match operation on the input list. The match operation is the line above the

set of parenthesis and box with an X inside. This match operation is checking for an

empty list, (). If the list is not empty, the match fails and the next case is called. An

empty list is not sorted.

Case 2:2 performs the same first step we took when sorting. It takes the left

most element off the unsorted list and creates a one element 'sorted' list. These two

lists are passed to sort-em which takes items from the unsorted list and inserts them

in the right place in the sorted list until the unsorted list is empty. The looping

arrows show that both lists are being feed from the output terminals of sort-em back

into its input terminals. The looping arrows also represent repeated operations on

data by a method.

The first thing sort-em does is another match on the input list. If the match

succeeds, the list is empty and the looping around sort-em terminates. When the

unsorted list is not empty, sort-em detaches the left most item from the list, and

passes the item and the sorted list to find insert location. After determining where to

insert the item, it passes the item and the sorted list to the method insert into sorted

list to be, you guessed it, inserted into the sorted list.

Now, let's review the operation of find insert location.

This method, find insert location, figures out where to insert the item in the

sorted list. It returns a 0 if the item is to be inserted to the far right of the sorted list.

Otherwise it returns the location where the item is to be inserted.

This operation is the key method to sorting in time N. If Prograph were running

on a parallel processor system and there were enough processors to perform the list

operations simultaneously, NSort would complete in time N.

The only operations left to consider are the two cases of insert into sorted list.

If the insert location is zero, case 1:2 places the item to insert on the far right of

the sorted list. If the insert location is not zero, case 1:2 switches to case 2:2. Case

2:2 splits the sorted list in two at the place where the item is to be inserted, converts

the item to insert into a list, and then joins all three lists to create the new sorted list.

This completes the implementation of NSort with Prograph.

The Challenge

Our challenge as programmers is to learn what processes will benifit from

parallelism and how to implement them. Searching and finding the shortest path look

like good candidates. Neural network design, another hot topic in computer science

today, is another. Using a graphical, parallel language like Prograph should simplifly

network design. You would define a single class neuron. You could draw the network

topology using Prograph's existing graphical capabilities.

There is a challenge for you hardware gurus too. Build (inexpensive please)

NuBus boards with parallel processing capability. It would be nice if they would just

plug in and work with a graphical object-oriented language. I've seen that Occam is

available on the Macintosh with parallel processing hardware. (Occam is a

non-object-oriented parallel-processing textual language that requires the

programmer to explicitly state which functions can operate in parallel.) What would it

take to convert that hardware for use with Prograph?

If you implement one of these ideas or if you think of others, I'd like to hear about

it. Send me e-mail at mkaufman@hairball.ecst.csuchico.edu or write to me at: 254 E.

7th Ave. Chico, CA 95926. Better yet, write an article for MacTech.

References

Almasi, George and Gottlieb, Allan. Highly Parallel Computing. Redwood City:

Benjamin/Cummings Publishing Company, Inc., 1989.

Snively, Paul. "The Paralation Model." MacTutor. Nov./Dec. 1992: 72-79.

Referenced by (3):