Jun 96 Challenge
Volume Number: 12
Issue Number: 6
Column Tag: Programmer’s Challenge
Programmer’s Challenge
By Bob Boonstra, Westford, Massachusetts
Note: Source code files accompanying article are located on MacTech CD-ROM or
source code disks.
Postman’s Sort
Everyone knows that sorting algorithms require execution time that grows faster than
linearly with the number of input records. Specifically, sorting N input records is
known to require somewhere between O(Nlog2N) and O(N2) comparisons, depending
on the specific algorithm and whether one is considering average or worst-case time,
right? Wrong. Your Challenge is to code a sorting algorithm that requires time
proportional to the number of input records.
The bounds in the previous paragraph apply to sorting algorithms that are based
on pairwise comparison of input keys. A distribution sort, however, requires no key
comparisons, and has different performance characteristics. Imagine, for example,
how the post office might sort mail. Initially, the mail might be distributed into piles
by country. Each of those piles might be distributed into other piles by state. Those
piles might be distributed by city, and then by street, and finally by address. All of
this could be accomplished by making a sequence of passes through the input without
ever comparing the sort keys of two input records to one another. This month, you are
to implement a distribution (or “postman’s”) sort algorithm. The prototype for the
code you should write is:
#include
static pascal OSErr PostmansSort(
FILE *inFile, /* stream containing sort input */
FILE *outFile, /* stream to contain sort output */
void *privateStorage, /* preallocated storage for your use */
size_t storageSize /* number of bytes of privateStorage */
);
The input will consist of records in the following format:
[FirstName] [MiddleName] LastName
StreetAddress StreetName
City State [ZipCode] [Country] R
Input records should be read from inFile, sorted, and written to outFile. Both
the input and output files will be opened and closed for you by the calling routine.
Records will consist of up to 8 fields, as indicated above, and be terminated by a
carriage return (0x0D). Fields will be separated by tabs (0x09). The square brackets
in the format description are meta-characters indicating optional fields; the brackets
will not be present in the input. Fields may contain embedded spaces or special
characters (except tabs and returns).
Records are to be sorted into ascending order with Country as the primary sort
key, ZipCode (if present) as the secondary key, then StreetName, StreetAddress,
LastName, FirstName, and finally MiddleName, in that order. If ZipCode is not
present, then State and City should replace it as the secondary and tertiary sort
keys, respectively; otherwise State and City should be ignored. Sort order should be
lexicographic except when a field value is purely numeric, and a purely numeric field
should be treated as smaller than a field containing non-numeric characters. That is,
field values “1”, “2”, “10”, “1B”, “1C”, and “2A” would sort in the order
indicated. Empty optional fields should be considered to have the smallest possible
value (e.g., records with no Country should be output before records with a Country
value). Your routine should return zero (noErr) if it completes normally, or a
non-zero error code if it is unable to complete the sort for any reason.
There are no predetermined limits on the number of distinct countries, State,
City, etc. values that might be present in the input. Your routine will be provided
with storageSize bytes (at least 64KB) of preallocated storage, initialized to zero by
the calling routine, and pointed to by privateStorage. You may not allocate any
additional memory, although use of small amounts of static memory is permitted. If
your solution requires additional storage, you should create and write to temporary
disk files. Adequate disk space is guaranteed. In approximately half of the test cases,
you will be guaranteed at least 32 bytes of memory per record, plus 32 bytes for each
unique field value. Scoring will weight these high-memory cases equally with the
cases where less memory is provided.
This will be a native PowerPC Challenge, scored using the Metrowerks
environment. Solutions may be coded in C, C++, or Pascal. If you use a language other
than C, you should ensure that your code can be called from a test driver written in C.
Most importantly, to be considered correct, your code must sort the input into the
proper order without performing any key comparisons between input records. The
fastest correct solution will be the winner.
This Challenge is based on a suggestion from Peter Shank, passed on to me by
Mike Scanlin. Peter forwarded a reference to an article on the subject by Robert
Ramsey, which you can find at http://www.silcom.com/~ramey/article.html.
Two Months Ago Winner
Congratulations to Ernst Munter (Kanata, ON, Canada) for submitting the fastest
entry to the Mutant Life Challenge. The April Challenge was to write code that would
propagate a population of cellular automata for a specified number of generations
according to rules generalized from John Conway’s Game of Life.
Of the 17 entries I received, 9 worked completely correctly. A number of people
submitted solutions that were partially, but not completely, correct. Some forgot to
stop processing and return when the population became stable. Others had problems
dealing with the BitMap as a torus, which required solutions to wrap from the top row
to the bottom, from the right column to the left, and vice versa.
Algorithm and data structure selection were key to doing well in this Challenge.
Several participants recognized the value of maintaining, for each cell, a count of the
number of occupied neighboring cells, and updating the neighboring counts when the
state of a cell changed. Ernst’s solution takes this idea further, and maintains neighbor
counts for a block of 32 cells together. He uses some interesting modulo-9 math to
take advantage of the fact that neighbor counts are between 0 and 8 to store 32
neighbor counts efficiently in 16 bytes. The winning solution creates a lookup table,