Dec 98 Prog Challenge
Volume Number: 14
Issue Number: 12
Column Tag: Programmer's Challenge
Dec 98 Programmer's Challenge
by Bob Boonstra, Westford, MA
Word Neighbors
Several of you have written me to point out that the Challenges have been getting more
and more difficult over the years. Thinking back over the past few months, where we
have solved an orbital mechanics problem, programmed a winning Hearts solution, and
written an emulator for the first stored-program computer, I can see where this
might be true. This month the problem is easier. Your Challenge is to write a search
routine that will find a set of words that are near to one another in some target text.
You might imagine the solution being used as part of an internet search engine,
providing a capability more selective than looking for all of the target words being
present on a page, but less restrictive than requiring the target words to form a
phrase. The prototype for the code you should write is:
#if defined(__cplusplus)
pascal void Initialize(
char *text, /* NULL terminated text to be searched */
long distance, /* max distance between nearby words */
void *privateStorage, /* private storage for your use */
long storageBytes /* number of bytes in privateStorage */
);
pascal long FindNearby( /* return number of matches found */
char *words[], /* words to find in text */
long numWords, /* number of words */
Boolean caseSensitive, /* true if match is case sensitive */
Boolean preserveOrder, /* true if words must be found in order */
long matchPositions[], /* position in text of first word in match
*/
long maxMatches /* max number of matches to return */
);
#if defined(__cplusplus)
}
#endif
Your Initialize routine is called to give you an opportunity to preprocess the text to be
searched. The text consists of a sequence of words separated by delimiters. For the
purpose of this Challenge, a word is a sequence of alphanumeric characters separated
by one or more non-alphanumeric characters. The text may include ASCII characters
between 0x20 and 0x7E, inclusive, plus carriage returns (0x0D), linefeeds (0x0A),
and tabs (0x09). All non-alphanumeric characters are delimiters.
You are given the maximum distance to be allowed between adjacent search words. A
distance of 0 corresponds to adjacent words, a distance of 1 corresponds to search
words with one intervening word, etc. You are also given a privateStorage pointer to
storage available for your use, along with the size (storageBytes) of privateStorage in
bytes. There will be at least 16 bytes of storage for each unique (case-sensitive) word
in the text, plus 4 bytes for each word occurrence.
For each time that Initialize is called with new text to be searched, your FindNearby
routine will be called an average of 10 to 50 times with a set of numWords words to
find. You will be told whether the search is to be caseSensitive or not, and whether the
words must be found in order (preserveOrder==TRUE) in the text. You must find the
first maxMatches occurrences of the words in the text, where a match occurs when
each of the words is separated by no more than a maximum number (distance) of
intervening words from another of the search words. For each match, you should
return in matchPositions the offset in text of the first of the search words found in
text. For example, if distance==2 and the text is "The quick brown fox jumps over the
lazy dog.", you would return 4 if the words were "brown", "quick", and "jumps",
provided preserveOrder is FALSE. No word in the text may be part of more than one
match.
The winner will be the solution that correctly finds the word matchPositions in the
minimum execution time, including the time taken by the Initialize routine..
This will be a native PowerPC Challenge, using the latest CodeWarrior environment.
Solutions may be coded in C, C++, or Pascal.
Three Months Ago Winner
Congratulations once again to Ernst Munter (Kanata, Ontario) for submitting the
fastest solution to the September "Big Baby" Challenge. Eleven people submitted
entries that simulated the operation of the Manchester Mark 1 prototype computer,
the world's first stored program computer, which was fifty years old this past June.
Ernst was one of four entries to take advantage of our annual September opportunity to
include assembly language in their solution. One of the novel things about the winning
solution is the way Ernst executes instructions in pairs, trying to take full advantage
of the PowerPC processor. The solution accurately models an unusual overflow
characteristic of the original Baby computer, described in the Baby documentation and
mentioned in the solution comments. However, because this overflow behavior occurs
infrequently, the winning solution detects an overflow in its fast mode of execution and
switches to a slower mode of operation to process it.
The problem statement required contestants to treat memory the way it appeared on
the Baby CRT display, with the least significant bit on the left. Some of the solutions
did this via enormous table lookups, leading to a large memory footprint. Ernst did the
bit-flipping in his CopyMem routine using the TWIDDLE macro. He uses one of the
PowerPC's most versatile instructions, the Rotate Left Word Immediate Then And With
Mask, or rlwinm instruction, to first flip bytes, then 4-bit nibbles, then 2-bit
chunks, and finally individual bits.
I used nine test cases to evaluate the Big Baby solutions, including four that executed
the Assembler function. While the original Baby computer used only 5 address bits,
allowing it to address 32 words of memory, the Big Baby Challenge required
Challengers to deal with larger memory spaces, and my test cases used up to 8 address
bits. The test cases included two programs demonstrating self-modifying code: a
16-bit counter program submitted by Ernst Munter, and a BabyFill program
submitted by Peter Lewis that counts the number of words of memory available. (Both
Ernst and Peter earn two points for submitting these programs.) The test cases also
included two programs submitted to the University of Manchester Baby contest last
June, a prime number generation program, and a "3 minute noodle timer" program
that is quite interesting to watch execute on the graphical Baby simulator available at
the University of Manchester web site. The 16-bit counter, BabyFill, and prime
number programs are reproduced below.
16-bit counter program
0000 JMP 18
0001 LDN 20
0002 JRP 19
0003 NUM 536870911
0004 NUM 134217727
0005 NUM 33554431
0006 NUM 8388607
0007 NUM 2097151
0008 NUM 524287
0009 NUM 131071
0010 NUM 32767
0011 NUM 8191
0012 NUM 2047