Dec 95 Challenge
Volume Number: 11
Issue Number: 12
Column Tag: Programmer’s Challenge
Programmer’s Challenge 
By Bob Boonstra, Westford, Massachusetts
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
Find Again And Again
This month the Challenge is to write a text search engine that is optimized to operate
repeatedly on the same text. You will be given a block of text, some storage for data
structures, and an opportunity to analyze the text before being asked to perform any
searches against that text. Then you will repeatedly be asked to find a specific
occurrence of a given word in that block of text. The prototypes for the code you should
write are:
void InitFind(
  char *textToSearch,     /* find words in this block of text */
  long textLength,        /* number of chars in textToSearch */
  void *privateStorage,   /* storage for your use */
  long storageSize        /* number of bytes in privateStorage */
);
long FindWordOccurrence(
                          /* return offset of wordToFind in textToSearch */
  char *wordToFind,       /* find this word in textToSearch */
  long wordLength,        /* number of chars in wordToFind */
  long occurrenceToFind,  /* find this instance of wordToFind */
  char *textToSearch,     /* same parameter passed to InitFind */
  long textLength,        /* same parameter passed to InitFind */
  void *privateStorage,   /* same parameter passed to InitFind */
  long storageSize        /* same parameter passed to InitFind */
);
The InitFind routine will be called once for a given block of textLength characters at textToSearch to allow you to analyze the text, create data structures,
and store them in privateStorage. When InitFind is called, storageSize bytes of
memory at privateStorage will have been preallocated and initialized to zero.
FindWordOccurrence is to search for words, where a word is defined as a
continuous sequence of alphanumeric characters delimited by a non-alphanumeric
character (e.g., space, tab, punctuation, hyphen, CR, NL, or other special character).
Your code should look for complete words - it would be incorrect, for example, to
return a value pointing to the word “these” if the wordToFind was “the”. The
wordToFind will be a legal word (i.e., no embedded delimiters).
FindWordOccurrence should return the offset in textToSearch of the
occurrenceToFind-th instance of wordToFind. It should return -1 if wordToFind
does not occur in textToSearch, or if there are fewer than occurrenceToFind
instances of wordToFind.
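To make the whole-word rule concrete, here is a minimal, unoptimized reference scan. It is only a sketch for checking answers, not a competitive entry. The name NaiveFindWordOccurrence and the helper IsWordChar are hypothetical, the storage parameters are omitted, and it assumes that matching is case-sensitive and that occurrenceToFind counts from 1, neither of which the problem statement spells out.

```c
#include <ctype.h>

/* Hypothetical helper: nonzero if c is part of a word (alphanumeric). */
static int IsWordChar(char c)
{
    return isalnum((unsigned char)c) != 0;
}

/* Unoptimized reference scan. Returns the offset of the occurrenceToFind-th
   whole-word match (assumed 1-based, case-sensitive), or -1 if there is none. */
long NaiveFindWordOccurrence(const char *wordToFind, long wordLength,
                             long occurrenceToFind,
                             const char *textToSearch, long textLength)
{
    long pos = 0;
    long seen = 0;

    if (wordLength <= 0 || occurrenceToFind <= 0)
        return -1;

    while (pos < textLength) {
        long start, len, k;

        /* Skip any run of delimiters. */
        while (pos < textLength && !IsWordChar(textToSearch[pos]))
            pos++;
        start = pos;

        /* Consume one complete word. */
        while (pos < textLength && IsWordChar(textToSearch[pos]))
            pos++;
        len = pos - start;

        /* Only a word of exactly the right length can match, so "these"
           is never reported as a hit for "the". */
        if (len == wordLength) {
            for (k = 0; k < len; k++)
                if (textToSearch[start + k] != wordToFind[k])
                    break;
            if (k == len && ++seen == occurrenceToFind)
                return start;
        }
    }
    return -1;
}
```

Because each call rescans the text from the beginning, this costs time proportional to textLength per search, which is exactly the cost that the InitFind analysis pass is meant to avoid.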
Both the InitFind and the FindWordOccurrence routines will be timed in
determining the winner. In designing your code, you should assume that
FindWordOccurrence will be called approximately 1000 times for each call to
InitFind (with the same textToSearch, but possibly differing values of wordToFind
and occurrenceToFind).
There is no predefined limit on textLength - you should handle text of arbitrary
length. The amount of privateStorage available could be very large, but is
guaranteed to be at least 64K bytes. While the test cases will include at least one large
textToSearch with a small storageSize, most test cases will provide at least 32
bytes for each occurrence of a word in textToSearch, so you might want to optimize
for that condition.
Other fine print: you may not change the input pointed to by textToSearch or
wordToFind, and you should not use any static storage other than that provided in
privateStorage.
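One possible way to use the storage budget, sketched under stated assumptions, is a chained hash table with one entry per word occurrence, built entirely inside privateStorage. Everything named here (Header, Entry, HashWord, BuildIndex, IndexedFind) is hypothetical and is not taken from any entry. The sketch assumes the storage is suitably aligned for long, that matching is case-sensitive, and that occurrences count from 1. It uses roughly four longs per occurrence and never writes to the text.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical layout inside privateStorage: a Header, then numBuckets chain
   heads, then one Entry per word occurrence. Links are stored as index + 1,
   so 0 means "empty bucket" or "end of chain". */
typedef struct { long numBuckets; long numEntries; } Header;
typedef struct { long offset; long length; long next; } Entry;

static unsigned long HashWord(const char *w, long n)
{
    unsigned long h = 5381;
    while (n-- > 0)
        h = h * 33 + (unsigned char)*w++;
    return h;
}

static int IsAlnumAt(const char *text, long i)
{
    return isalnum((unsigned char)text[i]) != 0;
}

/* Returns 1 if the index was built, 0 if storage is too small. In the latter
   case storage is left untouched and the caller must fall back to scanning. */
int BuildIndex(const char *text, long textLength, void *storage, long storageSize)
{
    Header *hdr = (Header *)storage;
    long *heads;
    Entry *entries;
    unsigned long need;
    long words = 0, buckets, pos, n = 0;

    /* Pass 1: count word starts so the table can be sized. */
    for (pos = 0; pos < textLength; pos++)
        if (IsAlnumAt(text, pos) && (pos == 0 || !IsAlnumAt(text, pos - 1)))
            words++;

    buckets = words > 0 ? words : 1;
    need = sizeof(Header) + (unsigned long)buckets * sizeof(long)
         + (unsigned long)words * sizeof(Entry);
    if (storageSize < 0 || need > (unsigned long)storageSize)
        return 0;

    heads = (long *)(hdr + 1);
    entries = (Entry *)(heads + buckets);
    memset(heads, 0, (size_t)buckets * sizeof(long));

    /* Pass 2: walk the text backwards and prepend each word to its chain,
       which leaves every chain in ascending offset order. */
    pos = textLength;
    while (pos > 0) {
        long end, start;
        while (pos > 0 && !IsAlnumAt(text, pos - 1)) pos--;
        end = pos;
        while (pos > 0 && IsAlnumAt(text, pos - 1)) pos--;
        start = pos;
        if (end > start) {
            unsigned long b = HashWord(text + start, end - start)
                            % (unsigned long)buckets;
            entries[n].offset = start;
            entries[n].length = end - start;
            entries[n].next = heads[b];
            heads[b] = ++n;               /* store index + 1 */
        }
    }
    hdr->numBuckets = buckets;
    hdr->numEntries = n;
    return 1;
}

/* Returns the offset of the occurrence-th match, or -1. A zero numBuckets
   (storage still zeroed) means no index was built. */
long IndexedFind(const char *word, long wordLength, long occurrence,
                 const char *text, const void *storage)
{
    const Header *hdr = (const Header *)storage;
    const long *heads = (const long *)(hdr + 1);
    const Entry *entries;
    long i;

    if (hdr->numBuckets == 0 || wordLength <= 0 || occurrence <= 0)
        return -1;
    entries = (const Entry *)(heads + hdr->numBuckets);
    i = heads[HashWord(word, wordLength) % (unsigned long)hdr->numBuckets];
    while (i != 0) {
        const Entry *e = &entries[i - 1];
        if (e->length == wordLength &&
            memcmp(text + e->offset, word, (size_t)wordLength) == 0 &&
            --occurrence == 0)
            return e->offset;
        i = e->next;
    }
    return -1;
}
```

A large text with a small storageSize makes BuildIndex return 0, so a real entry would still need a second strategy for that case, such as indexing only part of the information or falling back to a plain scan.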
This will be a native PowerPC Challenge, scored using the latest CodeWarrior
compiler. Good luck, and happy searching.
Programmer’s Challenge Mailing List
We are pleased to announce the creation of the Programmer’s Challenge Mailing List.
The list will be used to distribute the latest Challenge, provide answers to questions
about the current Challenge, and discuss suggestions for future Challenges. The
Challenge problem will be posted to the list each month, sometime between the 20th
and the 25th of the month. This should alleviate problems caused by variations in the
publication and mailing date of the magazine, and provide a predictable amount of time
to work on each Challenge.
To subscribe to the list, send a message to autoshare@mactech.com with the
SUBJECT line “sub challenge YourName”, substituting your real name for YourName.
To unsubscribe from the list, send a message to autoshare@mactech.com with the
SUBJECT line “unsub challenge”.
Note: the list server, autoshare, is set to accept commands in the SUBJECT line,
not the body of the message. If you have any problems, please contact
online@mactech.com.
Two Months Ago Winner
The Master Mindreader Challenge inspired ten readers to enter, and all ten solutions
gave correct results. Congratulations to Xan Gregg (Durham, N.C.) for producing the
fastest entry and winning the Challenge.
The problem required you to write code that would correctly guess a sequence of
colors using a callback routine provided in the problem statement that returned two
values for each guess: the number of elements of the guess where the correct color is
located in the correct place in the sequence, and the number of elements where the
correct color is in an incorrect place in the sequence. The number of guesses was not
an explicit factor in determining the winner, but the time used by the callback routine
was included in determining the winner. Participants correctly noted that this made
the relative execution time of the guessing routine and the callback routine a factor in
designing a fast solution. A couple of entries went so far as to offer their own, more
efficient, callbacks. Nice try, but I didn’t use them - the callback in the problem was
designed to provide a known time penalty for making a guess, and that was the callback
I used in evaluating solutions.
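For readers who did not see the original problem, the two values described above can be computed as in the following sketch. This is not the callback from the problem statement: ScoreGuess is a hypothetical name, the answer is passed in explicitly rather than kept hidden, and any byte value is accepted as a color, so a value that never appears in the answer simply scores nothing.

```c
/* Hypothetical scorer illustrating the two values the callback reported.
   The answer is passed explicitly here; the real callback kept it hidden. */
void ScoreGuess(const unsigned char *answer, const unsigned char *guess,
                unsigned short length,
                unsigned short *numInCorrectPos, unsigned short *numInWrongPos)
{
    unsigned short answerCount[256] = {0};  /* unmatched answer colors */
    unsigned short guessCount[256] = {0};   /* unmatched guess colors  */
    unsigned short correct = 0, wrong = 0, i;
    int c;

    /* Exact matches are counted first and removed from further consideration. */
    for (i = 0; i < length; i++) {
        if (guess[i] == answer[i]) {
            correct++;
        } else {
            answerCount[answer[i]]++;
            guessCount[guess[i]]++;
        }
    }

    /* Each color contributes the smaller of its two unmatched counts:
       that many guessed elements have the right color in a wrong place. */
    for (c = 0; c < 256; c++)
        wrong += (answerCount[c] < guessCount[c]) ? answerCount[c]
                                                  : guessCount[c];

    *numInCorrectPos = correct;
    *numInWrongPos = wrong;
}
```

The second loop is the part that costs time on every guess, which is why the relative cost of guessing versus computing mattered to the entrants.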
The callback I supplied had one unanticipated side effect - it permitted callers to
supply an out-of-range value for positions in the sequence that they didn’t care about
for that guess, and six of the entries took advantage of this loophole. This wasn’t what I
had intended, and I gave some thought to giving priority to solutions that did not use the
loophole. In the end, however, I decided not to treat these entries any differently,
because the problem statement permitted out-of-range guesses and provided a defined
callback behavior for them. As it turned out, the winning entry and three of the
fastest four entries did not use out-of-range guesses.
Xan’s winning code first makes a sequence of guesses to determine how many
positions are set to each of the possible colors. He then starts with an initial guess
corresponding to these colors and begins swapping positions to determine how the
number of correctly placed colors is affected. Separate logic handles the cases where
the number of correctly placed colors increased or decreased by 0, 1, or 2, all the
while keeping track of which color possibilities have been eliminated for each position.
These and other details of Xan’s algorithm are documented in the comments to his code.
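The first phase described above, counting how many positions hold each color, can be sketched as follows. This is an illustration of the idea rather than Xan's code: CountColors is a hypothetical name, and gDemoAnswer and DemoCheckGuess are stand-ins added only so the sketch can be exercised. The stand-in callback reports only the number of correctly placed colors, which is all this phase reads.

```c
typedef void (*CheckGuessProcPtr)(
  unsigned char *theGuess,
  unsigned short *numInCorrectPos,
  unsigned short *numInWrongPos);

#define kMaxLength 16

/* Stand-in hidden answer and callback, for demonstration only. */
static unsigned char gDemoAnswer[kMaxLength] = {2, 1, 2, 3};
static unsigned short gDemoLength = 4;

static void DemoCheckGuess(unsigned char *theGuess,
                           unsigned short *numInCorrectPos,
                           unsigned short *numInWrongPos)
{
    unsigned short i, correct = 0;
    for (i = 0; i < gDemoLength; i++)
        if (theGuess[i] == gDemoAnswer[i])
            correct++;
    *numInCorrectPos = correct;
    *numInWrongPos = 0;   /* not computed by this stand-in */
}

/* Fills numOfColor[1..numColors] (1-based, as in the listing) by guessing
   every position as the same color and reading back numInCorrectPos.
   Stops early once all answerLength positions are accounted for. */
void CountColors(CheckGuessProcPtr checkGuess,
                 unsigned short answerLength,
                 unsigned short numColors,
                 long numOfColor[])
{
    unsigned char guess[kMaxLength];
    unsigned short numCorrect, numWrong;
    long colorsFound = 0;
    long curColor, i;

    for (curColor = 1; curColor <= numColors; curColor++)
        numOfColor[curColor] = 0;

    for (curColor = 1;
         curColor <= numColors && colorsFound < answerLength;
         curColor++) {
        for (i = 0; i < answerLength; i++)
            guess[i] = (unsigned char)curColor;
        checkGuess(guess, &numCorrect, &numWrong);
        numOfColor[curColor] = numCorrect;
        colorsFound += numCorrect;
    }
}
```

With the stand-in answer {2, 1, 2, 3} and three colors, the three uniform guesses report one, two, and one correct positions respectively, which gives the per-color counts that the swapping phase then starts from.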
The table of results below indicates, in addition to execution time, the cumulative
number of guesses used by each entry for all test cases. In general, it shows the
expected rough correlation between execution time and the number of guesses, with a
notable exception for the second-place entry from Ernst Munter, which took
significantly fewer guesses. Ernst precalculated tables to define the guessing strategy
for problems of length 5 or less and devised a technique for partitioning larger
problems to use these tables. Normally I try to discourage the use of extensive
precalculated data, but I decided to allow this entry because the amount of data was not
unreasonable, because the tables guided the algorithm but did not precalculate a
solution, and because I thought the approach was innovative and interesting. Although
including the second-place entry in the article is not possible because of length
restrictions, I have included the preamble from Ernst’s solution describing his
approach.
Here are the times and code sizes for each of the entries. Numbers in parentheses
after a person’s name indicate that person’s cumulative point total for all previous
Challenges, not including this one.
Name                 time  guesses  code  data  out-of-range values used?
Xan Gregg (61)        102     4123  1360    16  no
Ernst Munter (90)     109     2880  6264  5480  limited
Gustav Larsson (60)   116     3700   712    40  no
Greg Linden           127     5002   576    16  no
M. Panchenko (4)      146     5391   344    16  yes
Eric Lengyel (20)     176     6456   312    16  yes
Peter Hance           206     6557   336    16  yes
J. Vineyard (42)      228     9933   328    16  no
Ken Slezak (10)       251     6544   808    16  yes
Stefan Sinclair       259    11058   200    16  yes
Top 20 Contestants of All Time
Here are the Top 20 Contestants for the Programmer’s Challenges to date. The
numbers below include points awarded for this month’s entrants. (Note: ties are listed
alphabetically by last name - there are more than 20 people listed this month because
of ties.)
Rank Name Points
1. [Name deleted] 176
2. Munter, Ernst 100
3. Gregg, Xan 81
4. Karsh, Bill 78
5. Larsson, Gustav 67
6. Stenger, Allen 65
7. Riha, Stepan 51
8. Goebel, James 49
9. Nepsund, Ronald 47
10. Cutts, Kevin 46
11. Mallett, Jeff 44
12. Kasparian, Raffi 42
13. Vineyard, Jeremy 42
14. Darrah, Dave 31
15. Landry, Larry 29
16. Elwertowski, Tom 24
17. Lee, Johnny 22
18. Noll, Robert 22
19. Anderson, Troy 20
20. Beith, Gary 20
21. Burgoyne, Nick 20
22. Galway, Will 20
23. Israelson, Steve 20
24. Landweber, Greg 20
25. Lengyel, Eric 20
26. Pinkerton, Tom 20
There are three ways to earn points: (1) scoring in the top 5 of any Challenge,
(2) being the first person to find a bug in a published winning solution, or (3) being
the first person to suggest a Challenge that I use. The points you can win are:
1st place 20 points
2nd place 10 points
3rd place 7 points
4th place 4 points
5th place 2 points
finding bug 2 points
suggesting Challenge 2 points
Here is Xan’s winning solution:
MindReader
By Xan Gregg, Durham, N.C.
/*
I try to minimize the number of guesses without adding too much complexity to the
code. First I figure out how many of each color are present in the answer by
essentially repeatedly guessing all of each color.
Then I figure out the correct positions one at a time starting at slot 0. I exchange it
with each other slot (one at a time) until the correct color is found. When there is a
change in the numCorrect response from checkGuess I can tell which of the two
slots caused the change by looking at my remembered information or, if necessary,
by performing a second guess with one of the colors in both slots.
The “remembered information” includes keeping track of colors that were
determined (via the numCorrect change) to be wrong before and/or after a swap is made.
This doesn’t help out too often, but it doesn’t take much time to record compared to
calling checkGuess.
While the outer loop determines the color of each slot “left-to-right” (0 to n-1), I
found that indexing the inner loop right-to-left instead of left-to-right increased the
speed by 30% - 40%. I wish I understood why!
Oddly, the checkGuess function spends most of its time figuring out the numWrong
value, which we generally ignore.
*/
typedef void (*CheckGuessProcPtr)(
unsigned char *theGuess,
unsigned short *numInCorrectPos,
unsigned short *numInWrongPos);
#define kMaxLength 16
#define Bit(color) (1L << (long) (color))
MindReader
void MindReader(unsigned char guess[],
  CheckGuessProcPtr checkGuess,
  unsigned short answerLength,
  unsigned short numColors)
{
  long prevColorsFound;
  long colorsFound;
  long curColor;
  long i, j;
  long curCorrect;
  long numOfColor[kMaxLength + 1]; /* 1-based */
  Boolean isCorrect[kMaxLength];
  long possibilities[kMaxLength]; /* bit fields */
  long colorBit1;
  long colorBit2;
  char color1;
  char color2;
  long delta;
  unsigned short newCorrect;
  unsigned short newWrong;

  /* first find the correct set of colors */
  colorsFound = 0;
  curColor = 1;
  while (colorsFound < answerLength)