May 95 Challenge
Volume Number: 11
Issue Number: 5
Column Tag: Programmer’s Challenge
Programmer’s Challenge 
By Mike Scanlin, Mountain View, CA
Note: Source code files accompanying article are located on MacTech CD-ROM orsource code disks.
Hy-phen-a-tion
Hyphenation algorithms come in two flavors: rule-based and dictionary-based. Of the
two, dictionary-based is more reliable but has the downside of requiring a lot of
storage space. Rule-based methods require considerably less space and have the option
of using an “exceptions” dictionary to improve accuracy. This month’s Challenge is to
implement a rule-based hyphenation algorithm.
The algorithm is this: Each part of the word on either side of the hyphen must
include a vowel, not counting a final e, es or ed. The part of the word after the hyphen
cannot begin with a vowel or a double consonant. No break is made between any two of
the following letter combinations: sh, gh, ph, ch, th, wh, gr, pr, cr, tr, wr, br, fr, dr,
vowel-r, vowel-n, or om. That means that if any two of those pairs occur next to each
other, you can’t break them apart (i.e. ‘from’ contains ‘fr’ and ‘om’, which are both
on the list; therefore you would not split it between the ‘r’ and the ‘o’). The letter ‘y’
is not a vowel.
The two routines you’ll write are:
void *
InitHyphenation(maxRAM)
ulong maxRAM;
void
Hyphenate(privateDataPtr, inPtr, outPtr)
void *privateDataPtr;
Str255 *inPtr;
Str255 *outPtr;
The InitHyphenation routine is an untimed routine called once that can set up whatever tables you might want to use. It must not allocate more than maxRAM bytes,
which will be between 16K and 64K.
Hyphenate is the routine that does the work. PrivateDataPtr is the return
value from the InitHyphenation routine. InPtr is a pointer to a read-only
unhyphenated Pascal word (containing only letters ‘a’..‘z’ and ‘A’..‘Z’). OutPtr is a
pointer to 256 bytes of space where you store the hyphenated word. You need to
preserve the case of the input and you should insert a kHyphen byte (0x2D)
everywhere the above algorithm tells you to (yes, the complete output is guaranteed to
fit within a Str255).
Note that this algorithm is valid for English only. It will find about 70% of all
valid hyphens and will make mistakes about 45% of the time. It is assumed that if this
was part of a real application that a hyphenation exception word list would be kept.
That list would be checked before calling this function. You, however, don’t need to be
concerned with that.
Write to me if you have any questions.
Two Months Ago Winner
Congrats to Gustav Larsson (Mountain View, CA) for winning the Method Dispatcher
Challenge. Gustav was a virtual unknown to this column a few months ago but he is
rising fast in the Top 20 rankings. The 20 points he wins this month move him from
10th place to 4th place.
Here are the times and code sizes for each entry. Numbers in parens after a
person’s name indicate that person’s cumulative point total for all previous
Programmer Challenges, not including this one:
Name time code
Gustav Larsson (40) 261 1040
Kevin Cutts (36) 304 266
Jeff Mallett (27) 338 1192
Xan Gregg 349 158
Ernst Munter (51) 385 1466
Thomas Studer 437 572
David Howarth 632 130
Scanlin’s brute force method 420 122
Everyone who entered implemented some kind of cache. Gustav chose a 10-way
set-associative cache. He splits the 16K of usable cache space into 256 cache sets, each
of which holds 10 entries. He uses a hash based on the class and method numbers to
map to one of the 256 caches. Once he’s got that he checks the 10 entries for a match. If
he doesn’t find a match he uses an efficient binary search to look for the method within
the class.
In designing any cache, the choice of hash function and amount of
set-associativity (if any) is highly dependent on your data and access pattern. Gustav’s
code could be tuned for a variety of efficient cache uses. Nice job, Gustav.
Poker Winner Disqualified
Turns out that I failed to do adequate testing on the winning entry for the Poker Hand
Evaluator Challenge, in which Kevin Cutts was the published winner. Unfortunately,
I’m going to have to retro-actively disqualify Kevin and award the 1st place prize to
Gustav Larsson. My apologies to Kevin and the other MacTech readers. Rather than
publish Gustav’s well-commented winning solution (it’s quite long) I’ll make it
available via e-mail. If you’re interested, send me a note at scanlin@genmagic.com or
at any of the Programmer Challenge electronic addresses (see p. 2).
This latent win for Gustav means that he’s 3 for 3 in the last 3 challenges, which
is an unmatched winning streak. Congrats, Gustav! Also, Dave Darrah was the first
person to point out the flaws in Kevin’s code and so he receives 5 extra points in the
cumulative point totals for doing so. Thanks, Dave!
Top 20 Contestants of All Time
Here are the Top 20 Contestants for the 33 Programmer’s Challenges to date. The
numbers below include points awarded for this months’ top 5 entrants. (Note: ties are
listed alphabetically by last name -- there are 23 people listed this month because 7
people have 20 points each.)
1. Boonstra, Bob 176
2. Karsh, Bill 71
3. Stenger, Allen 65