Feb 96 Challenge
Volume Number: 12
Issue Number: 2
Column Tag: Programmer’s Challenge
Programmer’s Challenge
By Bob Boonstra, Westford, Massachusetts
Note: Source code files accompanying article are located on MacTech CD-ROM or
source code disks.
Intersecting Rectangles
The Challenge this month is to write a routine that will accept a list of rectangles and
calculate a result based on the intersections of those rectangles. Specifically, your
code will return a list of non-overlapping rectangles that contain all points enclosed
by an odd (or even) number of the input rectangles. The prototype for the code you
should write is:
void RectangleIntersections(
const Rect inputRects[], /* list if input rectangles */
const long numRectsIn, /* number of inputRects */
Rect outputRects[], /* preallocated storage for output */
long *numRectsOut, /* number of outputRects returned */
const Boolean oddParity /* see text for explanation */
);
The parameter oddParity indicates whether you are to return rectangles
containing points enclosed by an odd number of the numRectsIn inputRects rectangles
(oddParity==true) or by an even (nonzero) number of rectangles
(oddParity==false). Sufficient storage for the output will be preallocated for you
and pointed to by outputRects.
As an example, if you were given these inputRects:
{0,10,20,30}, {5,15,20,30}
and oddParity were true, you might return the following list of outputRects:
{0,10,5,15}, {0,15,5,30}, {5,10,15,20}
It would also be correct to return a result that combined the first of these
rectangles with either of the other two. If oddParity were false, you would return the
following list for the example input:
{5,15,20,30}
The outputRects must be non-empty and non-overlapping. In the example, it
would be incorrect to return the following for the odd parity case:
{0,10,5,30} {0,10,20,15}
The outputRects you generate must also be maximal, in the sense that each edge
of each of the outputRects should pass through a vertex of one of the inputRects.
That is, for example, I don’t want you to return a 1¥1 rectangle representing each
point enclosed in the desired number of inputRects. Before returning, set
*numRectsOut to indicate the number of outputRects you generated.
If you need auxiliary storage, you may allocate any reasonable amount within
your code using toolbox routines or malloc, but you must deallocate that storage before
returning. (No memory leaks! - I’ll be calling your code many times.)
This native PowerPC Challenge will be scored using the latest Metrowerks
compiler, with the winner determined by execution time. If you have any questions, or
would like some test data for your code, please send me e-mail at one of the
Programmer’s Challenge addresses, or directly to bob_boonstra@mactech.com. Test
data will also be sent to the Programmer’s Challenge mailing list, which you can join
by sending a message to autoshare@mactech.com with the SUBJECT line “sub challenge
YourName”, substituting your real name for YourName.
Two Months Ago Winner
Eight of the 13 solutions submitted for the Find Again And Again Challenge worked
correctly. Congratulations to Gustav Larsson (Mountain View, CA) for submitting
an entry that was significantly faster than the others. The problem was to write a text
search engine optimized to operate repeatedly on the same block of text. A variety of
optimization techniques were represented in the solutions, a couple of which are
highlighted in the table of results below. Several people optimized for the case where
the same word was repeatedly searched for. Some of my tests included this case, and
those results are in the columns headed “repeat.” The “random” columns shows
results for tests that searched for random occurrences of random words. Each of the
tests were run under conditions where only 64KB of auxiliary storage was available,
and where much more memory was available. These conditions were weighted 20% and
80% respectively in calculating the total time, since the problem statement promised
that ample memory would usually be provided. You can see that Gustav’s solution
performed reasonably well when memory was scarce, and very well when memory was
plentiful.
Gustav’s solution hashes as many words of the input text as possible in the
initialization routine. He uses the Boyer-Moore-Horspool algorithm to find words in
any text that was not parsed during initialization. Other features of the approach are
described in the well-commented code.
Here are the times and code sizes for entries that passed by tests. Numbers in
parentheses after a person’s name indicate that person’s cumulative point total for all
previous Challenges, not including this one.
64K Memory >>64K Memory code
Name repeat random repeat random time size
Gustav Larsson (67) 1814 3773 62 111 1255 3584
Tom Saxton 46 16400 197 459 3814 2000
Xan Gregg (81) 27 2907 1316 2835 3907 1664
Kevin Cutts (46) 1760 3234 1760 2809 4654 1600
Joseph Ku 8856 14570 121 509 5189 1584
David Cary 60 22665 499 1000 5745 2124
Eric Lengyel (40) 34 10221 29 4697 5831 1188
Ernst Munter (110) 2036 2053 2287 4603 6330 2976
Top Contestants of All time
Here are the Top Contestants for the Programmer’s Challenges to date, including
everyone who has accumulated more than 20 points. The numbers below include points
awarded for this month’s entrants.
Rank Name Points Rank Name Points
1. [Name deleted] 176 11. Mallett, Jeff 44
2. Munter, Ernst 110 12. Kasparian, Raffi 42
3. Gregg, Xan 88 13. Vineyard, Jeremy 42
4. Larsson, Gustav 87 14. Lengyel, Eric 40
5. Karsh, Bill 80 15. Darrah, Dave 31
6. Stenger, Allen 65 16. Landry, Larry 29
7. Riha, Stepan 51 17. Elwertowski, Tom 24
8. Cutts, Kevin 50 18. Lee, Johnny 22
9. Goebel, James 49 19. Noll, Robert 22
10. Nepsund, Ronald 47
There are three ways to earn points: (1) scoring in the top 5 of any Challenge,
(2) being the first person to find a bug in a published winning solution or, (3) being
the first person to suggest a Challenge that I use. The points you can win are:
1st place 20 points 5th place [TOKEN:12832]points
2nd place 10 points finding bug 2 points
3rd place 7 points suggesting Challenge 2 points
4th place 4 points
Here is Gustav’s winning solution:
Find Again and Again
Copyright © 1995 Gustav Larsson
Constants & Types
#define ALPHABET_SIZE 256
#define ALLOC_SIZE(n) ((n+3) & -4L) /* next multiple of 4 */
#define HASH_BUCKETS 1024 /* must be power of 2 */
#define HASH_MASK (HASH_BUCKETS - 1)
#define NO_NULL_CHAR 'A'
#define NULL 0
typedef unsigned char uchar;
typedef unsigned short ushort;
typedef unsigned long ulong;
typedef struct Word Word;
typedef struct Occurrence Occurrence;
typedef struct Private Private;
/*
A block of occurrence positions. We pack in as many occurrences as possible into
a single block, from 3 to 6 depending on textLength.
The first entry in the block is always used. The remaining entries are in use if they
are not zero. These facts are used several places to simplify the code.
*/
struct Occurrence {
Occurrence *next;
union {
ushort pos2[6]; /* 2 bytes/occurrence */
struct {
ushort lo[4];
uchar hi[4];
} pos3; /* 3 bytes/occurrence */
long pos4[3]; /* 4 bytes/occurrence */
} p;
};
/*
There is one Word struct for each distinct word. The word’s length is stored in the
top eight bits of the hash value. There’s no need to store the characters in the word
since we can just look at the first occurrence (first entry in Word.first).
*/
struct Word {
Word *next;
ulong hash;
Occurrence *last;
Occurrence first;
};
/*
The structure of our private storage. The hashCodes[] array serves two purposes: it
distinguishes alphanumeric from non-alphanumeric characters, and it provides a
non-zero hash code for each alphanumeric character. The endParsedText field will
be -1 if there was enough private memory to parse all the text. Otherwise, it points to
the start of the unparsed text. nullChar is used by the BMH_Search() function when
we must search unparsed text for an occurrence.
*/
struct Private {
ulong hashCodes [ ALPHABET_SIZE ];
Word *hashTable [ HASH_BUCKETS ];
long endParsedText; /* start of parsed text */
long posBytes; /* POS_x_BYTES, below */
char nullChar; /* char not appearing in the text */
long heap; /* start of private heap */
};
/*
These macros simplify access to the occurrence positions stored in an Occurrence
struct. Posbytes is a macro argument that is usually set to private->posBytes.
However, you can also use a constant for posbytes, which lets the compiler choose