All Databases MacTech Vol 12-1996

Feb 96 Challenge

Volume Number: 12

Issue Number: 2

Column Tag: Programmer’s Challenge

Programmer’s Challenge

By Bob Boonstra, Westford, Massachusetts

Note: Source code files accompanying article are located on MacTech CD-ROM or

source code disks.

Intersecting Rectangles

The Challenge this month is to write a routine that will accept a list of rectangles and

calculate a result based on the intersections of those rectangles. Specifically, your

code will return a list of non-overlapping rectangles that contain all points enclosed

by an odd (or even) number of the input rectangles. The prototype for the code you

should write is:

void RectangleIntersections(

const Rect inputRects[], /* list if input rectangles */

const long numRectsIn, /* number of inputRects */

Rect outputRects[], /* preallocated storage for output */

long *numRectsOut, /* number of outputRects returned */

const Boolean oddParity /* see text for explanation */

);

The parameter oddParity indicates whether you are to return rectangles

containing points enclosed by an odd number of the numRectsIn inputRects rectangles

(oddParity==true) or by an even (nonzero) number of rectangles

(oddParity==false). Sufficient storage for the output will be preallocated for you

and pointed to by outputRects.

As an example, if you were given these inputRects:

{0,10,20,30}, {5,15,20,30}

and oddParity were true, you might return the following list of outputRects:

{0,10,5,15}, {0,15,5,30}, {5,10,15,20}

It would also be correct to return a result that combined the first of these

rectangles with either of the other two. If oddParity were false, you would return the

following list for the example input:

{5,15,20,30}

The outputRects must be non-empty and non-overlapping. In the example, it

would be incorrect to return the following for the odd parity case:

{0,10,5,30} {0,10,20,15}

The outputRects you generate must also be maximal, in the sense that each edge

of each of the outputRects should pass through a vertex of one of the inputRects.

That is, for example, I don’t want you to return a 1¥1 rectangle representing each

point enclosed in the desired number of inputRects. Before returning, set

*numRectsOut to indicate the number of outputRects you generated.

If you need auxiliary storage, you may allocate any reasonable amount within

your code using toolbox routines or malloc, but you must deallocate that storage before

returning. (No memory leaks! - I’ll be calling your code many times.)

This native PowerPC Challenge will be scored using the latest Metrowerks

compiler, with the winner determined by execution time. If you have any questions, or

would like some test data for your code, please send me e-mail at one of the

Programmer’s Challenge addresses, or directly to bob_boonstra@mactech.com. Test

data will also be sent to the Programmer’s Challenge mailing list, which you can join

by sending a message to autoshare@mactech.com with the SUBJECT line “sub challenge

YourName”, substituting your real name for YourName.

Two Months Ago Winner

Eight of the 13 solutions submitted for the Find Again And Again Challenge worked

correctly. Congratulations to Gustav Larsson (Mountain View, CA) for submitting

an entry that was significantly faster than the others. The problem was to write a text

search engine optimized to operate repeatedly on the same block of text. A variety of

optimization techniques were represented in the solutions, a couple of which are

highlighted in the table of results below. Several people optimized for the case where

the same word was repeatedly searched for. Some of my tests included this case, and

those results are in the columns headed “repeat.” The “random” columns shows

results for tests that searched for random occurrences of random words. Each of the

tests were run under conditions where only 64KB of auxiliary storage was available,

and where much more memory was available. These conditions were weighted 20% and

80% respectively in calculating the total time, since the problem statement promised

that ample memory would usually be provided. You can see that Gustav’s solution

performed reasonably well when memory was scarce, and very well when memory was

plentiful.

Gustav’s solution hashes as many words of the input text as possible in the

initialization routine. He uses the Boyer-Moore-Horspool algorithm to find words in

any text that was not parsed during initialization. Other features of the approach are

described in the well-commented code.

Here are the times and code sizes for entries that passed by tests. Numbers in

parentheses after a person’s name indicate that person’s cumulative point total for all

previous Challenges, not including this one.

64K Memory >>64K Memory code

Name repeat random repeat random time size

Gustav Larsson (67) 1814 3773 62 111 1255 3584

Tom Saxton 46 16400 197 459 3814 2000

Xan Gregg (81) 27 2907 1316 2835 3907 1664

Kevin Cutts (46) 1760 3234 1760 2809 4654 1600

Joseph Ku 8856 14570 121 509 5189 1584

David Cary 60 22665 499 1000 5745 2124

Eric Lengyel (40) 34 10221 29 4697 5831 1188

Ernst Munter (110) 2036 2053 2287 4603 6330 2976

Top Contestants of All time

Here are the Top Contestants for the Programmer’s Challenges to date, including

everyone who has accumulated more than 20 points. The numbers below include points

awarded for this month’s entrants.

Rank Name Points Rank Name Points

1. [Name deleted] 176 11. Mallett, Jeff 44

2. Munter, Ernst 110 12. Kasparian, Raffi 42

3. Gregg, Xan 88 13. Vineyard, Jeremy 42

4. Larsson, Gustav 87 14. Lengyel, Eric 40

5. Karsh, Bill 80 15. Darrah, Dave 31

6. Stenger, Allen 65 16. Landry, Larry 29

7. Riha, Stepan 51 17. Elwertowski, Tom 24

8. Cutts, Kevin 50 18. Lee, Johnny 22

9. Goebel, James 49 19. Noll, Robert 22

10. Nepsund, Ronald 47

There are three ways to earn points: (1) scoring in the top 5 of any Challenge,

(2) being the first person to find a bug in a published winning solution or, (3) being

the first person to suggest a Challenge that I use. The points you can win are:

1st place 20 points 5th place [TOKEN:12832]points

2nd place 10 points finding bug 2 points

3rd place 7 points suggesting Challenge 2 points

4th place 4 points

Here is Gustav’s winning solution:

Find Again and Again

Constants & Types

#define ALPHABET_SIZE 256

#define ALLOC_SIZE(n) ((n+3) & -4L) /* next multiple of 4 */

#define HASH_BUCKETS 1024 /* must be power of 2 */

#define HASH_MASK (HASH_BUCKETS - 1)

#define NO_NULL_CHAR 'A'

#define NULL 0

typedef unsigned char  uchar;
typedef unsigned short ushort;
typedef unsigned long  ulong;
typedef struct Word Word;
typedef struct Occurrence Occurrence;
typedef struct Private Private;
/*

A block of occurrence positions. We pack in as many occurrences as possible into

a single block, from 3 to 6 depending on textLength.

The first entry in the block is always used. The remaining entries are in use if they

are not zero. These facts are used several places to simplify the code.

struct Occurrence {

Occurrence *next;

union {

ushort pos2[6]; /* 2 bytes/occurrence */

    struct {
      ushort lo[4];
      uchar  hi[4];
    } pos3;           /* 3 bytes/occurrence */

long pos4[3]; /* 4 bytes/occurrence */

  } p;
};
/*

There is one Word struct for each distinct word. The word’s length is stored in the

top eight bits of the hash value. There’s no need to store the characters in the word

since we can just look at the first occurrence (first entry in Word.first).

struct Word {

Word *next;

ulong hash;

Occurrence *last;

Occurrence first;

};
/*

The structure of our private storage. The hashCodes[] array serves two purposes: it

distinguishes alphanumeric from non-alphanumeric characters, and it provides a

non-zero hash code for each alphanumeric character. The endParsedText field will

be -1 if there was enough private memory to parse all the text. Otherwise, it points to

the start of the unparsed text. nullChar is used by the BMH_Search() function when

we must search unparsed text for an occurrence.

struct Private {

ulong hashCodes [ ALPHABET_SIZE ];

Word *hashTable [ HASH_BUCKETS ];

long endParsedText; /* start of parsed text */

long posBytes; /* POS_x_BYTES, below */

char nullChar; /* char not appearing in the text */

long heap; /* start of private heap */

};

Macros

These macros simplify access to the occurrence positions stored in an Occurrence

struct. Posbytes is a macro argument that is usually set to private->posBytes.

However, you can also use a constant for posbytes, which lets the compiler choose

Referenced by (4):