MacTech Vol 10-1994

Jun 94 Challenge

Volume Number: 10

Issue Number: 6

Column Tag: Programmers’ Challenge


Programmers’ Challenge

By Mike Scanlin, MacTech Magazine Regular Contributing Author

Note: Source code files accompanying this article are located on the MacTech CD-ROM or source code disks.

The rules

Here’s how it works: Each month there will be a different programming challenge presented here. First, you must write some code that solves the challenge. Second, you must optimize your code (a lot). Then, submit your solution to MacTech Magazine (formerly MacTutor). A winner will be chosen based on code correctness, speed, size and elegance (in that order of importance) as well as the postmark of the answer. In the event of multiple equally desirable solutions, one winner will be chosen at random (with honorable mention, but no prize, given to the runners-up). The prize for the best solution each month is $50 and a limited edition “The Winner! MacTech Magazine Programming Challenge” T-shirt (not to be found in stores).

In order to make fair comparisons between solutions, all solutions must be in ANSI-compatible C (i.e., don’t use THINK C’s object extensions). Only pure C code can be used. Any entries with any assembly in them will be disqualified (except for those challenges specifically stated to be in assembly). However, you may call any routine in the Macintosh Toolbox you want (i.e., it doesn’t matter if you use NewPtr instead of malloc). All entries will be tested with the FPU and 68020 flags turned off in THINK C. When timing routines, the latest version of THINK C will be used (with ANSI Settings plus “Honor ‘register’ first” and “Use Global Optimizer” turned on), so beware if you optimize for a different C compiler. All code should be limited to 60 characters per line. This will aid us in dealing with e-mail gateways and page layout.

The solution and winners for this month’s Programmers’ Challenge will be published in the issue two months later. All submissions must be received by the 10th day of the month printed on the front of this issue.

All solutions should be marked “Attn: Programmers’ Challenge Solution” and sent to Xplain Corporation (the publishers of MacTech Magazine) via “snail mail” or, preferably, e-mail (AppleLink: MT.PROGCHAL; Internet: progchallenge@xplain.com; CompuServe: 71552,174; America Online: MT PRGCHAL). If you send via snail mail, please include a disk with the solution and all related files (including contact information). See page 2 for information on “How to Contact Xplain Corporation.”

MacTech Magazine reserves the right to publish any solution entered in the Programming Challenge of the Month. Authors grant MacTech Magazine the non-exclusive right to publish entries without limitation upon submission of each entry. Copyrights for the code are retained by the author.

FACTORING

Being able to factor quickly is an important part of breaking secret codes, I mean, writing cool Mac games. This month’s challenge, therefore, is to factor a 64-bit number into the two primes that were multiplied together to produce it.

The prototype of the function you write is:

/* 1 */
void Factor64(lowHalf, highHalf,
prime1Ptr, prime2Ptr)
unsigned long lowHalf;
unsigned long highHalf;
unsigned long *prime1Ptr;
unsigned long *prime2Ptr;

highHalf and lowHalf are the 64-bit input number split into two pieces (bit 0 of lowHalf is bit 0 of the input number and bit 31 of highHalf is bit 63 of the input number), so the input number is highHalf × 2^32 + lowHalf. The input number is guaranteed to be the product of two primes, each of which is 32 bits or less. Your routine will store one prime at *prime1Ptr and the other one at *prime2Ptr (in either order).
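To make the input format concrete, here is a deliberately naive sketch, not a contest entry and not a competitive approach: it reassembles the number as highHalf × 2^32 + lowHalf and trial-divides by 2 and then by odd numbers. It assumes a compiler with C99's uint64_t, which THINK C does not provide, and because the smaller prime can be close to 2^32 it may need on the order of 2^31 divisions in the worst case.

```c
#include <stdint.h>

/* Naive baseline sketch, assuming C99 uint64_t. */
void Factor64(unsigned long lowHalf, unsigned long highHalf,
              unsigned long *prime1Ptr,
              unsigned long *prime2Ptr)
{
    /* Rebuild n = highHalf * 2^32 + lowHalf. */
    uint64_t n = ((uint64_t)(highHalf & 0xFFFFFFFFUL) << 32)
               | (uint64_t)(lowHalf & 0xFFFFFFFFUL);
    uint64_t d;

    if ((n & 1) == 0) {          /* 2 is the only even prime */
        *prime1Ptr = 2;
        *prime2Ptr = (unsigned long)(n / 2);
        return;
    }
    /* The smaller prime is at most sqrt(n) < 2^32, so for
       valid input d * d cannot overflow before we find it. */
    for (d = 3; d * d <= n; d += 2) {
        if (n % d == 0) {
            *prime1Ptr = (unsigned long)d;
            *prime2Ptr = (unsigned long)(n / d);
            return;
        }
    }
    /* Not reached when n really is a product of two primes. */
    *prime1Ptr = 1;
    *prime2Ptr = (unsigned long)n;
}
```

For example, lowHalf = 15 and highHalf = 0 yields 3 and 5. The sketch only pins down the interface; a winning entry would need a much faster method than trial division.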

Remember, solutions must be in C to qualify for entry into the Challenge, but assembly versions might get mentioned if they’re wicked fast. Also, if anyone has a nice routine for factoring even larger numbers (like, say, 256-bit numbers) into their prime factors and wouldn’t mind sharing it with MacTech readers, then send it on in. The best one might get published along with the winning solution.

TWO MONTHS AGO WINNER

The competition for the Swap Blocks challenge was unusually tough. There were several very high-quality entries. Congratulations to Bill Karsh (Chicago, IL) for winning with the fastest entry. It was only last month that I declared Bob Boonstra (Westford, MA) the Programmers’ Challenge Champion for having the most first-place showings, but now he and Bill are tied for that elusive title (with three wins each). Jorg Brown (San Francisco, CA) deserves praise for his second-place showing. His code was just over half the size of Bill’s winning solution and nearly as fast.

Here are the code sizes and times for two different tests. The first time test was for random-size inputs (according to the distribution stated in the problem). The second time test was for blocks that were roughly, but not exactly, equal in size (again, with the given distributions but with both sizes coming from the same size category). Numbers in parens after a person’s name indicate how many times that person has finished in the top 5 places of all previous Programmers’ Challenges, not including this one:

Name                      time 1   time 2   code size (bytes)
Bill Karsh (3)               170      219     642
Jorg Brown                   174      242     366
Jim Lloyd                    209      408    1642
Lorn Olsen                   239      350     670
Ted Krovetz                  243      247      88
Stepan Riha (6)              243      347     452
Bob Boonstra (8)             247      443     480
Jeffry Spain                 248      397     234
Greg Landweber (1)           264      491     300
Martin Weiss                 281      601     210
Christopher Suley            299      321     110
Dave Darrah                  299      681     284
Ernst Munter                 315      414     632
Xan Gregg                    340     1260     484
Michael Anderson             359      942     156
Allen Stenger (5)            393      436     156
Michael Panchenko            409      465      82
Danny Stevenson              449      583     424
Eric Bennett                 493     1478     284
Arnold Woodworth             595      729     206
Bob Boonstra (assembly)      212      418     400

The SwapBlocks problem is really a multi-byte rotate problem. Think about it this way: if you had a 32-bit register and you wanted to swap the low 7 bits with the upper 25 bits, you could just rotate it 7 bit positions to the right. The rotate instruction is like a SwapBits operation where size1 + size2 always equals 32.
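A minimal C sketch of that observation, assuming C99's uint32_t (the name rotr32 is hypothetical): rotating right by k moves the low k bits to the top and the upper 32 - k bits to the bottom. C has no rotate operator, so the rotate is built from two shifts, and k must be between 1 and 31 because shifting a 32-bit value by 32 is undefined.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by k bits, 1 <= k <= 31.
   This swaps the low k bits with the upper 32 - k bits. */
uint32_t rotr32(uint32_t x, unsigned k)
{
    return (uint32_t)((x >> k) | (x << (32 - k)));
}
```

For instance, rotr32(0x0000007F, 7) gives 0xFE000000: the seven 1 bits that sat at the bottom now sit at the top.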

Almost everyone who entered used a variant of this observation. The fifth-place entry by Ted Krovetz (Santa Cruz, CA) illustrates it nicely:

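/* 2 */
/* ulong is assumed to be unsigned long, and long is
   assumed to be 4 bytes wide, as it is in THINK C. */
typedef unsigned long ulong;

/* Swap adjacent blocks: p2 == p1 + size1, and both sizes
   are multiples of 4. swapPtr and swapSize are unused. */
void SwapBlocks (void *p1, void *p2,
                 void *swapPtr, ulong size1,
                 ulong size2, ulong swapSize)
{
    long *lp1 = (long *)p1;
    long *lp2 = (long *)p2;
    ulong s1 = size1 >> 2;      /* block sizes in longs */
    ulong s2 = size2 >> 2;
    ulong count;
    long temp, *tempp1, *tempp2;

    /* Guard added: with a zero size, --count below
       would wrap around. */
    if (s1 == 0 || s2 == 0)
        return;
    do {
        if (s1 < s2) {
            /* Swap block 1 with the tail of block 2, which
               puts block 1's data in its final place at the
               end; block 2 shrinks by s1. */
            count = s1;
            tempp1 = lp1;
            s2 -= s1;
            tempp2 = lp2 + s2;
        }
        else {
            /* Swap block 2 with the head of block 1, which
               puts block 2's data in its final place at the
               front; block 1 shrinks by s2. */
            count = s2;
            tempp1 = lp1;
            tempp2 = lp2;
            lp1 += s2;
            s1 -= s2;
        }
        do {                    /* swap count longs */
            temp = *tempp1;
            *(tempp1++) = *tempp2;
            *(tempp2++) = temp;
        } while (--count);
    } while (s1);
}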

Because Bill’s winning solution is so general-purpose and macro-ized, it is not the easiest code to read (although I commend his generality in making a useful piece of reusable and portable code). He has compile-time flags that let you build a large, fast version (over 600 bytes, which was the version timed) or a small, slower version (less than 100 bytes). And you can optionally change the 4-byte alignment assumption into a 2-byte or 1-byte alignment assumption (by redefining AtomSize).
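To illustrate the 1-byte case, here is a hedged sketch, not code from either entry, that applies the same shrinking-swap scheme as Ted Krovetz's entry above to runs of bytes of any length at any address. The name SwapByteRuns is hypothetical, and it assumes the two runs are adjacent, as in the challenge.

```c
#include <stddef.h>

/* Hypothetical byte-granular variant: exchange the adjacent
   runs [p, p+size1) and [p+size1, p+size1+size2) in place. */
void SwapByteRuns(unsigned char *p, size_t size1,
                  size_t size2)
{
    unsigned char *q = p + size1;   /* start of run 2 */
    unsigned char t, *a, *b;
    size_t count;

    while (size1 != 0 && size2 != 0) {
        if (size1 < size2) {
            /* Run 1 trades places with the tail of run 2. */
            count = size1;
            size2 -= size1;
            a = p;
            b = q + size2;
        } else {
            /* Run 2 trades places with the head of run 1. */
            count = size2;
            a = p;
            b = q;
            p += size2;
            size1 -= size2;
        }
        while (count--) {           /* swap count bytes */
            t = *a;
            *a++ = *b;
            *b++ = t;
        }
    }
}
```

Called on the characters "abcXY" with size1 = 3 and size2 = 2, it leaves "XYabc". Each pass parks one run's data in its final place, so the total number of byte exchanges is at most size1 + size2.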

I used THINK C’s preprocessor command to see what all those #defines would boil down to. The core swap code for those cases where you can’t use the temporary swap space (because it’s too small) ends up looking like this:

/* 3 */
switch( (short)q ) {
    case 0:
        while( --nS ) {
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
    case 7:
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
    case 6:
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
    case 5:
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
    case 4:
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
    case 3:
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
    case 2:
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
    case 1:
            q = *pL;
            *pL++ = *pR;
            *pR++ = q;
        } /* end while */
}; /* end switch */

This illustrates some interesting loop-unrolling syntax that’s possible in C. As the code shows, it’s legal to spread a while statement over several case labels in a switch statement, which nicely solves the problem of “How do you handle the remainder?” when you unroll a loop 8 times. In this example q is numTimesToSwap mod 8, and nS is numTimesToSwap divided by 8, plus one (the extra one is consumed by the pre-decrement in while( --nS )). So if numTimesToSwap is 10 then q is 2 and nS is 2. When the switch statement is executed it will branch to case 2, which does 2 swaps; --nS then leaves 1, so control loops back to the top of the while loop. It runs through one set of 8 swaps, --nS reaches 0, and it stops. Pretty cool syntax.
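This construct is widely known as Duff's device. Below is a self-contained sketch of the same 8-way unrolled swap with its counters initialized explicitly; the function name SwapLongsUnrolled is hypothetical, not Bill's. Because the loop test pre-decrements, nS starts at n / 8 + 1, and a count of 0 enters at case 0, fails the first test, and swaps nothing.

```c
/* Hypothetical helper: swap n longs between pL and pR with
   an 8-way unrolled loop in the style of Duff's device. */
void SwapLongsUnrolled(long *pL, long *pR, unsigned long n)
{
    unsigned long nS = (n >> 3) + 1; /* +1 for --nS below */
    long q;

    switch ((int)(n & 7)) {          /* n mod 8 */
    case 0:
        while (--nS) {
            q = *pL; *pL++ = *pR; *pR++ = q;
    case 7: q = *pL; *pL++ = *pR; *pR++ = q;
    case 6: q = *pL; *pL++ = *pR; *pR++ = q;
    case 5: q = *pL; *pL++ = *pR; *pR++ = q;
    case 4: q = *pL; *pL++ = *pR; *pR++ = q;
    case 3: q = *pL; *pL++ = *pR; *pR++ = q;
    case 2: q = *pL; *pL++ = *pR; *pR++ = q;
    case 1: q = *pL; *pL++ = *pR; *pR++ = q;
        }
    }
}
```

With n = 10 the switch lands on case 2, does 2 swaps, then one full pass of 8, for 10 in total. Some modern compilers warn about the implicit fallthrough between case labels; here it is intentional.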

Here’s Bill’s winning solution:

SwapBlocks

Response to Apr 94 MacTech Programmer's Challenge.

by Bill Karsh

Object: Exchange contents of two adjacent memory blocks.

Redirection: This is an interesting problem, but what would make this guy really useful? As stated, the blocks for the challenge are 4i bytes long and start on 4j-aligned addresses. These are special circumstances which apply to Memory Manager blocks, and then only on 68020 or later CPUs. Memory blocks on the 68000 are merely even-aligned and of even length. Further, this could be a word processor tool for swapping runs of bytes, but we would have to relax the alignment and size restrictions even further, to arbitrary address and length, since we would almost always be pointing to characters interior to a handle.

I have written the routine to give its best performance, subject to a specified minimum enforced alignment and atom size (smallest unit to move). This is controlled at compile time by:

/* 4 */
/* Choose exactly one: */
typedef long  Atom;   /* for len = 4i, addr = 4j     */
/* typedef short Atom;      for len = 2i, addr = 2j     */
/* typedef Byte  Atom;      for len = any, addr = any   */

Note - due to an ancient law of portability, preprocessor directives are not allowed to compare enums, types, sizeof()s or anything else that has machine dependency hidden in it. This means you have to #define AtomSize manually. This is needed to select the proper performance crossover points for that type.

But wait, there’s more... You might not tolerate a 644-byte dedicated word swapper in your text editor, but a 96-byte one might fit. We handle that.

You can tailor the routine to your requirements for execution speed vs. code size by setting the JobMode constant according to this table:

JobMode    Buffers   MonsterCopies   MonsterSwaps
Smallest   No        No              No
Small      No        No              Yes
Fast       Yes       No              Yes
Fastest    Yes       Yes             Yes

- billKarsh
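Bill's two compile-time knobs can be sketched as follows. This is a hypothetical configuration fragment, not his actual header: the names Atom, AtomSize and JobMode come from his comments, while UseBuffers, UseMonsterCopies, UseMonsterSwaps and AtomSizeCheck are made up here, and C99 fixed-width types stand in for long, short and Byte. It shows why AtomSize is set by hand (the #if lines cannot see sizeof) and how the compiler proper can still verify it afterward.

```c
#include <stdint.h>

/* Set by hand: #if cannot evaluate sizeof(Atom). */
#define AtomSize 4

#if AtomSize == 4
typedef uint32_t Atom;   /* len = 4i, addr = 4j   */
#elif AtomSize == 2
typedef uint16_t Atom;   /* len = 2i, addr = 2j   */
#else
typedef uint8_t  Atom;   /* len = any, addr = any */
#endif

/* The compiler (not the preprocessor) can check the
   choice: the array size goes negative on a mismatch. */
typedef char AtomSizeCheck[sizeof(Atom) == AtomSize ? 1 : -1];

/* JobMode levels, from smallest code to fastest code. */
#define Smallest 0
#define Small    1
#define Fast     2
#define Fastest  3
#define JobMode  Fast

/* Feature flags derived from JobMode, matching the table. */
#define UseBuffers       (JobMode >= Fast)
#define UseMonsterCopies (JobMode >= Fastest)
#define UseMonsterSwaps  (JobMode >= Small)
```

Because the flags are plain integer expressions, code can test them with #if UseBuffers, so the unused paths never reach the compiler and cost no code space.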

/* 5 */
