Nov 93 Challenge

Volume 9

Number 11

Column Tag: Programmers’ Challenge

Programmers’ Challenge

By Mike Scanlin, MacTech Magazine Regular Contributing Author

Note: Source code files accompanying article are located on MacTech CD-ROM or

source code disks.

WHO PLAYS WHO?

Thanks to Kevin Cutts (location unknown) for suggesting this month’s challenge.

The goal is to match up teams for the annual MacTech Bean Counting contest where

there are half as many playing areas as there are teams. Each team needs to play every

other team exactly once. (And they don’t want to wait all day for their schedule to be

generated!)

The input is the number of teams, a list of team names and a list of playing area

names. The number of teams will be an even number less than 25 and the number of

playing areas will be half of the number of teams. The output will be to an existing file

where you describe who plays who on what playing area at what time. Each bean

counting match takes 10 minutes to play, so you can schedule a match every 15

minutes on each playing area. The events don’t start until noon so that everyone

involved has time to sleep in before their big day.

The prototype of the function you write is:

void ScheduleMatches(numTeams,
    teamNames, playingAreaNames,
    outputFile)
unsigned short numTeams;
Str255 teamNames[];
Str255 playingAreaNames[];
FILE *outputFile;

The outputFile will be open and empty when your routine is called. You write to

the file using the standard C method of fprintf(outputFile, "Here is some output

text.\n");, for example. You should not close the file on exit of your routine (the caller

will close it since the caller opened it).

The format of the output is up to you. It should be intelligible, though. Don’t

skimp on output readability to save a few cycles of time.

The input team names and playing area names are Pascal strings that take 256

bytes each (length byte included). These arrays are read-only; if you want to convert

them to C strings then you’ll have to copy them somewhere first. Don’t worry about

the special formatting requirements of long strings; I will be testing with fairly small

strings.

Here is some sample input:

numTeams = 4;
teamNames[0] = "\pCycleStealers";
teamNames[1] = "\pBeanies";
teamNames[2] = "\pRiscTakers";
teamNames[3] = "\pGiraffeButts";
playingAreaNames[0] = "\pField 1";
playingAreaNames[1] = "\pField 2";

and suggested output format:

12:00
Field 1: CycleStealers vs. Beanies
Field 2: RiscTakers vs. GiraffeButts

12:15
Field 1: CycleStealers vs. RiscTakers
Field 2: Beanies vs. GiraffeButts

12:30
Field 1: CycleStealers vs. GiraffeButts
Field 2: Beanies vs. RiscTakers
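
One well-known way to build such a schedule is the "circle method" for round-robin tournaments: hold one team fixed and rotate the others by one position each round. With n teams this gives n-1 rounds of n/2 matches, so every playing area is busy in every time slot. The sketch below is only an illustration under stated assumptions, not an official or optimized entry: RoundPairs and PutPStr are hypothetical helper names, the Str255 typedef is a stand-in so the code compiles without the Macintosh headers, the prototype is written ANSI-style, and times are printed on a 12-hour clock to match the sample output.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the Toolbox type (assumption, so the
   sketch compiles without the Macintosh headers). */
typedef unsigned char Str255[256];

#define MAX_TEAMS 24

/* Circle method: team n-1 stays put while teams 0..n-2
   rotate. Fills pairs[0..n/2-1] with the matches of one
   round; round runs from 0 to n-2. */
static void RoundPairs(unsigned n, unsigned round,
                       unsigned pairs[][2])
{
    unsigned k, m = n - 1;

    assert(n >= 2 && n % 2 == 0);
    pairs[0][0] = m;       /* the fixed team            */
    pairs[0][1] = round;   /* meets a new rival per round */
    for (k = 1; k < n / 2; k++) {
        pairs[k][0] = (round + k) % m;
        pairs[k][1] = (round + m - k) % m;
    }
}

/* Print a Pascal string (length byte, then characters)
   without copying it to a C string first. */
static void PutPStr(FILE *f, const unsigned char *p)
{
    fprintf(f, "%.*s", (int)p[0], (const char *)(p + 1));
}

void ScheduleMatches(unsigned short numTeams,
                     Str255 teamNames[],
                     Str255 playingAreaNames[],
                     FILE *outputFile)
{
    unsigned pairs[MAX_TEAMS / 2][2];
    unsigned n = numTeams;
    unsigned round, k, mins, hour;

    if (n < 2 || n > MAX_TEAMS || (n & 1))
        return;            /* input outside the stated limits */
    for (round = 0; round + 1 < n; round++) {
        mins = 15 * round;          /* minutes after noon */
        hour = 12 + mins / 60;
        if (hour > 12)
            hour -= 12;             /* 12-hour clock */
        fprintf(outputFile, "%u:%02u\n", hour, mins % 60);
        RoundPairs(n, round, pairs);
        for (k = 0; k < n / 2; k++) {
            PutPStr(outputFile, playingAreaNames[k]);
            fputs(": ", outputFile);
            PutPStr(outputFile, teamNames[pairs[k][0]]);
            fputs(" vs. ", outputFile);
            PutPStr(outputFile, teamNames[pairs[k][1]]);
            fputc('\n', outputFile);
        }
        fputc('\n', outputFile);
    }
}
```

With the four-team sample input this prints three time slots (12:00, 12:15 and 12:30) with two matches each, although the pairings appear in a different order than in the suggested output above.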

TWO MONTHS AGO WINNER

It would appear that the 10 or more people who wrote to me and requested an

assembly language challenge were either (1) kidding, (2) all on vacation during the

last month, or (3) unable to cope with moving bits in assembly language, because I

only received 3 entries to the BlockMoveBits assembly language challenge. And only

one of them gave correct results. Congratulations to Bob Boonstra (Westford, MA) for

(1) entering, (2) having correct code and (3) winning. Bob’s code would have an

excellent chance at winning even with more competition because it is very efficient

indeed. Bob recently won the Where In The World? challenge, too, so this is his second

win (the second two-time winner to date; there are no 3-time winners at this point).

Well done!

Compliments to Kevin Cutts for having the guts to enter C code in an assembly

language contest. Despite the fact that his code was 690 bytes and used over 400 bytes

of static lookup table data (compared to Bob’s 166 bytes with no tables) his times

were within a respectable 10% of Bob’s. Correctness, however, is key and Kevin’s

routine gave occasional bogus results so I had to disqualify it (be sure to try all 64

combinations of source and destination bit offsets; each can range from 0 to 7).

MAIL BAG

Recently I received a letter from a MacTech reader which said, in part:

“I DO object to the programming contest though. It rewards convoluted, hard to

maintain code at the expense of speed and size. In the real world the former is MUCH

more important. Programs should be as small and as fast as they can be WITHOUT

sacrificing understanding.”

While I agree with this sentiment to some extent, it is my personal opinion that a

large number of today’s applications suffer from performance problems. And I don’t

think it’s the hardware that is lacking. I think intelligently written apps that do things

like pre-compute data, cache data, use smart data structures and algorithms, and take

advantage of specific processor tricks are doing their users a favor. I know that my

mom, who is not a sophisticated user at all, gets frustrated when simple things like

changing the font or margins of her 20-page letter on her Mac Classic take longer than

a few seconds (“I thought these computer thingies were suppose to be fast?”). There’s

no reason why simple operations have to take so long. Optimizing data structures,

algorithms and individual C statements is an important part of competing in the

application market.

The purpose of this column is to help people see what kind of tricks and speedups

are possible for those places where you need them. You don’t have to write 100%

totally, absolutely, perfectly efficient code all of the time (although some people do and

my hat is off to them); you only have to do that in the roughly 25% of your application that

is doing all the real work. Also, remember that this column is, after all, a game and

measuring cycles and bytes is much more objective and fair than something open to

interpretation like a “code maintainability” criterion.

Having said that, we can take a look at another type of letter I received recently...

DIVIDE BY 15 TRICK

Frequent Challenge player Gerry Davis writes to me with a non-obvious trick to

do a faster integer divide by 15:

This code:

long i;
// j must be unsigned to catch overflow
unsigned long j;
j = i/15;

is faster as:

j = ((i+((i+((i+((i+((i+((i+((i+
    (i>>4)+1)>>4)+1)>>4)+1)>>4)+1)>>4)+
    1)>>4)+1)>>4)+1)>>4);

This is about 5.5 times faster on a 68000 and 1.2 times faster on a 68020. It

adds about 50 bytes of code on the 020, but on the 68000 the code necessary for a long

divide is a lot more than this. You can remove some of the iterations to do short

integers as well.

Thanks, Gerry. I tested it on my Quadra 700 and found that your version is 48

bytes and is about 1.4 times faster on the 040 than the chip’s built-in long divide

instruction.
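
For readers wondering why this works: 1/15 = 1/16 + 1/256 + 1/4096 + ..., so i/16 is a first guess that is never too large, and each further step q = (i + q + 1) >> 4 pulls the guess closer to i/15. The exact quotient is a fixed point of that step, since for i = 15q + r with 0 <= r <= 14 we get i + q + 1 = 16q + (r + 1) and r + 1 never reaches 16. The sketch below simply restates Gerry's expression as a loop so the pattern is easier to see. Div15 is a hypothetical name, and the sketch assumes a non-negative input of at most 0x7FFFFFFF; it makes no attempt to handle negative values. The unrolled form is what you would use for speed.

```c
#include <assert.h>

/* Shift-and-add divide by 15 (illustrative sketch).
   Assumes 0 <= i <= 0x7FFFFFFF, i.e. a non-negative
   32-bit long; negative inputs are not handled. */
unsigned long Div15(unsigned long i)
{
    unsigned long q = i >> 4;  /* i/16: never too large */
    int pass;

    assert(i <= 0x7FFFFFFFUL);
    /* Seven refinement passes, matching the seven nested
       "+1)>>4" steps in the one-line expression. The +1
       lets the guess step up when i is a multiple of 15. */
    for (pass = 0; pass < 7; pass++)
        q = (i + q + 1) >> 4;
    return q;
}
```

Fewer passes are enough for smaller ranges, which is the sense in which some of the iterations can be removed for short integers.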

Does anyone else have any similar special case optimizations they’d like to

share? Send them in!

Here’s Bob’s winning solution:

/*
** BlockMoveBits by Bob Boonstra
**
** Solution strategy:
**   Use 68030 bit field manipulation instructions
**     rather than shifting and masking.
**   Accomplish move in three steps, where the first step
**     aligns destination to longword, second step uses
**     BFEXTU/MOVE.L combination instead of BFEXTU/BFINS to
**     move bulk of the bits, and third step cleans up.
**   Special case when srcBitOffset==destBitOffset,
**     allowing main loop to use MOVE.L (x)+,(y)+
**
** Relative execution times for various strategies:
** 100: Straightforward BFEXTU/BFINS in 32-bit chunks,
**  70: byte-align src and MOVE.L/BFINS in main loop,
**  58: byte-align dst and BFEXTU/MOVE.L in main loop,
**  50: long-aligned dst and BFEXTU/MOVE.L in main loop,
**  29: as above, if srcOffset==dstOffset use one MOVE.L
*/
/* some register definitions for readability */
#define bitCt     d2
#define srcOffset d6
#define dstOffset d7
#define srcPtr    a0
#define dstPtr    a1
void BlockMoveBits(char *srcBytePtr, char *destBytePtr,
  unsigned char srcBitOffset, unsigned char destBitOffset,
  unsigned short bitCount)
{
  asm 68030 {
; save registers
    MOVEM.L   d6-d7,-(a7)
; get bit count, zero-extended; a zero count is
;   caught by the test at @lastbits
    MOVEQ     #0,bitCt
    MOVE.W    bitCount,bitCt
; get params into registers
    MOVE.L    srcBytePtr,srcPtr
    MOVE.L    destBytePtr,dstPtr
    MOVE.B    srcBitOffset,d1
    MOVEQ     #0,d0
    MOVE.B    destBitOffset,d0
; calculate srcOffset and dstOffset in
;   bit field manipulation coordinates
;   (bit 0 is MSB)
    MOVEQ     #7,srcOffset   
    SUB.B     d1,srcOffset
    MOVEQ     #7,dstOffset
    SUB.B     d0,dstOffset
; exit if <= 32 bits to move
    CMPI.L     #32,bitCt
    BLE       @lastbits
; convert dstOffset to initial bit count
    ADDQ.W    #1,d0
; STEP 1:  Move enough bits to longAlign  destination
;          using bit field manipulation
; adjust bit count to longAlign  destination
    MOVE.W    dstPtr,d1
    ANDI.B    #3,d1
    EORI.B    #3,d1
    LSL.B     #3,d1
    ADD.B     d1,d0
; move initial bits
    BFEXTU    (srcPtr){srcOffset:d0},d1
    BFINS     d1,(dstPtr){dstOffset:d0}
; decrement bits left to move
    SUB.L     d0,bitCt
; adjust source offset; this may make
; srcOffset >= 8, but BFEXTU does not care
    ADD.W     d0,srcOffset
; adjust dstPtr to account for alignment
    LSR.B     #3,d0
    ADDQ.B    #1,d0
    ADDA.W    d0,dstPtr
    MOVEQ     #0,dstOffset
; STEP 2:  Main loop, MOVE.L all 32-bit chunks
; set up d0 with number of longwords to move
    MOVE.W    bitCt,d0
    LSR.W     #5,d0
    BLE       @lastbits
; set up bitCt for final BFEXTU/BFINS
    ANDI.W    #31,bitCt
; decrement d0 for subsequent DBRA
    SUBQ.W    #1,d0
; move bits one longword at a time
    MOVE.B    srcOffset,d1
    ANDI.B    #7,d1
    BNE.S     @longloop
; special case, src is byte-aligned
    LSR.B     #3,srcOffset
    ADDA.L    srcOffset,srcPtr
    MOVEQ     #0,srcOffset
@alignloop:
    MOVE.L    (srcPtr)+,(dstPtr)+
    DBRA      d0,@alignloop
    BRA.S     @lastbits
; normal case, src not byte-aligned
@longloop:
    BFEXTU    (srcPtr){srcOffset:0},d1
    MOVE.L    d1,(dstPtr)+
    ADDQ.L    #4,srcPtr
    DBRA      d0,@longloop
; STEP 3:  Move remaining bits with bit field
;          manipulation
@lastbits:
    TST.B     bitCt
    BEQ.S     @done
; move leftover bits
    BFEXTU    (srcPtr){srcOffset:bitCt},d1
    BFINS     d1,(dstPtr){dstOffset:bitCt}
@done:
; restore registers
    MOVEM.L   (a7)+,d6-d7
  }
}
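
If you want to check a routine like this against all 64 combinations of source and destination bit offsets, a slow but obviously correct version is handy to compare against. The sketch below is one such reference, written in portable C. It is not part of Bob's entry: BlockMoveBitsRef is a hypothetical name, and the bit numbering (offset 7 is the most significant bit of a byte, with the move running toward bit 0 and then into the next byte) is inferred from the "7 minus offset" conversion in the assembly above rather than taken from the original challenge statement. It also assumes the source and destination ranges do not overlap.

```c
#include <assert.h>

/* Bit-at-a-time reference copy (illustrative sketch).
   Assumed convention: bit offset 7 is the MSB of a byte;
   bits are taken toward bit 0, then from the next byte. */
void BlockMoveBitsRef(const unsigned char *src,
                      unsigned char *dst,
                      unsigned char srcBitOffset,
                      unsigned char dstBitOffset,
                      unsigned short bitCount)
{
    /* positions counted from the MSB of the first byte */
    unsigned long s = 7u - srcBitOffset;
    unsigned long d = 7u - dstBitOffset;
    unsigned short n;

    assert(srcBitOffset <= 7 && dstBitOffset <= 7);
    for (n = 0; n < bitCount; n++, s++, d++) {
        unsigned char mask =
            (unsigned char)(0x80u >> (d & 7));

        if (src[s >> 3] & (0x80u >> (s & 7)))
            dst[d >> 3] |= mask;              /* set bit   */
        else
            dst[d >> 3] &= (unsigned char)~mask; /* clear */
    }
}
```

A test harness could fill two buffers with a known pattern, run both routines for every offset pair and a range of bit counts, and compare the destination buffers byte for byte, including the bytes just outside the destination range.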

THE RULES

Here’s how it works: Each month there will be a different programming challenge

presented here. First, you must write some code that solves the challenge. Second, you

must optimize your code (a lot). Then, submit your solution to MacTech Magazine

(formerly MacTutor). A winner will be chosen based on code correctness, speed, size

and elegance (in that order of importance) as well as the postmark of the answer. In

the event of multiple equally desirable solutions, one winner will be chosen at random

(with honorable mention, but no prize, given to the runners up). The prize for the

best solution each month is $50 and a limited edition “The Winner! MacTech Magazine

Programming Challenge” T-shirt (not to be found in stores).

In order to make fair comparisons between solutions, all solutions must be in

ANSI compatible C (i.e., don’t use Think’s Object extensions). Only pure C code can be

used. Any entries with any assembly in them will be disqualified (except for those

challenges specifically stated to be in assembly). However, you may call any routine

in the Macintosh toolbox you want (i.e., it doesn’t matter if you use NewPtr instead of

malloc). All entries will be tested with the FPU and 68020 flags turned off in THINK C.

When timing routines, the latest version of THINK C will be used (with ANSI Settings

plus “Honor ‘register’ first” and “Use Global Optimizer” turned on) so beware if you

optimize for a different C compiler. All code should be limited to 60 characters wide.

This will aid us in dealing with e-mail gateways and page layout.

The solution and winners for this month’s Programmers’ Challenge will be

published in the issue two months later. All submissions must be received by the 10th

day of the month printed on the front of this issue.

All solutions should be marked “Attn: Programmers’ Challenge Solution” and

sent to Xplain Corporation (the publishers of MacTech Magazine) via “snail mail” or

preferably, e-mail - AppleLink: MT.PROGCHAL, Internet: progchallenge@xplain.com,

CompuServe: 71552,174 and America Online: MT PRGCHAL. If you send via snail

mail, please include a disk with the solution and all related files (including contact

information). See page 2 for information on “How to Contact Xplain Corporation.”

MacTech Magazine reserves the right to publish any solution entered in the

Programming Challenge of the Month and all entries are the property of MacTech

Magazine upon submission. The submission falls under all the same conventions of an

article submission.
