Nov 93 Challenge
Volume 9
Number 11
Column Tag: Programmers’ Challenge
Programmers’ Challenge 
By Mike Scanlin, MacTech Magazine Regular Contributing Author
Note: Source code files accompanying this article are located on the MacTech CD-ROM or source code disks.
WHO PLAYS WHO?
Thanks to Kevin Cutts (location unknown) for suggesting this month’s challenge.
The goal is to match up teams for the annual MacTech Bean Counting contest where
there are half as many playing areas as there are teams. Each team needs to play every
other team exactly once. (And they don’t want to wait all day for their schedule to be
generated!)
The input is the number of teams, a list of team names and a list of playing area
names. The number of teams will be an even number less than 25 and the number of
playing areas will be half of the number of teams. You write the output to an existing file,
describing who plays whom on which playing area at what time. Each bean
counting match takes 10 minutes to play, so you can schedule a match every 15
minutes on each playing area. The events don’t start until noon so that everyone
involved has time to sleep in before their big day.
The prototype of the function you write is:
void ScheduleMatches(numTeams,
    teamNames, playingAreaNames,
    outputFile)
unsigned short numTeams;
Str255 teamNames[];
Str255 playingAreaNames[];
FILE *outputFile;
The outputFile will be open and empty when your routine is called. Write to it with
standard C calls such as fprintf(outputFile, "Here is some output text.\n").
Do not close the file when your routine exits; the caller opened it and will close it.
The format of the output is up to you. It should be intelligible, though. Don’t
skimp on output readability to save a few cycles of time.
The input team names and playing area names are Pascal strings that take 256
bytes each (length byte included). These arrays are read-only; if you want to convert
them to C strings then you’ll have to copy them somewhere first. Don’t worry about
the special formatting requirements of long strings; I will be testing with fairly small
strings.
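As a minimal sketch of that copy step: the helper name PToCCopy is hypothetical, and the Str255 typedef below is only a stand-in for the Toolbox type (on a Mac, take it from the system headers).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the Toolbox type: byte 0 holds the length,
   bytes 1..255 hold the characters (assumption for this
   sketch; the real definition lives in the Mac headers). */
typedef unsigned char Str255[256];

/* Hypothetical helper: copy a read-only Pascal string into
   a caller-owned C buffer of at least 256 bytes. */
static void PToCCopy(const unsigned char *src, char *dst)
{
    size_t len;

    assert(src != NULL && dst != NULL);
    len = src[0];                /* length byte, 0..255 */
    memcpy(dst, src + 1, len);   /* characters follow it */
    dst[len] = '\0';             /* add the C terminator */
}
```

Because the longest Pascal string holds 255 characters, a 256-byte destination always has room for the terminator.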
Here is some sample input:
numTeams = 4;
teamNames[0] = "\pCycleStealers";
teamNames[1] = "\pBeanies";
teamNames[2] = "\pRiscTakers";
teamNames[3] = "\pGiraffeButts";
playingAreaNames[0] = "\pField 1";
playingAreaNames[1] = "\pField 2";
and suggested output format:
12:00
Field 1: CycleStealers vs. Beanies
Field 2: RiscTakers vs. GiraffeButts
12:15
Field 1: CycleStealers vs. RiscTakers
Field 2: Beanies vs. GiraffeButts
12:30
Field 1: CycleStealers vs. GiraffeButts
Field 2: Beanies vs. RiscTakers
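One well-known way to build such a schedule is the circle method: hold one team fixed and rotate the others, which yields numTeams - 1 rounds in which every team plays exactly once per round. The sketch below only illustrates that technique and is not a contest entry. The names BuildRounds, PrintPStr and ScheduleSketch, the MAX_TEAMS constant, the ANSI-style signatures and the stand-in Str255 typedef are all assumptions, and times are printed on a 24-hour clock.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_TEAMS 24                /* even, less than 25 */
typedef unsigned char Str255[256];  /* stand-in Toolbox type */

/* Circle method: team m = numTeams-1 stays put while teams
   0..m-1 rotate. In round r, slot 0 pairs r with m and
   slot k pairs (r+k) mod m with (r-k) mod m. */
static void BuildRounds(unsigned short numTeams,
    unsigned char a[][MAX_TEAMS / 2],
    unsigned char b[][MAX_TEAMS / 2])
{
    unsigned short m = numTeams - 1, r, k;

    assert(numTeams >= 2 && numTeams <= MAX_TEAMS);
    assert(numTeams % 2 == 0);
    for (r = 0; r < m; r++) {
        a[r][0] = (unsigned char)r;
        b[r][0] = (unsigned char)m;
        for (k = 1; k < numTeams / 2; k++) {
            a[r][k] = (unsigned char)((r + k) % m);
            b[r][k] = (unsigned char)((r + m - k) % m);
        }
    }
}

/* Print a Pascal string directly, without copying it. */
static void PrintPStr(FILE *f, const unsigned char *p)
{
    fprintf(f, "%.*s", (int)p[0], (const char *)(p + 1));
}

/* Hypothetical driver: one round every 15 minutes from
   noon, slot k of each round on playing area k. */
static void ScheduleSketch(unsigned short numTeams,
    Str255 teamNames[], Str255 playingAreaNames[],
    FILE *out)
{
    static unsigned char a[MAX_TEAMS - 1][MAX_TEAMS / 2];
    static unsigned char b[MAX_TEAMS - 1][MAX_TEAMS / 2];
    unsigned short r, k, mins;

    BuildRounds(numTeams, a, b);
    for (r = 0; r + 1 < numTeams; r++) {
        mins = (unsigned short)(12 * 60 + 15 * r);
        fprintf(out, "%u:%02u\n", mins / 60u, mins % 60u);
        for (k = 0; k < numTeams / 2; k++) {
            PrintPStr(out, playingAreaNames[k]);
            fprintf(out, ": ");
            PrintPStr(out, teamNames[a[r][k]]);
            fprintf(out, " vs. ");
            PrintPStr(out, teamNames[b[r][k]]);
            fprintf(out, "\n");
        }
    }
}
```

The pairing table comes from a handful of modular additions per match, so under these assumptions nearly all of the running time would go to the fprintf calls rather than to the scheduling itself.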
TWO MONTHS AGO WINNER
It would appear that the 10 or more people who wrote to me and requested an
assembly language challenge were either (1) kidding, (2) all on vacation during the
last month, or (3) unable to cope with moving bits in assembly language, because I
only received 3 entries to the BlockMoveBits assembly language challenge. And only
one of them gave correct results. Congratulations to Bob Boonstra (Westford, MA) for
(1) entering, (2) having correct code and (3) winning. Bob’s code would have an
excellent chance at winning even with more competition because it is very efficient
indeed. Bob recently won the Where In The World? challenge, too, so this is his second
win (the second two-time winner to date; there are no 3-time winners at this point).
Well done!
Compliments to Kevin Cutts for having the guts to enter C code in an assembly
language contest. Despite the fact that his code was 690 bytes and used over 400 bytes
of static lookup table data (compared to Bob’s 166 bytes with no tables) his times
were within a respectable 10% of Bob’s. Correctness, however, is key and Kevin’s
routine gave occasional bogus results so I had to disqualify it (be sure to try all 64
combinations of source and destination bit offsets; each can range from 0 to 7).
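A brute-force check along those lines might look like the sketch below. It is an illustration under assumptions, not the harness used for judging: RefMoveBits is a hypothetical bit-at-a-time reference, CountOffsetFailures is a hypothetical driver, and the bit numbering (offset 7 is a byte's most significant bit, with bits running down to offset 0 and then into the next byte) is inferred from the way Bob's code converts offsets, not taken from the original challenge statement. Source and destination are assumed not to overlap.

```c
#include <assert.h>
#include <string.h>

typedef void (*MoveFn)(char *src, char *dst,
    unsigned char srcOff, unsigned char dstOff,
    unsigned short count);

/* Reference mover: copies one bit at a time. Offset 7 is
   the MSB of a byte (assumed convention); stepping past
   offset 0 continues at offset 7 of the next byte. */
static void RefMoveBits(char *src, char *dst,
    unsigned char srcOff, unsigned char dstOff,
    unsigned short count)
{
    unsigned char *s = (unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    unsigned long sp, dp;
    unsigned short i;

    assert(srcOff < 8 && dstOff < 8);
    sp = 7u - srcOff;   /* bits past the MSB of s[0] */
    dp = 7u - dstOff;   /* bits past the MSB of d[0] */
    for (i = 0; i < count; i++, sp++, dp++) {
        unsigned char mask =
            (unsigned char)(0x80u >> (dp & 7));
        if (s[sp >> 3] & (0x80u >> (sp & 7)))
            d[dp >> 3] |= mask;
        else
            d[dp >> 3] &= (unsigned char)~mask;
    }
}

/* Compare a candidate with the reference for all 64 offset
   pairs and several lengths; return the mismatch count.
   Whole buffers are compared, so stray writes outside the
   destination bit range are caught as well. */
static int CountOffsetFailures(MoveFn candidate)
{
    static const unsigned short counts[] =
        { 0, 1, 7, 8, 31, 32, 33, 64, 100 };
    char src[32], want[32], got[32];
    unsigned so, dof, c, i;
    int failures = 0;

    for (i = 0; i < sizeof src; i++)
        src[i] = (char)(i * 37u + 11u);  /* any pattern */
    for (so = 0; so < 8; so++)
        for (dof = 0; dof < 8; dof++)
            for (c = 0;
                 c < sizeof counts / sizeof counts[0];
                 c++) {
                memset(want, 0x55, sizeof want);
                memset(got, 0x55, sizeof got);
                RefMoveBits(src, want, (unsigned char)so,
                    (unsigned char)dof, counts[c]);
                candidate(src, got, (unsigned char)so,
                    (unsigned char)dof, counts[c]);
                if (memcmp(want, got, sizeof want) != 0)
                    failures++;
            }
    return failures;
}
```

A routine under test would be passed in as the candidate, for example CountOffsetFailures(BlockMoveBits); any nonzero result means at least one offset pair or length disagrees with the reference.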
MAIL BAG
Recently I received a letter from a MacTech reader which said, in part:
“I DO object to the programming contest though. It rewards convoluted, hard to
maintain code at the expense of speed and size. In the real world the former is MUCH
more important. Programs should be as small and as fast as they can be WITHOUT
sacrificing understanding.”
While I agree with this sentiment to some extent, it is my personal opinion that a
large number of today’s applications suffer from performance problems. And I don’t
think it’s the hardware that is lacking. I think intelligently written apps that do things
like pre-compute data, cache data, use smart data structures and algorithms, and take
advantage of specific processor tricks are doing their users a favor. I know that my
mom, who is not a sophisticated user at all, gets frustrated when simple things like
changing the font or margins of her 20-page letter on her Mac Classic take longer than
a few seconds (“I thought these computer thingies were suppose to be fast?”). There’s
no reason why simple operations have to take so long. Optimizing data structures,
algorithms and individual C statements is an important part of competing in the
application market.
The purpose of this column is to help people see what kind of tricks and speedups
are possible for those places where you need them. You don’t have to write 100%
totally, absolutely, perfectly efficient code all of the time (although some people do and
my hat is off to them); you only have to do that in the roughly 25% of your application that
does all the real work. Also, remember that this column is, after all, a game, and
measuring cycles and bytes is much more objective and fair than something open to
interpretation like a “code maintainability” criterion.
Having said that, we can take a look at another type of letter I received recently...
DIVIDE BY 15 TRICK
Frequent Challenge player Gerry Davis writes to me with a non-obvious trick to
do a faster integer divide by 15:
This code:
long i;
// j must be unsigned to catch overflow
unsigned long j;
j = i/15;
is faster as:
j = ((i+((i+((i+((i+((i+((i+((i+
(i>>4)+1)>>4)+1)>>4)+1)>>4)+1)>>4)+
1)>>4)+1)>>4)+1)>>4);
This is about 5.5 times faster on a 68000 and 1.2 times faster on a 68020. It
adds about 50 bytes of code on the 020, but on the 68000 the code necessary for a long
divide is a lot more than this. You can remove some of the iterations to do short
integers as well.
Thanks, Gerry. I tested it on my Quadra 700 and found that your version is 48
bytes and is about 1.4 times faster on the 040 than the chip’s built-in long divide
instruction.
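The nesting in Gerry's expression is easier to follow as a loop. Since i/15 = (i + i/15)/16, an estimate q of the quotient can be refined by computing (i + q + 1) >> 4, starting from q = i >> 4; seven refinements correspond to the seven nested levels above. The sketch below is an illustration under assumptions: Div15 is a hypothetical name, fixed-width unsigned types replace the original long, and i is assumed to lie in 0..0x7FFFFFFF so that i + q + 1 cannot wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Divide by 15 using only shifts and adds (loop form of
   the nested expression). Assumes 0 <= i <= 0x7FFFFFFF. */
static uint32_t Div15(uint32_t i)
{
    uint32_t q = i >> 4;        /* first estimate: i/16 */
    int pass;

    assert(i <= 0x7FFFFFFFu);
    for (pass = 0; pass < 7; pass++)
        q = (i + q + 1) >> 4;   /* refine: (i + q + 1)/16 */
    return q;
}
```

Each pass shrinks the gap between q and the true quotient by about a factor of 16, which is why fewer passes are enough for narrower integers. The loop form is meant for reading; the timings quoted above refer to the fully unrolled expression.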
Does anyone else have any similar special case optimizations they’d like to
share? Send them in!
Here’s Bob’s winning BlockMoveBits solution:
/*
** BlockMoveBits by Bob Boonstra
**
** Solution strategy:
** Use 68030 bit field manipulation instructions
** rather than shifting and masking.
** Accomplish move in three steps, where the first step
** aligns destination to longword, second step uses
** BFEXTU/MOVE.L combination instead of BFEXTU/BFINS to
** move bulk of the bits, and third step cleans up.
** Special case when srcBitOffset==destBitOffset,
** allowing main loop to use MOVE.L (x)+,(y)+
**
** Relative execution times for various strategies:
** 100: Straightforward BFEXTU/BFINS in 32-bit chunks,
** 70: byte-align src and MOVE.L/BFINS in main loop,
** 58: byte-align dst and BFEXTU/MOVE.L in main loop,
** 50: long-aligned dst and BFEXTU/MOVE.L in main loop,
** 29: as above, if srcOffset==dstOffset use one MOVE.L
*/
/* some register definitions for readability */
#define bitCt d2
#define srcOffset d6
#define dstOffset d7
#define srcPtr a0
#define dstPtr a1
void BlockMoveBits(char *srcBytePtr, char *destBytePtr,
    unsigned char srcBitOffset, unsigned char destBitOffset,
    unsigned short bitCount)
{
  asm 68030 {
    ; save registers
    MOVEM.L d6-d7,-(a7)
    ; get bit count (a zero count exits via @lastbits)
    MOVEQ   #0,bitCt
    MOVE.W  bitCount,bitCt
    ; get params into registers
    MOVE.L  srcBytePtr,srcPtr
    MOVE.L  destBytePtr,dstPtr
    MOVE.B  srcBitOffset,d1
    MOVEQ   #0,d0
    MOVE.B  destBitOffset,d0
    ; calculate srcOffset and dstOffset in
    ; bit field manipulation coordinates
    ; (bit 0 is MSB)
    MOVEQ   #7,srcOffset
    SUB.B   d1,srcOffset
    MOVEQ   #7,dstOffset
    SUB.B   d0,dstOffset
    ; exit if <= 32 bits to move
    CMPI.L  #32,bitCt
    BLE     @lastbits
    ; convert dstOffset to initial bit count
    ADDQ.W  #1,d0
    ; STEP 1: Move enough bits to longAlign destination
    ; using bit field manipulation
    ; adjust bit count to longAlign destination
    MOVE.W  dstPtr,d1
    ANDI.B  #3,d1
    EORI.B  #3,d1
    LSL.B   #3,d1
    ADD.B   d1,d0
    ; move initial bits
    BFEXTU  (srcPtr){srcOffset:d0},d1
    BFINS   d1,(dstPtr){dstOffset:d0}
    ; decrement bits left to move
    SUB.L   d0,bitCt
    ; adjust source offset; this may make
    ; srcOffset >= 8, but BFEXTU does not care
    ADD.W   d0,srcOffset
    ; adjust dstPtr to account for alignment
    LSR.B   #3,d0
    ADDQ.B  #1,d0
    ADDA.W  d0,dstPtr
    MOVEQ   #0,dstOffset
    ; STEP 2: Main loop, MOVE.L all 32-bit chunks
    ; set up d0 with number of longwords to move
    MOVE.W  bitCt,d0
    LSR.W   #5,d0
    BLE     @lastbits
    ; set up bitCt for final BFEXTU/BFINS
    ANDI.W  #31,bitCt
    ; decrement d0 for subsequent DBRA
    SUBQ.W  #1,d0
    ; move bits one longword at a time
    MOVE.B  srcOffset,d1
    ANDI.B  #7,d1
    BNE.S   @longloop
    ; special case, src is byte-aligned
    LSR.B   #3,srcOffset
    ADDA.L  srcOffset,srcPtr
    MOVEQ   #0,srcOffset
  @alignloop:
    MOVE.L  (srcPtr)+,(dstPtr)+
    DBRA    d0,@alignloop
    BRA.S   @lastbits
    ; normal case, src not byte-aligned
  @longloop:
    BFEXTU  (srcPtr){srcOffset:0},d1
    MOVE.L  d1,(dstPtr)+
    ADDQ.L  #4,srcPtr
    DBRA    d0,@longloop
    ; STEP 3: Move remaining bits with bit field
    ; manipulation
  @lastbits:
    TST.B   bitCt
    BEQ.S   @done
    ; move leftover bits
    BFEXTU  (srcPtr){srcOffset:bitCt},d1
    BFINS   d1,(dstPtr){dstOffset:bitCt}
  @done:
    ; restore registers
    MOVEM.L (a7)+,d6-d7
  }
}
THE RULES
Here’s how it works: Each month there will be a different programming challenge
presented here. First, you must write some code that solves the challenge. Second, you
must optimize your code (a lot). Then, submit your solution to MacTech Magazine
(formerly MacTutor). A winner will be chosen based on code correctness, speed, size
and elegance (in that order of importance) as well as the postmark of the answer. In
the event of multiple equally desirable solutions, one winner will be chosen at random
(with honorable mention, but no prize, given to the runners up). The prize for the
best solution each month is $50 and a limited edition “The Winner! MacTech Magazine
Programming Challenge” T-shirt (not to be found in stores).
In order to make fair comparisons between solutions, all solutions must be in
ANSI compatible C (i.e., don’t use Think’s Object extensions). Only pure C code can be
used. Any entries with any assembly in them will be disqualified (except for those
challenges specifically stated to be in assembly). However, you may call any routine
in the Macintosh toolbox you want (i.e., it doesn’t matter if you use NewPtr instead of
malloc). All entries will be tested with the FPU and 68020 flags turned off in THINK C.
When timing routines, the latest version of THINK C will be used (with ANSI Settings
plus “Honor ‘register’ first” and “Use Global Optimizer” turned on) so beware if you
optimize for a different C compiler. All code should be limited to 60 characters wide.
This will aid us in dealing with e-mail gateways and page layout.
The solution and winners for this month’s Programmers’ Challenge will be
published in the issue two months later. All submissions must be received by the 10th
day of the month printed on the front of this issue.
All solutions should be marked “Attn: Programmers’ Challenge Solution” and
sent to Xplain Corporation (the publishers of MacTech Magazine) via “snail mail” or,
preferably, e-mail. The addresses are AppleLink: MT.PROGCHAL, Internet: progchallenge@xplain.com,
CompuServe: 71552,174 and America Online: MT PRGCHAL. If you send via snail
mail, please include a disk with the solution and all related files (including contact
information). See page 2 for information on “How to Contact Xplain Corporation.”
MacTech Magazine reserves the right to publish any solution entered in the
Programming Challenge of the Month and all entries are the property of MacTech
Magazine upon submission. The submission falls under all the same conventions of an
article submission.