Aug 94 Challenge
Volume Number: 10
Issue Number: 8
Column Tag: Programmer’s Challenge
Programmer’s Challenge
By Mike Scanlin, Mountain View, CA
Note: Source code files accompanying article are located on MacTech CD-ROM or
source code disks.
Dumpbytes
When writing programmer utilities like disassemblers, disk editors and memory
viewers it’s useful to have around a very fast “dump” routine that takes a bunch of
bytes and displays them in hex and ascii. The MPW tool DumpFile encompasses most of
the desired functionality. This month’s challenge is to write a fast version of some of
the DumpFile functionality.
The prototype of the function you write is:
/* 1 */
DumpBytes(inputBytes, outputText,
numInputBytes, maxOutputBytes,
width, grouping)
Ptr inputBytes;
Ptr outputText;
unsigned short numInputBytes;
unsigned short maxOutputBytes;
unsigned short width;
unsigned short grouping;
inputBytes and outputText are the pointers to the input bytes (which you’re
trying to display) and the output text (which is all printable ascii, ready to display).
numInputBytes is the number of input bytes you have to work with (more than zero)
and maxOutputBytes is the size of the buffer that outputText points to. The return
value of the function is the actual number of output bytes created by DumpBytes and
will always be less than or equal to maxOutputBytes (or zero if there’s output buffer
overflow). Like the DumpFile tool, the width parameter is the number of input bytes
to display on each output line (it will be from 1 to 64 with 16 being given more
weight than the other values) and grouping is the number of output bytes to group
together without intervening spaces (also from 1 to 64 with 1, 2 and 4 being given
more wight than the other values). The width parameter will always be a multiple of
the grouping parameter.
Here are a few examples (the comments describe the parameters but are not part
of the actual output):
/* 2 */
/* width = 8, grouping = 1 */
0: 23 09 53 74 61 72 74 75 #.Startu
8: 70 20 2D 20 4D 50 57 20 p.-.MPW.
10: 53 68 65 6C 6C 20 53 74 Shell.St
/* width = 8, grouping = 8 */
0: 2309537461727475 #.Startu
8: 70202D204D505720 p.-.MPW.
10: 5368656C6C205374 Shell.St
/* width = 9, grouping = 3 */
0: 230953 746172 747570 #.Startup
9: 202D20 4D5057 205368 .-.MPW.Sh
12: 656C6C 205374 617274 ell.Start
Non-printable characters should be represented by a dot ‘.’ in the ascii section of
the output. You can print a space character as a space or a dot (your choice). When in
doubt on how to handle a certain situation, check the MPW DumpFile tool and do what it
does (or something very similar). As always, I’m available for questions in case
something is not clear (see the e-mail addresses section).
You should be careful about parameters that will cause you to output more bytes
than the maxOutputBytes will allow. If you run out of output buffer space then just fill
it up as much as you can and return 0. I won’t be testing the output overflow cases
much because the goal of this exercise it to have a very fast hex and ascii displayer. If
someone were to actually use the code it is assumed that they would know the context
and provide an output buffer that was always large enough (and assert that the return
value was not zero).
Two Months Ago Winner
Congratulations to Bob Boonstra (Westford, MA) for reclaiming the title of the
Programmer Challenge Champion this month. This month’s win brings his 1st place
totals to four, which is more than anyone else. Like Bob, second place winner Allen
Stenger (Gardena, CA) also based his solution on Fermat’s algorithm but ended up with
an implementation that was not quite as fast as Bob’s. Third place winner Ernst Munter
(Kanata, ON, Canada) chose a different route and first implemented his solution in 386
assembly (!) and then wrote some graphics routines to illustrate the behavior of his
code in order to help him optimize further. But in the end he says he didn’t have enough
time to do as much as he would have liked to his C version.
Here are the code sizes and times. The time1 numbers are the times to factor
some 64 bit numbers while the time2 numbers are the times to factor some 32 bit
numbers (where highHalf is zero), which was not given much weight when ranking
(but it’s interesting to see how some people optimized for this case). Numbers in
parens after a person’s name indicate how many times that person has finished in the
top 5 places of all previous Programmer Challenges, not including this one:
Name time1 time2 code
Bob Boonstra (9) 5 7 820
Allen Stenger (6) 11 24 896
Ernst Munter 15 2 1190
John Raley 25 186 520
Liber, Anspach, Phillips 436 14 620
Clement Goebel (3) 1094 1 1026
Jim Lloyd (1) 3920 20 4279
Alex Novickis 18800 53 9542
Bob’s code is well commented so I won’t go over it here. Also, for a discussion of
Fermat’s factoring algorithm you can check out The Art of Computer Programming,
v.2, by Donald Knuth.
One thing that made this problem slightly harder than normal was that you had to
work with 64 bit integers. Allen Stenger ended up creating his own set of double-long
macros which I’ll give here because they might come in handy some day if you ever
have to work with 64 bit integers:
/* 3 */
#define OVERFLOW(x) (0 != (0x80000000 & (x)))
#define DOCARRY(x) { x ## high++; x ## low &= 0x7FFFFFFF;}
#define DOBORROW(x) { x ## high--; x ## low &= 0x7FFFFFFF;}
#define GT_ZERO(x) ((x ## high >= 0) && (x ## low != 0))
#define EQ_ZERO(x) ((x ## high == 0) && (x ## low == 0))
#define LT_ZERO(x) ((x ## high < 0))
#define INCR(x,a) {if (OVERFLOW(x ## low += a)) DOCARRY(x);}
#define DECR(x,a) {if (OVERFLOW(x ## low -= a)) DOBORROW(x);}
#define PLUS_EQUALS(x, y) {
x ## high += y ## high;
if (OVERFLOW(x ## low += y ## low))
DOCARRY(x);}
#define MINUS_EQUALS(x, y) {
x ## high -= y ## high;
if (OVERFLOW(x ## low -= y ## low))
DOBORROW(x);}
Here’s Bob’s winning solution:
Solution strategy
Factoring is a field which has been the subject of a great deal of research because
of the implications for cryptography, e specially techniques that depend on the
difficulty of factoring very large numbers. Therefore, it is possible that some of these
algorithms could be applied to the challenge.
However, in the event that no mathematician specializing in the field chooses to
enter the Challenge, this relatively simple solution takes advantage of some of the
simplifying conditions in the problem statement:
1) the numbers are relatively small (64 bits, or ~<20 digits)
2) the prime factors are even smaller (32 bits, or ~<10 digits)
This solution depends on no precomputed information. It is based on Fermat's
algorithm, described in Knuth Vol II, which is e specially well suited to the problem
because it is most efficient when the two p, [sorry, the rest of the sentence was
missing - Ed stb]
Fermat's algorithm requires ~(p-1)sqrt(n) iterations, where n=u*v and
u~=p*sqrt(n), v~=sqrt(n)/p. Other algorithms require half as many iterations, but
require more calculation per iteration.
Fermat's algorithm works as follows:
1) Let n - u*v, u and v odd primes.
2) Set a = (u+v)/2 and b = (u-v)/2.
3) Then n = uv = a**2 - b**2
4) Initialize a = trunc(sqrt(n)), b=0, r=a**2-b**2-n
5) Iterate looking for r==0, with an inner loop that keeps a=(u+v)/2 constant and
increases b=(u-v)/2 by 1 each iteration until r becomes negative. When this
happens, the halfsum a is increased by 1, and the difference loop is repeated.
The algorithm in Knuth uses auxiliary variables x,y for efficiency, where x =
2*a+1 and y = 2*b+1
This works fine in most cases, but causes overflow of a longword when x,y are
the full 32-bits in size. So we have augmented the algorithm to deal with this case.
This solution also uses an efficient integer sqrt algorithm due to Ron Hunsinger,
and extends that algorithm to 64 bits.
/* 4 */
#pragma options(assign_registers,honor_register)
#define ulong unsigned long
#define ushort unsigned short
#define kLo16Bits 0xFFFF
#define kHiBit 0x80000000UL
#define kLo2Bits 3
#define kLo1Bit 1
/*
Macros RightShift2 and RightShift1 shift a 64-bit value right by 2
and 1 bits, respectively.
*/
#define RightShift32By2(xL,xH)
{
xL >>= 2;
xL |= (xH & kLo2Bits)<<30;
xH >>= 2;
}
#define RightShift32By1(xL,xH)
{
xL >>= 1;
xL |= (xH & kLo1Bit)<<31;
xH >>= 1;
}
/*
Macros Add32To64 (Sub32From64) add (subtract) a 32-bit value to
(from) a 64-bit value.
*/
#define Add32To64(rL,rH, a)
temp = rL;
if ((rL += a) < temp) ++rH;
#define Add2NPlus1To64(lowR,highR,a)
Add32To64(lowR,highR,a);
Add32To64(lowR,highR,a);
Add32To64(lowR,highR,1);
#define Sub32From64(rL,rH, s)
temp = rL;
if ((rL -= s) > temp) --rH;
#define Sub2NPlus1From64(lowR,highR,s)
Sub32From64(lowR,highR,s);
Sub32From64(lowR,highR,s);
Sub32From64(lowR,highR,1);
//Macros Add64 (Sub64) add (subtract) two 64-bit values.
#define Add64(qL,qH, eL,eH)
Add32To64(qL,qH,eL);
qH += eH;
#define Sub64(qL,qH, eL,eH)
Sub32From64(qL,qH, eL);
qH -= eH;
/*
Macro Square64 multiplies a 32-bit value by itself to produce the
square as a 64-bit value. For this solution, we only need to execute
this macro expansion once.
*/
#define Square64(a,rL,rH,temp)
{
ulong lohi,lolo,hihi;
ushort aHi,aLo;

aHi = a>>16;
aLo = a;