All Databases MacTech Vol 10-1994

Aug 94 Challenge

Volume Number: 10

Issue Number: 8

Column Tag: Programmer’s Challenge

Programmer’s Challenge

By Mike Scanlin, Mountain View, CA

Note: Source code files accompanying article are located on MacTech CD-ROM or

source code disks.

Dumpbytes

When writing programmer utilities like disassemblers, disk editors and memory

viewers it’s useful to have around a very fast “dump” routine that takes a bunch of

bytes and displays them in hex and ascii. The MPW tool DumpFile encompasses most of

the desired functionality. This month’s challenge is to write a fast version of some of

the DumpFile functionality.

The prototype of the function you write is:

/* 1 */

unsigned short

DumpBytes(inputBytes, outputText,

numInputBytes, maxOutputBytes,

width, grouping)

Ptr inputBytes;

Ptr outputText;

unsigned short numInputBytes;

unsigned short maxOutputBytes;

unsigned short width;

unsigned short grouping;

inputBytes and outputText are the pointers to the input bytes (which you’re

trying to display) and the output text (which is all printable ascii, ready to display).

numInputBytes is the number of input bytes you have to work with (more than zero)

and maxOutputBytes is the size of the buffer that outputText points to. The return

value of the function is the actual number of output bytes created by DumpBytes and

will always be less than or equal to maxOutputBytes (or zero if there’s output buffer

overflow). Like the DumpFile tool, the width parameter is the number of input bytes

to display on each output line (it will be from 1 to 64 with 16 being given more

weight than the other values) and grouping is the number of output bytes to group

together without intervening spaces (also from 1 to 64 with 1, 2 and 4 being given

more wight than the other values). The width parameter will always be a multiple of

the grouping parameter.

Here are a few examples (the comments describe the parameters but are not part

of the actual output):

/* 2 */
/* width = 8, grouping = 1 */
0: 23 09 53 74 61 72 74 75 #.Startu
8: 70 20 2D 20 4D 50 57 20 p.-.MPW.
10: 53 68 65 6C 6C 20 53 74 Shell.St
/* width = 8, grouping = 8 */
0: 2309537461727475 #.Startu
8: 70202D204D505720 p.-.MPW.
10: 5368656C6C205374 Shell.St
/* width = 9, grouping = 3 */
0: 230953 746172 747570 #.Startup
9: 202D20 4D5057 205368 .-.MPW.Sh
12: 656C6C 205374 617274 ell.Start

Non-printable characters should be represented by a dot ‘.’ in the ascii section of

the output. You can print a space character as a space or a dot (your choice). When in

doubt on how to handle a certain situation, check the MPW DumpFile tool and do what it

does (or something very similar). As always, I’m available for questions in case

something is not clear (see the e-mail addresses section).

You should be careful about parameters that will cause you to output more bytes

than the maxOutputBytes will allow. If you run out of output buffer space then just fill

it up as much as you can and return 0. I won’t be testing the output overflow cases

much because the goal of this exercise it to have a very fast hex and ascii displayer. If

someone were to actually use the code it is assumed that they would know the context

and provide an output buffer that was always large enough (and assert that the return

value was not zero).

Two Months Ago Winner

Congratulations to Bob Boonstra (Westford, MA) for reclaiming the title of the

Programmer Challenge Champion this month. This month’s win brings his 1st place

totals to four, which is more than anyone else. Like Bob, second place winner Allen

Stenger (Gardena, CA) also based his solution on Fermat’s algorithm but ended up with

an implementation that was not quite as fast as Bob’s. Third place winner Ernst Munter

(Kanata, ON, Canada) chose a different route and first implemented his solution in 386

assembly (!) and then wrote some graphics routines to illustrate the behavior of his

code in order to help him optimize further. But in the end he says he didn’t have enough

time to do as much as he would have liked to his C version.

Here are the code sizes and times. The time1 numbers are the times to factor

some 64 bit numbers while the time2 numbers are the times to factor some 32 bit

numbers (where highHalf is zero), which was not given much weight when ranking

(but it’s interesting to see how some people optimized for this case). Numbers in

parens after a person’s name indicate how many times that person has finished in the

top 5 places of all previous Programmer Challenges, not including this one:

Name time1 time2 code

Bob Boonstra (9) 5 7 820

Allen Stenger (6) 11 24 896

Ernst Munter 15 2 1190

John Raley 25 186 520

Liber, Anspach, Phillips 436 14 620

Clement Goebel (3) 1094 1 1026

Jim Lloyd (1) 3920 20 4279

Alex Novickis 18800 53 9542

Bob’s code is well commented so I won’t go over it here. Also, for a discussion of

Fermat’s factoring algorithm you can check out The Art of Computer Programming,

v.2, by Donald Knuth.

One thing that made this problem slightly harder than normal was that you had to

work with 64 bit integers. Allen Stenger ended up creating his own set of double-long

macros which I’ll give here because they might come in handy some day if you ever

have to work with 64 bit integers:

/* 3 */
#define OVERFLOW(x) (0 != (0x80000000 & (x)))
#define DOCARRY(x) { x ## high++; x ## low &= 0x7FFFFFFF;}
#define DOBORROW(x) { x ## high--; x ## low &= 0x7FFFFFFF;}
#define GT_ZERO(x) ((x ## high >= 0) && (x ## low != 0))
#define EQ_ZERO(x) ((x ## high == 0) && (x ## low == 0))
#define LT_ZERO(x) ((x ## high < 0))
#define INCR(x,a) {if (OVERFLOW(x ## low += a)) DOCARRY(x);}
#define DECR(x,a) {if (OVERFLOW(x ## low -= a)) DOBORROW(x);}
#define PLUS_EQUALS(x, y) { 
x ## high += y ## high; 
if (OVERFLOW(x ## low += y ## low))
DOCARRY(x);}
#define MINUS_EQUALS(x, y) { 
x ## high -= y ## high; 
if (OVERFLOW(x ## low -= y ## low))
DOBORROW(x);}

Here’s Bob’s winning solution:

Solution strategy

Factoring is a field which has been the subject of a great deal of research because

of the implications for cryptography, e specially techniques that depend on the

difficulty of factoring very large numbers. Therefore, it is possible that some of these

algorithms could be applied to the challenge.

However, in the event that no mathematician specializing in the field chooses to

enter the Challenge, this relatively simple solution takes advantage of some of the

simplifying conditions in the problem statement:

1) the numbers are relatively small (64 bits, or ~<20 digits)

2) the prime factors are even smaller (32 bits, or ~<10 digits)

This solution depends on no precomputed information. It is based on Fermat's

algorithm, described in Knuth Vol II, which is e specially well suited to the problem

because it is most efficient when the two p, [sorry, the rest of the sentence was

missing - Ed stb]

Fermat's algorithm requires ~(p-1)sqrt(n) iterations, where n=u*v and

u~=p*sqrt(n), v~=sqrt(n)/p. Other algorithms require half as many iterations, but

require more calculation per iteration.

Fermat's algorithm works as follows:

1) Let n - u*v, u and v odd primes.

2) Set a = (u+v)/2 and b = (u-v)/2.

3) Then n = uv = a**2 - b**2

4) Initialize a = trunc(sqrt(n)), b=0, r=a**2-b**2-n

5) Iterate looking for r==0, with an inner loop that keeps a=(u+v)/2 constant and

increases b=(u-v)/2 by 1 each iteration until r becomes negative. When this

happens, the halfsum a is increased by 1, and the difference loop is repeated.

The algorithm in Knuth uses auxiliary variables x,y for efficiency, where x =

2*a+1 and y = 2*b+1

This works fine in most cases, but causes overflow of a longword when x,y are

the full 32-bits in size. So we have augmented the algorithm to deal with this case.

This solution also uses an efficient integer sqrt algorithm due to Ron Hunsinger,

and extends that algorithm to 64 bits.

/* 4 */
#pragma options(assign_registers,honor_register)
#define ulong unsigned long
#define ushort unsigned short
#define kLo16Bits 0xFFFF
#define kHiBit 0x80000000UL
#define kLo2Bits 3
#define kLo1Bit 1
/*
Macros RightShift2 and RightShift1 shift a 64-bit value right by 2
and 1 bits, respectively.
 */
#define RightShift32By2(xL,xH)                            
{                                                         
  xL >>= 2;                                               
  xL |= (xH & kLo2Bits)<<30;                              
  xH >>= 2;                                               
}
#define RightShift32By1(xL,xH)                            
{                                                         
  xL >>= 1;                                               
  xL |= (xH & kLo1Bit)<<31;                               
  xH >>= 1;                                               
}
/*
Macros Add32To64 (Sub32From64) add (subtract) a 32-bit value to
(from) a 64-bit value.
 */
#define Add32To64(rL,rH, a)                               
  temp = rL;                                              
  if ((rL += a) < temp) ++rH;
#define Add2NPlus1To64(lowR,highR,a)                      
  Add32To64(lowR,highR,a);                                
  Add32To64(lowR,highR,a);                                
  Add32To64(lowR,highR,1);
#define Sub32From64(rL,rH, s)                             
  temp = rL;                                              
  if ((rL -= s) > temp) --rH;
#define Sub2NPlus1From64(lowR,highR,s)                    
  Sub32From64(lowR,highR,s);                              
  Sub32From64(lowR,highR,s);                              
  Sub32From64(lowR,highR,1);
//Macros Add64 (Sub64) add (subtract) two 64-bit values.
#define Add64(qL,qH, eL,eH)                               
  Add32To64(qL,qH,eL);                                    
  qH += eH;
#define Sub64(qL,qH, eL,eH)                               
  Sub32From64(qL,qH, eL);                                 
  qH -= eH;
/*
Macro Square64 multiplies a 32-bit value by itself to produce the
square as a 64-bit value.  For this solution, we only need to execute
this macro expansion once.
 */
#define Square64(a,rL,rH,temp)                            
{                                                         
  ulong lohi,lolo,hihi;                                   
  ushort aHi,aLo;                                         
  aHi = a>>16;                                            
  aLo = a;                                                

Referenced by (3):