MacTech Vol 10-1994

Nov 94 Dialog Box

Volume Number: 10

Issue Number: 11

Column Tag: Dialog Box

Dialog Box

By Scott T Boyd, Editor

Don’t Scream, Send ‘em The Article!

I’d like to applaud Eric Shapiro’s article entitled “Multiple Monitors vs. Your

Application”. As a sometimes one-, two-, or three-monitor user I routinely experience

everything he griped about. Purchasing multiple monitors is the most cost-effective

solution for attaining a large on-screen workspace and developers would do well to

support it better. (Having multiple monitors has also worked in the past to really

freak out some of my IBM user/programmer friends!)

Even Apple has problems writing friendly windowing code. I can’t tell you how

many times I’ve disconnected my second monitor to use somewhere else, restarted,

launched AppleTalk Remote Access 1.0, and found one of its crucial windows to appear

offscreen. I have to reconnect the second monitor, move the windows to the main

monitor, and try again! At least when I encounter this bug in Excel I can “arrange” it

back on...

I too was aghast to see that the “new version of one popular developer tool”

creates the window and then snaps it into position. The first few times I brought up a

window I was sure that two were appearing. How could something that obvious slip

through, especially when the fix should be a single line of code? Anyway, thanks for

the article. If I see one more window jump back to my main monitor when I try to

enlarge it on my second monitor by clicking the zoom box, I’ll scream!

- Jeff Mallett, j.mallett@genie.geis.com

PPC Assembly Article Comments

When Bill Karsh repeated last month the worn-out advice originally promoted by

the Apple folks last year (“You don’t need assembler, the compiler can do better than

handwritten assembly” or words to that effect), it hit me with particular irony. You

see, the lack of adequate compiler tools (Thanks, Apple, for your inimitable support

here) has forced me to write more assembly code for the 601 in the last couple months

than for all other computers combined over the previous decade. Anyway I read his

article with great interest. Some comments:

1. Perhaps your readers should know that the sequence,

addis r3,0,0xABCD

addi r3,0,0xEF23

is unlikely to load the hex value ABCDEF23 into r3, for two reasons. First of all,

the result of the addis instruction will be discarded by the second, since it sums ZERO

+ EF23, not the previous result. Better to use r3 as the second parameter. But it still

won’t work, because addi takes a SIGNED immediate operand, and EF23 sign-extends to

FFFFEF23, not 0000EF23, which adds -1 to the previously loaded upper half. The

correct sequence for loading ABCDEF23 into r3 is:

addis r3,0,0xABCE

addi r3,r3,0xEF23

or alternatively,

addi r3,0,0xEF23

addis r3,r3,0xABCE

or still better, because it’s more understandable (ori takes an UNsigned

operand):

addis r3,0,0xABCD

ori r3,r3,0xEF23

2. I’m not sure how Bill intends to use the -ze variants for add and subtract as

moves, but he will be surprised when he tries to do so and finds the previous contents of the carry flag

(XER.CA) randomizing his results somewhat. Better to stick to ORI for move, and

NEG for negate and move.

3. Bill tells us that “Divide operations treat rA as a 64-bit dividend...” Perhaps

somebody should tell Motorola, because their manual reports the much more

reasonable proposition that the dividend is 32 bits. If it’s 64, where do the other

32 bits come from?

4. It’s really too bad we are stuck with the IBM syntax for the rotate operators. Or I

should say YOU are stuck with it: very early on I realized I was, like Bill,

burning a lot of time on this stuff, and altered my assembler and disassembler to

reflect what is REALLY going on. All three of the rotates have very simple

semantics: they rotate the source operand left n bits, then replace some bits in

the destination with the rotated bits under the control of a mask. The remaining

bits are either zeroed or left unchanged (the fundamental difference between the

rlwimi and rlwinm). The problem is specifying the mask. See how much simpler

these two instructions are to read:

rotm r29,r27,#3,=0007FFF8 ; rotate left 3, replace indicated bits

rotz r6,r15,#1,=00000080 ; rotate left 1, pick out a single bit

when compared to:

rlwimi r29,r27,3,13,28

rlwinm r6,r15,1,24,24

5. The latency figures Bill gives for branch instructions are likely to be misleading

- perhaps this is why everybody makes the case for compilers being better.

Branches are free if you give them enough setup time, basically three integer

instructions after the one that altered the CR or CTR or LR register the branch

depends on, but a sequence of branches with no data dependencies has a different

kind of problem. After a sequence of integer operations, the fourth consecutive

branch not taken will introduce a bubble in the pipeline, for an effective 1 cycle

delay. Consecutive branches taken cost two cycles each; they become free only if

two or more integer operations separate each pair of branches taken. Then there

are boundary conditions, but these three rules make for pretty efficient code.

6. I think Bill temporarily forgot that IBM numbers the 601 bits Big-Endian when

he illustrated the mtcrf instruction. If CRM = 0x08, then it’s cr4 (not cr3) that

is replaced with bits 16-19 (not 12-15). He got the visual image correct, but

he would be surprised when he went to use the bits by number. Another argument

for the superiority of a visual mask over bit numbers. And yes, my assembler

lets me use a visual mask syntax here as in the rotates. Perhaps somebody will

come up with a macro preprocessor for the MPW assembler to parse the bit

image syntax into something the assembler understands.

- Tom Pittman, Itty Bitty Computers
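
The arithmetic in point 1 of the letter above can be checked with a small simulation. What follows is a minimal Python sketch, not PowerPC code: the helper names are invented for illustration, and each one models only the 32-bit result of the instruction it is named after, assuming the signed (addi, addis) and unsigned (ori) immediate handling described in the letter.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Treat a 16-bit immediate as signed, as addi and addis do."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def addi(ra_value, imm):
    """Model of addi rD,rA,SIMM (pass 0 when rA is written as 0)."""
    return (ra_value + sign_extend16(imm)) & MASK32

def addis(ra_value, imm):
    """Model of addis rD,rA,SIMM: the immediate lands in the upper halfword."""
    return (ra_value + (sign_extend16(imm) << 16)) & MASK32

def ori(rs_value, imm):
    """Model of ori rA,rS,UIMM: the immediate is zero-extended."""
    return (rs_value | (imm & 0xFFFF)) & MASK32

# Original sequence: the second instruction adds to zero, discarding r3.
discarded = addi(0, 0xEF23)

# Feeding r3 back in is still wrong: 0xEF23 sign-extends to a negative
# number, so the upper half ends up one too low.
off_by_one = addi(addis(0, 0xABCD), 0xEF23)

# The three corrected sequences from the letter.
fixed_add = addi(addis(0, 0xABCE), 0xEF23)
fixed_rev = addis(addi(0, 0xEF23), 0xABCE)
fixed_ori = ori(addis(0, 0xABCD), 0xEF23)

for name, value in [("discarded", discarded), ("off_by_one", off_by_one),
                    ("fixed_add", fixed_add), ("fixed_rev", fixed_rev),
                    ("fixed_ori", fixed_ori)]:
    print(f"{name:11s} {value:08X}")
```

Under these assumptions the first two results come out as FFFFEF23 and ABCCEF23, and all three corrected sequences give ABCDEF23.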

Bill Karsh responds: I am grateful to Tom Pittman for scrutinizing my article

in such detail, and pleased that the readers and I will benefit from the corrections. I

agree with Tom on most of his points, but let me respond to each.

0) High Level vs. Assembly programming - To everybody (not just Tom who hates

tired dogma) maybe I was not clear enough on my personal feelings about

assembly. First of all, there is absolutely no question that just about anything

coded in (good) PPC assembly can beat the pants off the best compiler yet

available and probably ever likely to be available. I never could have intended

otherwise. In fact, I code in assembly myself, but my particular work demands

peak performance for a handful of core operations. It takes a great deal of effort

to achieve this, and one can always improve the code by small changes here and

there in a never ending process of refinement. If your particular job

specification is to speed up existing and otherwise correct code, you can do much

in C, but you can always do more in assembly by paying the price of being

absolutely tied to machine-specific code. That’s fine if you think it’s worth the

time that could be spent writing new, more portable and maintainable code. Yes,

sometimes it is worth it. The optimization should be well targeted in any case.

What I wanted to argue about compilers was that the capability is there in the

hardware to ease the compiler writer’s job of optimizing. It ought to be possible

for compilers to do better than they do today at PPC code and better at PPC code

than they have ever been at 68K code. Since there is so much for you to do just to

get your project on its feet, personally optimizing things should be a lower

priority than making them correct and meeting specs. You will gain (some)

optimization implicitly as the compilers improve, and there is some reason to be

optimistic about this happening. Don’t forget that as the machines get faster, the

need for touch-ups keeps diminishing. There will always be a place for some

killer assembly or some compiler hand-holding, but the genuine need should not

arise as often as it used to. A blanket statement about assembly or optimization

being evil would just be foolish.

1) Loading 0xABCDEF23 into a register - Tom is correct. My example is the result

of hastily copying notes from place to place and incurring typos, for which I have

no excuse. Each of his examples of loading a long literal constant is correct.

2) Clever uses of addze and subze as moves - Of course, the carry bit would have to

be cleared for the moves to work as suggested by me. Tom is right again. If

writing one’s own assembly, his are preferred methods for effecting the moves

reliably. Otherwise, the ze instructions should really only be employed for

extended arithmetic.

3) The sizes of divw rD, rA, rB operands and results - I have no argument with Tom

here either. The numerator (N), denominator (d) and quotient (Q) are all

32-bit quantities. When I said what I did about N being treated as 64-bits, I was

merely likening the division to that familiar in the 68K divs.w instruction,

where N is exactly twice the width of the d or Q registers. I intended that you

might consider N in your mind as extended in this way as a formal convenience,

not that the hardware operates this way.

4) Shift and rotate semantics - I think Tom is saying that he has created some

simplifying macros for himself, which can only be lauded. However, I was

concerned in the article with interpreting what most users are likely to see in

their standard disassembler’s output.

5) Branch timing - I agree only in spirit. There is much to say about branch timing.

I reported a latency of one cycle for branch execution which is generally true -

that’s how long a branch takes to execute (in vacuum, so to speak). This gives

little hint that a variety of things can happen depending on the context of the

branch. I take issue with Tom’s trying to characterize timing based on the

language of branches taken or not taken. Those are the rules for 68K branch

timing. On the PPC that is too simplistic. Branch timing is mainly governed by

whether branches are correctly or incorrectly predicted. Incorrectly predicted

branches hurt something awful, causing the IQ to be flushed, everything

contingently executed to be flushed and new instructions to be fetched. This can

cause a delay of more than one or two cycles. Further, the BPU handles one

branch at a time, which is why stacking them up is a no-no. The rules for

employing branching to best advantage are complicated - too much so to be

meaningfully summarized in the space of a letter.

6) mtcrf mask bits - Yep, I mistakenly reversed the bit numbering in the CRM mask

parameter. The left-most bit of CRM corresponds to the left-most CR field (cr0)

and similarly the right-most bit <-> right-most field (cr7). Whoops!
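
The rotate masks in item 4 and the CRM mask in item 6 both come down to the IBM convention that bit 0 is the most significant bit. The Python sketch below is one way to check the numbers quoted in this exchange; the helper names are illustrative assumptions, not part of any assembler, and the rotate models cover only the 32-bit register result.

```python
MASK32 = 0xFFFFFFFF

def mask_from_range(mb, me):
    """Ones in IBM bits mb..me inclusive (bit 0 = MSB), wrapping if mb > me."""
    head = MASK32 >> mb                    # ones from bit mb through bit 31
    tail = (MASK32 << (31 - me)) & MASK32  # ones from bit 0 through bit me
    return head & tail if mb <= me else head | tail

def rotl32(value, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    return ((value << n) | (value >> (32 - n))) & MASK32

def rlwinm(rs, sh, mb, me):
    """Rotate left, keep the masked bits, zero the rest."""
    return rotl32(rs, sh) & mask_from_range(mb, me)

def rlwimi(ra, rs, sh, mb, me):
    """Rotate left, insert the masked bits into ra, leave the rest of ra alone."""
    m = mask_from_range(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)

def crm_fields(crm):
    """CR fields selected by an 8-bit CRM mask; the leftmost bit selects cr0."""
    return [i for i in range(8) if crm & (0x80 >> i)]

def cr_field_bits(field):
    """First and last IBM bit numbers occupied by a CR field."""
    return (4 * field, 4 * field + 3)

print(f"{mask_from_range(13, 28):08X}")   # mask for rlwimi ...,3,13,28
print(f"{mask_from_range(24, 24):08X}")   # mask for rlwinm ...,1,24,24
print(crm_fields(0x08), cr_field_bits(4))
```

With these definitions the bit ranges 13-28 and 24-24 produce the masks 0007FFF8 and 00000080 written out in the letter's visual syntax, and CRM = 0x08 selects cr4, which occupies bits 16-19.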

Let me elaborate on one thing that can be confusing and that occurs frequently in

code. The extsb (sign extend byte) instruction extends to a width of 32-bits, unlike the

ext.b 68K instruction which extends a byte to 16 bits. This behavior is in keeping with

the idea that PPC arithmetic instructions act in general on the whole of a register.
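
The difference is easy to see on a register whose upper bits are already in use. This is a minimal Python sketch with invented helper names; it assumes the 68K byte extend rewrites only the low 16 bits of the data register and leaves the upper 16 untouched, while extsb rewrites all 32.

```python
def extsb(value):
    """Model of PPC extsb: sign-extend the low byte across all 32 bits."""
    byte = value & 0xFF
    return (byte - 0x100 if byte & 0x80 else byte) & 0xFFFFFFFF

def ext_byte_to_word_68k(d):
    """Model of the 68K byte-to-word extend: only the low 16 bits change."""
    byte = d & 0xFF
    low16 = (byte - 0x100 if byte & 0x80 else byte) & 0xFFFF
    return (d & 0xFFFF0000) | low16

reg = 0x12345680                                 # low byte 0x80 is negative
print(f"{extsb(reg):08X}")                       # whole register rewritten
print(f"{ext_byte_to_word_68k(reg):08X}")        # upper halfword preserved
```

For the sample value, the PPC model gives FFFFFF80 and the 68K model gives 1234FF80.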

If anything else is annoying or just plain wrong in the presentation, let me hear

about it.

- billKarsh@aol.com

OpenDoc, OLE, and Real Developers

I have just received Microsoft’s OLE SDK (free of charge) and been browsing it.

There is a lot of marketing (evangelizing?) stuff in it, including some “deep”

technical comparisons between OLE and OpenDoc. If you program the Macintosh, your

future is OpenDoc - Apple says.

Well, Microsoft has different plans. If you program for OLE, which is available

now, and it’s free of charge, you can port your components to Windows and you can

work with Excel or Word now - B. Gates says. After much reading and studying I came

to a conclusion. I will support OpenDoc because frankly I neither care for nor like Windows

and I do vertical apps for Macs and UNIX, but if I were a mainstream developer I would

go for OLE.

There are some technical differences. According to MS, OLE is a superior

technology now and it will get better in the future. OpenDoc has several technical

merits but, alas, it’s not yet available. OpenDoc has a HUGE advantage too: It’s open,

and that means that source code is available and it’s going to be ported from PDAs to

Mainframes. Microsoft says that OLE is cross-platform (Win-Mac) now and it’s true,

and it says it will run under UNIX for free only if you licence (surprise) its Win32

API!!! And they call that Open. Please don’t make me laugh.

I would like to see some input about ISD future plans on these technologies and

some technical comparisons too.

So, take your pick, because we are going to start coding “parts” and putting them

together like chips in a computer.

- Daniel Nofal TecH S.A Buenos Aires, ARGENTINA

Dylan Takes A Load Off

Thanks for the Sept. Dylan article. I was disappointed, though, that the article

didn’t use the example code to emphasize what makes Dylan different from C++ and

other static languages. I’m not sure how many readers would wade through the code to

discover the link between the interface definition of the OpenMovieFile function:

function “OpenMovieFile”, output-argument: resRefNum;

and its invocation in the open-movie-file method:

let (err,ref-num) = OpenMovieFile(spec, file.data-permission);

nor notice some of the pleasures of Dylan they illustrate. err and ref-num didn’t

have to be declared prior to their use - Dylan figured it out from the context and

created the properly-typed objects. OpenMovieFile() is returning multiple values.

The developer didn’t have to concern herself with whether arguments should be passed by

value or by reference, nor worry about the intricacies of memory management because

Dylan has automatic garbage collection.

- Steve Palmen, tshirt2b@halcyon.com
