Nov 94 Dialog Box
Volume Number: 10
Issue Number: 11
Column Tag: Dialog Box
By Scott T Boyd, Editor
Don’t Scream, Send ‘em The Article!
I’d like to applaud Eric Shapiro’s article entitled “Multiple Monitors vs. Your
Application”. As a sometimes one, two, or three monitor user I routinely experience
everything he griped about. Purchasing multiple monitors is the most cost effective
solution for attaining a large on-screen workspace and developers would do well to
support it better. (Having multiple monitors has also worked in the past to really
freak out some of my IBM user/programmer friends!)
Even Apple has problems writing friendly windowing code. I can’t tell you how
many times I’ve disconnected my second monitor to use somewhere else, restarted,
launched AppleTalk Remote Access 1.0, and found one of its crucial windows to appear
offscreen. I have to reconnect the second monitor, move the windows to the main
monitor, and try again! At least when I encounter this bug in Excel I can “arrange” it
back on...
I too was aghast to see that the “new version of one popular developer tool”
creates the window and then snaps it into position. The first few times I brought up a
window I was sure that two were appearing. How could something that obvious slip
through, especially when the fix should be a single line of code? Anyway, thanks for
the article. If I see one more window jump back to my main monitor when I try to
enlarge it on my second monitor by clicking the zoom box, I’ll scream!
- Jeff Mallett, j.mallett@genie.geis.com
PPC Assembly Article Comments
When Bill Karsh repeated last month the worn-out advice originally promoted by
the Apple folks last year (“You don’t need assembler, the compiler can do better than
handwritten assembly” or words to that effect), it hit me with particular irony. You
see, the lack of adequate compiler tools (Thanks, Apple, for your inimitable support
here) has forced me to write more assembly code for the 601 in the last couple months
than for all other computers combined over the previous decade. Anyway I read his
article with great interest. Some comments:
1. Perhaps your readers should know that the sequence,
addis r3,0,0xABCD
addi r3,0,0xEF23
is unlikely to load the hex value ABCDEF23 into r3, for two reasons. First of all,
the result of the addis instruction will be discarded by the second, since it sums ZERO
+ EF23, not the previous result. Better to use r3 as the second parameter. But it still
won’t work, because addi takes a SIGNED immediate operand, and EF23 sign-extends to
FFFFEF23, not 0000EF23, which adds -1 to the previously loaded upper half. The
correct sequence for loading ABCDEF23 into r3 is:
addis r3,0,0xABCE
addi r3,r3,0xEF23
or alternatively,
addi r3,0,0xEF23
addis r3,r3,0xABCE
or still better, because it’s more understandable (ori takes an UNsigned
operand):
addis r3,0,0xABCD
ori r3,r3,0xEF23
2. I’m not sure how Bill intends to use the -ze variants for add and subtract “as
register-to-register move or negate and move mnemonics” but he’s likely to be
surprised when he tries to do so and finds the previous contents of the carry flag
(XER.CA) randomizing his results somewhat. Better to stick to ORI for move, and
NEG for negate and move.
3. Bill tells us that “Divide operations treat rA as a 64-bit dividend...” Perhaps
somebody should tell Motorola, because their manual reports the much more
reasonable proposition that the dividend is 32 bits. If it’s 64, where do the other
32 bits come from?
4. It’s really too bad we are stuck with the IBM syntax for the rotate operators. Or I
should say YOU are stuck with it: very early on I realized I was, like Bill,
burning a lot of time on this stuff, and altered my assembler and disassembler to
reflect what is REALLY going on. All three of the rotates have very simple
semantics: they rotate the source operand left n bits, then replace some bits in
the destination with the rotated bits under the control of a mask. The remaining
bits are either zeroed or left unchanged (the fundamental difference between the
rlwimi and rlwinm). The problem is specifying the mask. See how much simpler
these two instructions are to read:
rotm r29,r27,#3,=0007FFF8 ; rotate left 3, replace indicated bits
rotz r6,r15,#1,=00000080 ; rotate left 1, pick out a single bit
when compared to:
rlwimi r29,r27,3,13,28
rlwinm r6,r15,1,24,24
5. The latency figures Bill gives for branch instructions are likely to be misleading
- perhaps this is why everybody makes the case for compilers being better.
Branches are free if you give them enough setup time, basically three integer
instructions after the one that altered the CR or CTR or LR register the branch
depends on, but a sequence of branches with no data dependencies has a different
kind of problem. After a sequence of integer operations, the fourth consecutive
branch not taken will introduce a bubble in the pipeline, for an effective 1 cycle
delay. Consecutive branches taken cost two cycles each; they become free only if
two or more integer operations separate each pair of branches taken. Then there
are boundary conditions, but these three rules make for pretty efficient code.
6. I think Bill temporarily forgot that IBM numbers the 601 bits Big-Endian when
he illustrated the mtcrf instruction. If CRM = 0x08, then it’s cr4 (not cr3) that
is replaced with bits 16-19 (not 12-15). He got the visual image correct, but
he would be surprised when he went to use the bits by number. Another argument
for the superiority of a visual mask over bit numbers. And yes, my assembler
lets me use a visual mask syntax here as in the rotates. Perhaps somebody will
come up with a macro preprocessor for the MPW assembler to parse the bit
image syntax into something the assembler understands.
- Tom Pittman, Itty Bitty Computers
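The sign-extension argument in point 1 of the letter above can be checked with a short simulation. This is only a sketch, written in Python rather than PowerPC assembly. The helper names (`sign_extend16`, `addis`, `addi`, `ori`) are invented for illustration and model just the 32-bit arithmetic of the three instructions, with a plain 0 standing in for the literal-zero rA operand.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Treat a 16-bit immediate as signed, as addi and addis do."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def addis(ra, imm):
    """Model of addis: add the signed immediate, shifted left 16 bits, to ra."""
    return (ra + (sign_extend16(imm) << 16)) & MASK32

def addi(ra, imm):
    """Model of addi: add the signed immediate to ra."""
    return (ra + sign_extend16(imm)) & MASK32

def ori(rs, imm):
    """Model of ori: OR the UNsigned 16-bit immediate into rs."""
    return (rs | (imm & 0xFFFF)) & MASK32

# Sequence as printed in the article: the addi sums 0 + imm and
# throws away the upper half that addis just loaded.
as_printed = addi(0, 0xEF23)                      # 0xFFFFEF23

# Using r3 as the source but keeping 0xABCD: the sign-extended
# low half subtracts one from the upper half.
still_wrong = addi(addis(0, 0xABCD), 0xEF23)      # 0xABCCEF23

# The three corrected sequences from the letter.
fixed_addis_first = addi(addis(0, 0xABCE), 0xEF23)
fixed_addi_first = addis(addi(0, 0xEF23), 0xABCE)
fixed_with_ori = ori(addis(0, 0xABCD), 0xEF23)

for name, value in [("as printed", as_printed),
                    ("r3 as source only", still_wrong),
                    ("addis/addi", fixed_addis_first),
                    ("addi/addis", fixed_addi_first),
                    ("addis/ori", fixed_with_ori)]:
    print(f"{name:18s} {value:08X}")
```

All three corrected sequences produce ABCDEF23 in this model. The ori form is the only one that lets the upper half be written as 0xABCD unchanged.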
Bill Karsh responds: I am grateful to Tom Pittman for scrutinizing my article
in such detail, and pleased that the readers and I will benefit from the corrections. I
agree with Tom on most of his points, but let me respond to each.
0) High Level vs. Assembly programming - To everybody (not just Tom, who hates
tired dogma): maybe I was not clear enough on my personal feelings about
assembly. First of all, there is absolutely no question that just about anything
coded in (good) PPC assembly can beat the pants off the best compiler yet
available and probably ever likely to be available. I never could have intended
otherwise. In fact, I code in assembly myself, but my particular work demands
peak performance for a handful of core operations. It takes a great deal of effort
to achieve this, and one can always improve the code by small changes here and
there in a never ending process of refinement. If your particular job
specification is to speed up existing and otherwise correct code, you can do much
in C, but you can always do more in assembly by paying the price of being
absolutely tied to machine-specific code. That’s fine if you think it’s worth the
time that could be spent writing new, more portable and maintainable code. Yes,
sometimes it is worth it. The optimization should be well targeted in any case.
What I wanted to argue about compilers was that the capability is there in the
hardware to ease the compiler writer’s job of optimizing. It ought to be possible
for compilers to do better than they do today at PPC code and better at PPC code
than they have ever been at 68K code. Since there is so much for you to do just to
get your project on its feet, personally optimizing things should be a lower
priority than making them correct and meeting specs. You will gain (some)
optimization implicitly as the compilers improve, and there is some reason to be
optimistic about this happening. Don’t forget that as the machines get faster, the
need for touch-ups keeps diminishing. There will always be a place for some
killer assembly or some compiler hand-holding, but the genuine need should not
arise as often as it used to. A blanket statement about assembly or optimization
being evil would just be foolish.
1) Loading 0xABCDEF23 into a register - Tom is correct. My example is the result
of hastily copying notes from place to place and incurring typos, for which I have
no excuse. Each of his examples of loading a long literal constant is correct.
2) Clever uses of addze and subze as moves - Of course, the carry bit would have to
be cleared for the moves to work as suggested by me. Tom is right again. If
writing one’s own assembly, his are preferred methods for effecting the moves
reliably. Otherwise, the ze instructions should really only be employed for
extended arithmetic.
3) The sizes of divw rD, rA, rB operands and results - I have no argument with Tom
here either. The numerator (N), denominator (d) and quotient (Q) are all
32-bit quantities. When I said what I did about N being treated as 64-bits, I was
merely likening the division to that familiar in the 68K divs.w instruction,
where N is exactly twice the width of the d or Q registers. I intended that you
might consider N in your mind as extended in this way as a formal convenience,
not that the hardware operates this way.
4) Shift and rotate semantics - I think Tom is saying that he has created some
simplifying macros for himself, which can only be lauded. However, I was
concerned in the article with interpreting what most users are likely to see in
their standard disassembler’s output.
5) Branch timing - I agree only in spirit. There is much to say about branch timing.
I reported a latency of one cycle for branch execution which is generally true -
that’s how long a branch takes to execute (in vacuum, so to speak). This gives
little hint that a variety of things can happen depending on the context of the
branch. I take issue with Tom’s trying to characterize timing based on the
language of branches taken or not taken. Those are the rules for 68K branch
timing. On the PPC that is too simplistic. Branch timing is mainly governed by
whether branches are correctly or incorrectly predicted. Incorrectly predicted
branches hurt something awful, causing the IQ to be flushed, everything
contingently executed to be flushed and new instructions to be fetched. This can
cause a delay of more than one or two cycles. Further, the BPU handles one
branch at a time, which is why stacking them up is a no-no. The rules for
employing branching to best advantage are complicated - too much so to be
meaningfully summarized in the space of a letter.
6) mtcrf mask bits - Yep, I mistakenly reversed the bit numbering in the CRM mask
parameter. The left-most bit of CRM corresponds to the left-most CR field (cr0)
and similarly the right-most bit <-> right-most field (cr7). Whoops!
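Points 4 and 6 both come down to IBM's bit numbering, in which bit 0 is the most significant bit. The sketch below, in Python rather than assembly, turns the bit numbers a standard disassembler prints into the visual masks Tom prefers. The names `be_mask`, `rotl32`, `rlwinm`, `rlwimi` and `crm_fields` are made up for this illustration and model only the mask-and-rotate arithmetic and the CRM-to-field mapping, not any real assembler.

```python
MASK32 = 0xFFFFFFFF

def be_mask(mb, me):
    """Ones in bits mb..me, with bit 0 = MSB (IBM numbering).
    If mb > me the mask wraps around through bit 31 to bit 0."""
    head = MASK32 >> mb                     # ones from bit mb to bit 31
    tail = (MASK32 << (31 - me)) & MASK32   # ones from bit 0 to bit me
    return head & tail if mb <= me else head | tail

def rotl32(x, n):
    """Rotate a 32-bit value left by n (0..31) bits."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def rlwinm(rs, sh, mb, me):
    """Rotate left, AND with mask: bits outside the mask are zeroed."""
    return rotl32(rs, sh) & be_mask(mb, me)

def rlwimi(ra, rs, sh, mb, me):
    """Rotate left, insert under mask: bits outside the mask keep ra's value."""
    m = be_mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)

def crm_fields(crm):
    """For an 8-bit mtcrf CRM value, list (field, first CR bit, last CR bit).
    The left-most CRM bit (0x80) selects cr0, the right-most (0x01) cr7."""
    return [(i, 4 * i, 4 * i + 3) for i in range(8) if crm & (0x80 >> i)]

print(f"{be_mask(13, 28):08X}")   # mask for rlwimi r29,r27,3,13,28
print(f"{be_mask(24, 24):08X}")   # mask for rlwinm r6,r15,1,24,24
print(crm_fields(0x08))           # which field CRM = 0x08 selects
```

In this model the two masks come out as 0007FFF8 and 00000080, matching the masks in Tom's rotm and rotz examples, and CRM = 0x08 selects cr4, that is CR bits 16-19.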
Let me elaborate on one thing that can be confusing and that occurs frequently in
code. The extsb (sign extend byte) instruction extends to a width of 32-bits, unlike the
ext.b 68K instruction which extends a byte to 16 bits. This behavior is in keeping with
the idea that PPC arithmetic instructions act in general on the whole of a register.
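The difference is easy to see in a small model. The Python sketch below is illustrative only: `extsb` models the PowerPC instruction, and `ext_byte_to_word_68k` is a hypothetical name for the 68K byte-to-word extend (EXT.W in Motorola's mnemonics), which, as far as I know, leaves the upper 16 bits of the data register untouched.

```python
def extsb(value):
    """PowerPC extsb model: sign-extend the low byte across all 32 bits."""
    b = value & 0xFF
    return (b | 0xFFFFFF00) if b & 0x80 else b

def ext_byte_to_word_68k(d):
    """68K byte-to-word extend model: only the low 16 bits are rewritten;
    bits 31..16 of the register keep whatever they held before."""
    b = d & 0xFF
    low16 = (b | 0xFF00) if b & 0x80 else b
    return (d & 0xFFFF0000) | low16

reg = 0x12345680                              # low byte 0x80 is negative
print(f"{extsb(reg):08X}")                    # FFFFFF80
print(f"{ext_byte_to_word_68k(reg):08X}")     # 1234FF80
```

Code ported from 68K that relied on the upper half of a register surviving a byte extend will therefore behave differently on the PPC.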
If anything else is annoying or just plain wrong in the presentation, let me hear
about it.
- billKarsh@aol.com
OpenDoc, OLE, and Real Developers
I have just received Microsoft’s OLE SDK (free of charge) and been browsing it.
There is a lot of marketing (evangelizing?) stuff in it, including some “deep”
technical comparisons between OLE and OpenDoc. If you program the Macintosh, your
future is OpenDoc - Apple says.
Well, Microsoft has different plans. If you program for OLE, which is available
now, and it’s free of charge, you can port your components to Windows and you can
work with Excel or Word now - B. Gates says. After much reading and studying I came
to a conclusion. I will support OpenDoc because frankly I neither care for nor like Windows
and I do vertical apps for Macs and UNIX, but if I were a mainstream developer I would
go for OLE.
There are some technical differences. According to MS, OLE is a superior
technology now and it will get better in the future. OpenDoc has several technical
merits but, alas, it’s not yet available. OpenDoc has a HUGE advantage too: It’s open,
and that means that source code is available and it’s going to be ported from PDAs to
Mainframes. Microsoft says that OLE is cross-platform (Win-Mac) now and it’s true,
and it says it will run under UNIX for free only if you licence (surprise) its Win32
API!!! And they call that Open. Please don’t make me laugh.
I would like to see some input about ISD future plans on these technologies and
some technical comparisons too.
So, take your pick, because we are going to start coding “parts” and putting them
together like chips in a computer.
- Daniel Nofal, TecH S.A, Buenos Aires, ARGENTINA
Dylan Takes A Load Off
Thanks for the Sept. Dylan article. I was disappointed, though, that the article
didn’t use the example code to emphasize what makes Dylan different from C++ and
other static languages. I’m not sure how many readers would wade through the code to
discover the link between the interface definition of the OpenMovieFile function:
function "OpenMovieFile", output-argument: resRefNum;
and its invocation in the open-movie-file method:
let (err,ref-num) = OpenMovieFile(spec, file.data-permission);
nor notice some of the pleasures of Dylan they illustrate. err and ref-num didn’t
have to be declared prior to their use - Dylan figured it out from the context and
created the properly-typed objects. OpenMovieFile() is returning multiple values.
The developer didn’t have to concern herself whether arguments should be passed by
value or by reference, nor worry about the intricacies of memory management because
Dylan has automatic garbage collection.
- Steve Palmen, tshirt2b@halcyon.com