All Databases MacTech Vol 04-1988

68881 Access

Volume Number: 4

Issue Number: 4

Column Tag: Forth Forum

Direct 68881 Floating Point Access

By Jörg Langowski, MacTutor Editorial Board, Grenoble, France

As the proud owner of a MacII since the beginning of this year, I was, like so many
others, disappointed by the relative slowness of the SANE package. Even with the
floating point coprocessor, the speedup is only a factor of 5-10 compared to the Mac
SE. The coprocessor itself can go much faster than that; it is Apple’s SANE
implementation that slows down the operations.

The reason for this is, of course, that Apple is trying to guarantee the
‘compatibility’ of SANE on the MacII with the old SANE implementation. That means
that the results are supposed to be the same, down to the least significant bit (or even
to the last guard bit?). The 68881 uses different algorithms for calculating
transcendental functions than SANE does; therefore some of its built-in operations
could not be used and had to be replaced by software. Of course, that slows things down
a lot.

As a side note, I’m not at all in favor of such a strategy. Artificially restraining a
high performance chip just because its results, although accurate enough, don’t match
the old (also accurate) results to the last bit seems a little exaggerated. Stable
numerical algorithms should take into account the possibility that the hardware
changes slightly and the machine errors are different, and they should be immune
to such changes; or at least the new SANE should have had a built-in option that
uses the 68881 directly! [Amen to that statement! -Ed]

Many of the development systems for the Macintosh can now generate code that
drives the floating point coprocessor directly; many others cannot. Working with
Forth, we don’t have a real problem, since we can easily redefine our floating point
operators. This month I’ll show you how to do that.

The 68881

Let’s first have a look at the floating point coprocessor itself.

The 68881 is accessed from the 68020 in a special address area. When the
68020 encounters an instruction of the form $Fxxx (previously the F line trap), it
sets its function code lines (pins FC0-FC2) all high, indicating ‘CPU space’. It
then exchanges information with the coprocessor’s internal registers to perform
the floating point operation requested.

This information exchange occurs automatically; it is part of the 68020’s
design. When using the floating point instructions, you don’t notice the communication
between the two processors at all. The 68881 appears as an extension to the 68020,
just as if we had a new set of registers available, with special instructions operating
on them.

The registers FP0-FP7 can each hold one extended precision (80-bit) floating
point number, in the format given by the IEEE standard: bits 0-63 contain the
mantissa, bits 64-78 the 15-bit exponent (offset by 16383), and bit 79 the sign. The
instruction set of the 68881 lets you perform floating point operations on any one of
these registers or between any two of them, and transfer data between them, the
68020’s registers and memory.
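
The bit layout just described can be checked with a few lines of Python. This is only a sketch for illustration, not part of the Mach2 listing: the function name decode_extended is hypothetical, the value is assumed to arrive as 10 big-endian bytes, and denormalized numbers, infinities and NaNs are deliberately left out.

```python
import math

EXP_BIAS = 16383  # exponent offset stated in the text


def decode_extended(raw: bytes) -> float:
    """Decode a 10-byte big-endian 80-bit extended value into a Python float.

    Hypothetical helper for illustration; handles zero and normalized
    numbers only. Very large values may raise OverflowError because a
    Python float has a smaller range than the extended format.
    """
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    bits = int.from_bytes(raw, "big")
    sign = bits >> 79                    # bit 79
    exponent = (bits >> 64) & 0x7FFF     # bits 64-78
    mantissa = bits & ((1 << 64) - 1)    # bits 0-63, integer bit is bit 63
    if exponent == 0x7FFF or (exponent == 0 and mantissa != 0):
        raise ValueError("infinity, NaN and denormals are outside this sketch")
    # value = (mantissa / 2**63) * 2**(exponent - bias)
    value = math.ldexp(mantissa, exponent - EXP_BIAS - 63)
    return -value if sign else value


one = bytes.fromhex("3fff8000000000000000")
minus_two = bytes.fromhex("c0008000000000000000")
print(decode_extended(one), decode_extended(minus_two))
```

The two sample byte strings have biased exponents 16383 and 16384 and only the top mantissa bit set, so they should decode to 1.0 and -2.0.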

Mach2’s assembler fully supports all 68881 instructions. The redefinition of
the words in the SANE vocabulary is therefore quite straightforward. Register D7 is
used by Mach2 as the floating point stack pointer, so all we have to do is transfer
floating point numbers from the FP stack to the 68881’s registers, do the FP
operation and transfer the result back.

The 68881 expects floating point numbers in a different format than SANE does.
In order to keep FP numbers aligned to the boundaries of a 32-bit long word, which
speeds up operations, an extended precision number in memory is 96 bits long
instead of 80; there is a 16-bit gap between the 64-bit mantissa and the 16-bit sign
and exponent field. Since SANE and the floating point stack in Mach2 use 80-bit
numbers, we have to convert their format when we use the coprocessor.
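
The conversion between the two memory layouts amounts to inserting or dropping two bytes. The following Python sketch models it on byte strings; the function names are hypothetical, and unlike the assembler macros in Listing 1, which simply skip over the gap, this version fills the gap with zeros.

```python
def sane_to_68881(sane: bytes) -> bytes:
    """Widen an 80-bit SANE extended (10 bytes) to the 96-bit memory format.

    Layout assumed here: 2 bytes sign+exponent, then 8 bytes mantissa.
    The 16-bit gap goes between the two fields.
    """
    if len(sane) != 10:
        raise ValueError("expected 10 bytes")
    return sane[:2] + b"\x00\x00" + sane[2:]


def m68881_to_sane(ext96: bytes) -> bytes:
    """Narrow a 96-bit coprocessor value (12 bytes) back to 80 bits."""
    if len(ext96) != 12:
        raise ValueError("expected 12 bytes")
    return ext96[:2] + ext96[4:]   # drop the 16-bit gap


one = bytes.fromhex("3fff8000000000000000")
wide = sane_to_68881(one)
print(wide.hex())                   # sign+exponent, gap, mantissa
print(m68881_to_sane(wide) == one)  # round trip restores the original
```

No bits of the number itself change in either direction, which is why the macros can do the job with a handful of move instructions.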

Listing 1 provides the macros f>2, f>1 and 1>f for this purpose. The general
format for a binary floating point operation is then

    f>2
    fop.x FPn, FPm
    1>f

where fop.x is an extended format floating point instruction that operates on floating
point registers FPn and FPm. The example always uses registers FP0 and FP1. Unary
operations are encoded the same way, using only FP0.
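
To make the operand order explicit, here is a small Python model of the f>2 / fop / 1>f pattern. It is a sketch only: the FP stack is a plain Python list with the top of stack last, and binary_op is a hypothetical name. It follows what Listing 1 does, where the second stack item ends up in FP0, the top item in FP1, and the result is left in FP0.

```python
import operator


def binary_op(fp_stack, fop):
    """Model of 'f>2, fop.x fp1,fp0, 1>f' on a list (top of stack last)."""
    top = fp_stack.pop()       # loaded into FP1 by f>2
    second = fp_stack.pop()    # loaded into FP0 by f>2
    fp0 = fop(second, top)     # fop.x fp1,fp0 : FP0 <- FP0 op FP1
    fp_stack.append(fp0)       # 1>f writes FP0 back as the new top
    return fp_stack


stack = [5.0, 3.0]
binary_op(stack, operator.sub)   # corresponds to: 5 3 f-
print(stack)
```

With this ordering, subtraction and division behave as the usual Forth words do: the second item is the left operand and the top item the right one.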

In the example, I provide a new vocabulary f68881 that contains redefinitions
of the most important SANE operations for the 68881. Of course, the concept can be
extended much further. We have eight registers at our disposal and can use them to
optimize more complicated numerical algorithms, using only assembly language. I might
give some examples of how to do this in a later column.

For the moment, let’s content ourselves with the speed improvement that we have
achieved so far, which is already remarkable. Some simple benchmarks are listed at
the end of listing 1, and the results are given here:

( MacII, direct 68881 access, 100000 loops each )
bmark1 2 Secs 18 Ticks ok <0> [0]
bmark2 4 Secs 48 Ticks ok <0> [0]
bmark3 4 Secs 14 Ticks ok <0> [0]
bmark4 4 Secs 17 Ticks ok <0> [0]
bmark5 4 Secs 14 Ticks ok <0> [0]

( MacII, SANE w/68881, 10000 loops each )
smark1 15 Ticks ok <0> [0]
smark2 37 Secs 12 Ticks ok <0> [0]
smark3 24 Ticks ok <0> [0]
smark4 1 Secs 33 Ticks ok <0> [0]
smark5 1 Secs 33 Ticks ok <0> [0]

( Mac +, SANE, 10000 loops each )
ok <0> [0]
smark1 24 Ticks ok <0> [0]
smark2 181 Secs 54 Ticks ok <0> [0]
smark3 49 Ticks ok <0> [0]
smark4 4 Secs 47 Ticks ok <0> [0]
smark5 5 Secs 14 Ticks ok <0> [0]

For the Mac+, the fnull1 and fnull2 operations were replaced by simple
fdrops. As you can see, the speedup going from the Mac+ to the MacII’s SANE is not so
breathtaking: a factor of 6 for the exponential, 4 for addition and subtraction. But
when we access the 68881 directly, we gain another factor of 3 for simple addition and
multiplication and 78 for the exponential. It is in the calculation of the
transcendentals that the 68881 really shines.
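
For readers who want to redo the arithmetic, the Python sketch below converts the timings to seconds and scales them to the same loop count. Two assumptions are made that this part of the article does not state: a tick is taken as 1/60 second, and bmark2 and smark2 are taken to be the exponential benchmark (they are by far the slowest SANE entries). Loop overhead is not subtracted.

```python
TICKS_PER_SECOND = 60  # assumption: one Macintosh tick is 1/60 second


def seconds(secs: int, ticks: int) -> float:
    """Convert a 'Secs/Ticks' timing from the benchmark output to seconds."""
    return secs + ticks / TICKS_PER_SECOND


# Direct 68881 access ran 100000 loops, SANE only 10000, so scale by 10.
direct_exp = seconds(4, 48) / 10   # bmark2, per 10000 loops
sane_exp = seconds(37, 12)         # smark2 on the MacII, per 10000 loops

ratio = sane_exp / direct_exp
print(f"SANE / direct 68881 (exponential): {ratio:.1f}")
```

Under these assumptions the raw ratio comes out at about 77.5, in line with the factor of 78 quoted for the exponential.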

Pop up menus

Someone recently asked me how to do pop up menu selection. Since the technical
notes contain only sketchy references to pop up menus at the time I write this
(e.g. TN156, TN172), I’d like to give you a practical example of how to use
pop up menus from Mach2 in a simple way.

Listing 2 explains the process. The PopUpMenuSelect trap takes four parameters
(the two coordinates are passed separately):

- a handle to the menu to be displayed (32 bits),
- the top and left global coordinates of the point at which to display the menu
  (2*16 bits),
- the menu item which should be positioned at that point for the default selection
  (16 bits).

Although Mach2 knows the trap name, the interface to this routine is not (yet)
correct, so we have to redefine it in assembler. Note that the point returned by the
@mouse function is in local coordinates, while PopUpMenuSelect expects it in global
coordinates.
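
The coordinate conversion and the handling of the result can be sketched in Python as follows. This is a model, not Mach2 code: both function names are hypothetical, origin_global stands for the global position of the window's local (0,0) point, and the packing of the result (menu ID in the high word, item number in the low word) follows the usual Menu Manager convention rather than anything stated in this article.

```python
def local_to_global(local_pt, origin_global):
    """Convert a (top, left) point from window-local to global coordinates.

    origin_global is the (top, left) global position of the window's local
    origin; the real Toolbox call for this is LocalToGlobal.
    """
    top, left = local_pt
    origin_top, origin_left = origin_global
    return (top + origin_top, left + origin_left)


def split_menu_result(result: int):
    """Split a 32-bit menu selection result into (menu ID, item number)."""
    return (result >> 16) & 0xFFFF, result & 0xFFFF


# Mouse click at local (10, 20) in a window whose origin is at global (40, 100)
print(local_to_global((10, 20), (40, 100)))
# A made-up result value: menu ID 200, item 3
print(split_menu_result((200 << 16) | 3))
```

The global point is what would be passed as the top and left parameters; a result whose menu ID word is zero means that nothing was selected.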

The example defines a menu using the Mach2 interface; the menu is created with
-1 as the insertion parameter (-1 150 mymenu BOUNDS) so that after insertion into
the task’s menu bar the menu will stay invisible (just as we did for the hierarchical
menus). Note that a pop up menu has to be inserted into the menu bar before it is used.

The content handler for the default Mach2 window is then rewritten so that on a
mouse down event it will select the example pop up menu. The menu handler will just
beep a number of times depending on the item selected. dopop activates the new content
handler, while nopop deactivates it.

Feedback dept.

This letter comes from Vassili Dzuba, Paris:

“In January’s issue ‘Mousehole Report’, Alan Dall put in his wish list the ability
to define ‘ghost copies’ of applications. Even without Unix’s capability of defining
links, this can be done with a small program using the _Launch trap. This program
takes only 1K on the disk. The path name of the application to launch is stored in
‘STR ’ resource 1000. It’s possible to set the creator and the bundle bit of the ghost
so that it shares the same icon as the real thing. Of course, double-clicking on a
document can then launch the ghost instead of the application, but the slowdown is
only marginal.

The program is the following (using MPW’s assembler):

        INCLUDE 'traps.a'
ghost   MAIN
        MOVE.W  #0,-(SP)        ; the context data (_Launch parameter)
        CLR.L   -(SP)           ; room for the GetResource function result
        MOVE.L  #'STR ',-(SP)   ; 1st parameter of GetResource
        MOVE.W  #1000,-(SP)     ; 2nd parameter of GetResource
        _GetResource            ; handle to string in (SP)
        TST.L   (SP)            ; test if null handle (no resource available)
        BEQ.B   exit            ; if null, go to exit
        MOVE.L  (SP),A0
        MOVE.L  (A0),(SP)       ; handle dereferenced
        MOVE.L  SP,A0           ; stack pointer in A0
        _Launch
exit    _ExitToShell
        END

The resource file is something like this (using Rez format):

#include "types.r"
resource 'STR ' (1000) {
    "Sys:myDirectory:MyApp"
};

A ghost can easily be created using a small shell script (assuming the original
ghost resides in the directory {MPW}dev) which sets up the string resource:

duplicate {MPW}'dev:ghost' {2}
echo '#include "types.r"'∂n∂
'resource '∂''STR '∂'' (1000) { "'{1}'" };'∂
| rez -a -o {2}

Assuming this script is named ‘summon’, the creation in the current directory of
a ghost of MacPaint would be something like:

summon 'sys:appli :mac paint' 'macpaint.Ghost'

Sincerely yours”

Thank you, Vassili, for this helpful little utility.

Listing 1: direct access 68881 floating point words for Mach2
\ 68881 access, © J. Langowski/MacTutor Jan 1988
only forth also assembler also sane
vocabulary f68881
also f68881 definitions
code f>2
add.l #20,d7
move.l d7,a0
move.l -(a0),-(a7) ; move mantissa
move.l -(a0),-(a7) ; in two 32-bit chunks
subq.l #2,a7 ; 16-bit gap
move.w -(a0),-(a7) ; move exponent + sign
fmove.x (a7)+,fp0 ; transfer from stack to fp0
move.l -(a0),-(a7) ; same for fp1...
move.l -(a0),-(a7)
subq.l #2,a7
move.w -(a0),-(a7)
fmove.x (a7)+,fp1
add.l #10,a0 ; a0 points to 2nd fp stack item
rts
end-code mach
code f>1
add.l #10,d7
move.l d7,a0
move.l -(a0),-(a7)
move.l -(a0),-(a7)
subq.l #2,a7
move.w -(a0),-(a7)
fmove.x (a7)+,fp0
rts
end-code mach
code 1>f
fmove.x fp0,-(a7)
move.w (a7)+,(a0)+ ; transfer exponent + sign
addq.l #2,a7 ; skip 16-bit gap
move.l (a7)+,(a0)+ ; transfer mantissa
move.l (a7)+,(a0)+ ; in 2 steps
sub.l  #10,d7 ; adjust FP stack pointer
rts
end-code mach
( note: f>1 or f>2 and 1>f should always occur in pairs since 1>f
expects A0 to point to second floating point stack position; this is
assured by f>1 and f>2 )
code f+
f>2
fadd.x  fp1,fp0
1>f
rts
end-code
code f-
f>2
fsub.x  fp1,fp0
1>f
rts
end-code
code f/
f>2
fdiv.x  fp1,fp0
1>f
rts
end-code
code f*
f>2
fmul.x  fp1,fp0
1>f
rts
end-code
code fmod
f>2
fmod.x fp1,fp0
1>f
rts
end-code
code frem
f>2
frem.x  fp1,fp0
1>f
rts
end-code
code fabs
f>1
fabs.x fp0
1>f
