MacTech Vol 10-1994

PowerPC Assembly

Volume Number: 10

Issue Number: 9

Column Tag: PowerPC Essentials

Understanding PowerPC Assembly

Speak like a native in only two easy lessons! Welcome to lesson two.

By Bill Karsh, BillKarsh@aol.com

This is part two of a two-part article on understanding the PowerPC architecture and assembly language. Last month we took a brief look at the hardware architecture of the MPC601 processor, and discussed the user programming model. This month, we’ll summarize its assembly language syntax in a condensed and easily digestible form for quick reference. Think of it as a compressed and intelligently filtered user manual.

If you haven’t already read last month’s article, you might want to go back and

learn about the environment and data types of the 601. If you have, let’s jump in and

learn its lingo.

601-Speak - Terms, Notation and Generalities

Let’s introduce an example statement to look at, and enumerate as much as

possible that is common to most instructions. Don’t worry. Much of this will be

revisited, and the inevitable exceptions will be pointed out, as we go.

add rD,rA,rB

This familiar operation simply adds the contents of GPRs rA and rB together, and

writes the result to rD. Immediately, there is a wealth of new stuff to talk about.

fp0...fp31, or fr0...fr31 depending on the assembler. The CR fields are cr0...cr7.

The remaining special-purpose registers are most often accessed through

special-purpose instructions, such as mtspr 1,rA (move contents of rA to

special-purpose register 1, the XER). There are usually simplified mnemonics

for these - in this example, mtxer rA.

Destination and source order - Unlike 68K assembly, the destination register

(purposely called rD) is listed first, and the source(s) second. This is true of all

instructions except stores (from register to memory), where the memory

destination is now on the right and source register on the left.

Destination flexibility - Unlike the 68K, where an add would be written ADD D0,D1

with D1 being both a source and the destination, the destination on the 601 can be

separately specified. However, one can write add rA,rA,rB if desired.

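(rA|0) - In some instructions (you will know when it makes sense from the context) a zero can be substituted for rA, and it then means the literal value zero rather than the contents of r0. This applies to addi, addis and the address operand of loads and stores, but not to the three-register add, where a 0 in the rA position simply names r0. Here, addi rD,0,SIMM simply copies SIMM (sign extended) into rD.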

No size extensions (.L, .W, .B) - Operations are normally performed on all 32 bits

of a register unless otherwise noted. The principal exceptions are bit-field

operations, in which one specifies bit ranges, and load/stores, where the operand

size is part of the mnemonic, such as lbz (load byte into register and zero bits

0-23).

Condition Register updating is optional - On the 68K, the majority of arithmetic and

move instructions implicitly update the 68K CCR (condition code register). Not

so on the 601. To save work, the 601’s CR and XER bits are not updated unless

specifically requested. This is done through a rather large set of mnemonics,

making the instruction set look more formidable than it is. For example, the

following all perform the same add operation, but with various status updates:

addc rD,rA,rB ;update XER(CA)

addco rD,rA,rB ;update XER(CA,OV,SO)

add. rD,rA,rB ;update cr0

These mnemonic variations set additional bits in the encoding of the basic

instruction. Note that load/stores do not offer CR update versions (a final “dot”). That

requires a separate compare - remember, the reduced (and specialized) instruction

set theme.

The “dot” option is very common and, for the integer instructions, always means the same thing: update cr0.

We will not repeat this every time a new instruction group is introduced, but maybe

once more to be kind.
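
To make the optional updating concrete, here is a minimal Python sketch (a model of the behavior, not 601 code). The names add_variant and fresh_state are hypothetical, and the model assumes only what is described above: the c suffix records the carry in XER(CA), the o suffix records signed overflow in XER(OV) and the sticky XER(SO), and the dot compares the result with zero and records the outcome in cr0.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def fresh_state():
    """Hypothetical container for the status bits touched by add variants."""
    return {"CA": 0, "OV": 0, "SO": 0, "cr0": None}


def add_variant(mnemonic, a, b, state):
    """Model add, addc, addo, addco and their dot forms.

    Only the status fields requested by the mnemonic's suffixes are
    modified; the 32-bit sum is returned in every case.
    """
    options = mnemonic[len("add"):]        # e.g. "co." for "addco."
    total = (a & MASK) + (b & MASK)
    result = total & MASK
    if "c" in options:                     # c: record carry out of the top bit
        state["CA"] = total >> 32
    if "o" in options:                     # o: record signed overflow
        exact = to_signed(a) + to_signed(b)
        state["OV"] = int(exact != to_signed(result))
        state["SO"] |= state["OV"]         # SO is sticky once set
    if options.endswith("."):              # dot: compare result with zero
        s = to_signed(result)
        state["cr0"] = {"LT": s < 0, "GT": s > 0, "EQ": s == 0,
                        "SO": bool(state["SO"])}
    return result


state = fresh_state()
add_variant("add", 0xFFFFFFFF, 1, state)       # plain add leaves state alone
add_variant("addco.", 0x7FFFFFFF, 1, state)    # updates CA, OV, SO and cr0
```

The same sum is produced every time; what changes is only how much bookkeeping the processor is asked to do.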

Local Addressing modes - Load/Stores provide the only interaction with memory. We will discuss memory addressing modes later. Otherwise, there are only two modes available for register-based operations: register mode, which our add example demonstrated, and immediate mode:

addi rD,rA,SIMM or addis rD,rA,SIMM

The (i) or (is) suffix denotes immediate or immediate shifted mode,

respectively. In the manual, SIMM and UIMM denote signed and unsigned immediate

16-bit values, respectively. They are encoded directly into the instructions, which

are 32 bits in length. Note that instructions are complete entities, having no extension

words. There isn’t room for anything larger than 16-bit immediates to be encoded.

That’s the main use of immediate shifted mode. For example, the 32-bit immediate

0xABCDEF23 might be built in r3 by the sequence:

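addis r3,0,0xABCD ;load upper half-word, and zero lower half-word

ori r3,r3,0xEF23 ;OR in lower half-word (UIMM, so no sign extension)

A second addi is not a safe substitute here. Written as addi r3,0,0xEF23 it would overwrite r3 instead of adding to it, and even addi r3,r3,0xEF23 would add the sign-extended value 0xFFFFEF23 and spoil the upper half-word.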
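
One caution when building constants from halves: SIMM is sign extended before use, so a low half-word with its top bit set (such as 0xEF23) behaves as a negative number under addi. The Python sketch below only models the arithmetic; it is not 601 code, and the functions addi, addis and ori are hypothetical stand-ins that take register values rather than register names (pass 0 for the literal-zero case of rA|0).

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate, as done for SIMM."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addi(ra_value, simm):
    """rD = (rA|0) + sign-extended SIMM."""
    return (ra_value + sign_extend16(simm)) & MASK


def addis(ra_value, simm):
    """rD = (rA|0) + (SIMM shifted left 16 bits)."""
    return (ra_value + (sign_extend16(simm) << 16)) & MASK


def ori(rs_value, uimm):
    """rA = rS | UIMM, where UIMM is zero-extended."""
    return (rs_value | (uimm & 0xFFFF)) & MASK


upper = addis(0, 0xABCD)                      # 0xABCD0000
naive = addi(upper, 0xEF23)                   # low half acts as a negative number
with_ori = ori(upper, 0xEF23)                 # zero-extended, so halves just combine
adjusted = addi(addis(0, 0xABCE), 0xEF23)     # or pre-compensate the upper half

print(hex(naive), hex(with_ori), hex(adjusted))
```

Either the ori form or the pre-adjusted upper half yields 0xABCDEF23; the unadjusted addi pair lands one short in the upper half-word.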

Notation

We’ve already met with several variations of the add instruction. It’s time to

introduce some lossless compression, that is, simplifying notation. This will require

two types of custom brackets for listing options. Some readers will love this scheme

right away. Others will find it to be like ordering a combination plate from a Chinese

menu. As familiarity grows, it makes for a much quicker-to-use reference for

everybody. Pages can be reduced to lines. Let’s try it.

< > denotes none, or any one from the list.

[ ] denotes none, or any number of options, in the order listed.

Using brackets, all 24 add variations are correctly summarized as follows:

addi < s, c, c. >

add < c, e, me, ze > [ o, . ]

To practice interpreting this, addi can generate four instructions: addi, addis,

addic or addic. - the last ends in “c-dot.” The second line first generates the five

possibilities: add, addc, adde, addme or addze. Further, each one of the five has four

versions. For example: addze, addzeo, addze., addzeo. - again, the last ends in “o-dot.”

You only need to understand “c” once, not six times! We’ll cover what these

instructions do when we get to the arithmetic section.
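
If the bracket notation still feels like a menu, it may help to let a program do the ordering. This is a small Python sketch, with a hypothetical helper named expand, that treats < > as "none, or one from the list" and [ ] as "any subset, in the order listed."

```python
from itertools import product


def expand(base, one_of=(), any_of=()):
    """Expand  base < one_of > [ any_of ]  into the full list of mnemonics."""
    names = []
    for choice in ("",) + tuple(one_of):          # < >: none, or exactly one
        # [ ]: each option is either present or absent, in listed order
        for present in product((False, True), repeat=len(any_of)):
            suffix = "".join(opt for opt, on in zip(any_of, present) if on)
            names.append(base + choice + suffix)
    return names


immediate_forms = expand("addi", one_of=("s", "c", "c."))
register_forms = expand("add", one_of=("c", "e", "me", "ze"), any_of=("o", "."))

print(immediate_forms)
print(len(immediate_forms) + len(register_forms))
```

The two lines of notation expand to 4 and 20 mnemonics respectively, which accounts for all 24 add variations.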

Latency

The issue of timing and scheduling is complicated, but we’ll include the latency

with the instruction descriptions (for your edification) in spite of the fact that it tells

only part of the timing story. Latency sometimes refers to the total processing time for

an instruction. Since most stages (except execution) take one cycle, we’ll take latency

to mean execution latency. The vast majority of instructions execute in one cycle,

although a handful do differ.

Memory Addressing Modes

To do anything useful, we have to get data into and out of memory. Memory

accesses are performed with the load and store instructions, which offer two

addressing formats. Again using the lbz instruction as our example, we have:

lbz rD,d(rA|0) ;register indirect with offset

lbzx rD,(rA|0),rB ;register indirect with index

The (optional) d signifies a 16-bit offset in bytes, encoded in the instruction

(sign extended to 32 bits before use). Quite simply, the effective address of the source

is the sum rA + d, where rA can be substituted with zero. The destination register for

the load is rD, of course. If you thrill to semantics, when d = 0, this is really register

indirect mode. With rA zero, addressing like d(0) can be called absolute mode. Register

indirect with index is also simple to understand - effective address = rA + rB. Finally,

there is the update option, which can be used with either mode, and works as follows. If

rA is neither zero, nor the same register as rD, then after the load or store, the

effective address is written to rA. That’s all there is to it.
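
The address arithmetic can be sketched in a few lines of Python. This is a model under stated assumptions, not 601 code: registers are a list of 32 integers, memory is a dictionary from address to byte, and the functions effective_address and lbz are hypothetical names. The update rule follows the description above.

```python
MASK = 0xFFFFFFFF


def sign_extend16(d):
    """The 16-bit offset d is sign extended to 32 bits before use."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def effective_address(regs, ra, d=0, rb=None):
    """d(rA|0) when rb is None, otherwise (rA|0) + rB."""
    base = 0 if ra == 0 else regs[ra]     # register number 0 means literal zero
    offset = sign_extend16(d) if rb is None else regs[rb]
    return (base + offset) & MASK


def lbz(regs, memory, rd, ra, d=0, rb=None, update=False):
    """Model lbz, lbzx (rb given), and their update forms (update=True)."""
    ea = effective_address(regs, ra, d, rb)
    regs[rd] = memory[ea]                 # one byte; the upper 24 bits are zero
    if update and ra != 0 and ra != rd:   # update option writes the address back
        regs[ra] = ea


regs = [0] * 32
regs[4], regs[5] = 0x1000, 3
memory = {0x0FFF: 0x11, 0x1002: 0x22, 0x1003: 0x33}

lbz(regs, memory, rd=3, ra=4, d=2)                 # like lbz  r3,2(r4)
offset_result = regs[3]
lbz(regs, memory, rd=3, ra=4, rb=5)                # like lbzx r3,r4,r5
index_result = regs[3]
lbz(regs, memory, rd=3, ra=4, d=-1, update=True)   # like lbzu r3,-1(r4)
```

After the last call, r4 holds the address that was just used, which is what makes the update forms handy for walking through memory.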

By the way, I loathe semantics and all unnecessary names for things. I still

remember being confounded by a DEC VAX manual that had this description of their

multiple-precision subtraction library routine, called something like SubX(A,B,C).

“Difference C is derived from subtrahend A and minuend B.” An extensive poll proved

that neither I nor nearly 100 other scientists at the lab where I was working had any

clue which of A and B was which - not one of us! Without these precise and expressive

terms, DEC would have been unduly forced to write C = B - A. How crass indeed!

Armed with this general knowledge, we can begin looking at instruction

groupings.

Load (Memory to Destination Register rD)

Latency 2

Operand Size          Basic    Options     Operands
unsigned byte         lbz      [ u, x ]    rD,d(rA)
unsigned half-word    lhz      [ u, x ]    rD,d(rA)
signed half-word      lha      [ u, x ]    rD,d(rA)
unsigned word         lwz      [ u, x ]    rD,d(rA)

Options:

z “Load and zero,” right justify operand at low end of register, zero all higher

bits, i.e., treat as unsigned. z is mandatory for byte and word loads.

a “Algebraic,” load with sign extension to 32 bits.

u “Update,” if (rA != 0 && rA != rD) rA = effective address (after load).

x “Index,” use register indirect w/index addressing, i.e., lbzx rD,rA,rB.

General Notes:

- The default addressing mode is register indirect w/offset, unless x option is

specified, i.e., lbz rD,d(rA).

- The a and u forms may have greater latency on the 604.

- li, lis (load immediate) (see notes for ‘Addition’ section)

- la (load address) (see notes for ‘Addition’ section)
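
The difference between the z and a options is easiest to see on a value with its top bit set. The Python sketch below is a model, not 601 code; the helper name load is hypothetical, and it assumes big-endian byte order in memory, as on the Macintosh.

```python
MASK = 0xFFFFFFFF


def load(memory, ea, size, algebraic=False):
    """Return the 32-bit register contents after loading size bytes at ea.

    algebraic=False models the z forms (upper bits zeroed);
    algebraic=True models the a form (sign extended to 32 bits).
    """
    raw = bytes(memory[ea:ea + size])
    return int.from_bytes(raw, "big", signed=algebraic) & MASK


memory = bytes([0xFF, 0x80, 0x00, 0x01])

as_lbz = load(memory, 0, 1)                   # 0x000000FF
as_lhz = load(memory, 0, 2)                   # 0x0000FF80
as_lha = load(memory, 0, 2, algebraic=True)   # 0xFFFFFF80
as_lwz = load(memory, 0, 4)                   # 0xFF800001
```

The same two bytes produce 0x0000FF80 under lhz and 0xFFFFFF80 under lha, so the choice of mnemonic decides whether the half-word is treated as 65408 or as -128.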

Store (Source Register rS to Memory)

Latency 1

Operand Size    Basic    Options     Operands
byte            stb      [ u, x ]    rS,d(rA)
half-word       sth      [ u, x ]    rS,d(rA)
word            stw      [ u, x ]    rS,d(rA)

Options:

u “Update,” if (rA != 0) rA = effective address (after store). Unlike loads, it is

permissible to set rA = rS.

x “Index,” use register indirect w/index addressing, i.e., stbx rS,rA,rB.

General Notes:

- The default addressing mode is register indirect w/offset, unless x option is

specified, i.e., stb rS,d(rA).

- Operand ordering is different from normal. The Source is given first.
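
Stores can be modeled the same way. In this Python sketch (again a model, not 601 code) the helper name store is hypothetical, memory is a bytearray in big-endian order, and only the offset addressing mode is shown. It illustrates two points: a byte or half-word store writes just the low-order end of rS, and with the update option it is the old value of rS that reaches memory even when rA and rS are the same register.

```python
MASK = 0xFFFFFFFF


def store(regs, memory, rs, ra, d, size, update=False):
    """Model stb/sth/stw (size 1/2/4) with offset addressing, optionally updating rA."""
    ea = ((0 if ra == 0 else regs[ra]) + d) & MASK
    value = regs[rs] & ((1 << (8 * size)) - 1)     # keep only the low-order bytes
    memory[ea:ea + size] = value.to_bytes(size, "big")
    if update and ra != 0:                         # rA = rS is allowed for stores
        regs[ra] = ea


regs = [0] * 32
regs[3], regs[4] = 0x12345678, 8
memory = bytearray(16)

store(regs, memory, rs=3, ra=4, d=0, size=1)                 # like stb  r3,0(r4)
store(regs, memory, rs=3, ra=4, d=2, size=2)                 # like sth  r3,2(r4)
store(regs, memory, rs=3, ra=4, d=-4, size=4, update=True)   # like stwu r3,-4(r4)
store(regs, memory, rs=4, ra=4, d=-4, size=4, update=True)   # like stwu r4,-4(r4)
```

The last call stores the old contents of r4 and then moves r4 down by four, a pattern that resembles pushing a value onto a descending stack.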

Addition (negate, +, -)

Latency 1

Result            Basic                Options     Operands
rD = -rA          neg                  [ o, . ]    rD,rA
rD = rA + SIMM    addi < s, c, c. >    none        rD,rA,SIMM
rD = rA + rB      add < c, e >         [ o, . ]    rD,rA,rB
rD = rA - 1       addme                [ o, . ]    rD,rA
rD = rA + 0       addze                [ o, . ]    rD,rA
rD = SIMM - rA    subfic               none        rD,rA,SIMM
rD = rB - rA      subf < c, e >        [ o, . ]    rD,rA,rB
rD = -1 - rA      subfme               [ o, . ]    rD,rA
rD = 0 - rA       subfze               [ o, . ]    rD,rA

Options:

i “Immediate operand,” specified as the 16-bit value SIMM.

s “Shifted,” the given 16-bit immediate is left-shifted 16-bits to become the

high half of a word. The low half is zero.

c “Carry,” update XER(CA) bit.
