PowerPC Assembly
Volume Number: 10
Issue Number: 9
Column Tag: PowerPC Essentials
Understanding PowerPC Assembly
Speak like a native in only two easy lessons! Welcome to lesson two.
By Bill Karsh, BillKarsh@aol.com
This is part two of a two-part article on the PowerPC architecture and its
assembly language. Last month we took a brief look at the hardware
architecture of the MPC601 processor and discussed the user programming model.
This month, we’ll summarize its assembly language syntax in a condensed and easily
digestible form for quick reference. Think of this article as a compressed and
intelligently filtered user manual.
If you haven’t already read last month’s article, you might want to go back and
learn about the environment and data types of the 601. If you have, let’s jump in and
learn its lingo.
601-Speak - Terms, Notation and Generalities
Let’s introduce an example statement to look at, and enumerate as much as
possible that is common to most instructions. Don’t worry. Much of this will be
revisited, and the inevitable exceptions will be pointed out, as we go.
add rD,rA,rB
This familiar operation simply adds the contents of GPRs rA and rB together, and
writes the result to rD. Immediately, there is a wealth of new stuff to talk about.
Register names - GPRs appear in assembly listings as r0...r31. FPRs are written
fp0...fp31, or fr0...fr31 depending on the assembler. The CR fields are cr0...cr7.
The remaining special-purpose registers are most often accessed through
special-purpose instructions, such as mtspr 1,rA (move contents of rA to
special-purpose register 1, the XER). There are usually simplified mnemonics
for these - in this example, mtxer rA.
Destination and source order - Unlike 68K assembly, the destination register
(purposely called rD) is listed first, and the source(s) second. This is true of all
instructions except stores (from register to memory), where the memory
destination is now on the right and source register on the left.
Destination flexibility - Unlike the 68K, where an add would be written ADD D0,D1
with D1 being both a source and the destination, the destination on the 601 can be
separately specified. However, one can write add rA,rA,rB if desired.
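(rA|0) - In many cases (you will know when it makes sense from the context) a zero
can be substituted for rA. For example, addi rD,0,SIMM simply loads the
sign-extended SIMM into rD. The substitution applies only where the operand is
written (rA|0), chiefly addi, addis and load/store addresses; in a plain
add rD,rA,rB, a 0 in the rA position would name r0.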
No size extensions (.L, .W, .B) - Operations are normally performed on all 32 bits
of a register unless otherwise noted. The principal exceptions are bit-field
operations, in which one specifies bit ranges, and load/stores, where the operand
size is part of the mnemonic, such as lbz (load byte into register and zero bits
0-23).
Condition Register updating is optional - On the 68K, the majority of arithmetic and
move instructions implicitly update the 68K CCR (condition code register). Not
so on the 601. To save work, the 601’s CR and XER bits are not updated unless
specifically requested. This is done through a rather large set of mnemonics,
making the instruction set look more formidable than it is. For example, the
following all perform the same add operation, but with various status updates:
addc rD,rA,rB ;update XER(CA)
addco rD,rA,rB ;update XER(CA,OV,SO)
add. rD,rA,rB ;update cr0
These mnemonic variations set additional bits in the encoding of the basic
instruction. Note that load/stores do not offer CR update versions (a final “dot”).
Testing a loaded value requires a separate compare instruction - remember the
reduced (and specialized) instruction set theme.
The “dot” option is very common and, for the integer instructions, always means
the same thing: update cr0. We will not repeat this every time a new instruction
group is introduced, but maybe once more to be kind.
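For readers who like to see the bookkeeping spelled out, here is a minimal Python sketch of what the c, o and dot options record for a 32-bit add. The name add_variant and the dictionary stand-ins for XER and cr0 are inventions for illustration only. The bit meanings (CA as the carry out of the most significant bit, OV as signed overflow, SO as a sticky copy of OV, and cr0 holding LT, GT, EQ and SO) are assumed from the usual PowerPC definitions rather than taken from the text above.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def add_variant(a, b, xer, c=False, o=False, dot=False):
    """Model add with optional c, o and dot suffixes (hypothetical helper).

    Returns (result, new_xer, cr0). cr0 is None unless dot is requested,
    and the caller's xer dictionary is never modified.
    """
    total = (a & MASK32) + (b & MASK32)
    result = total & MASK32
    xer = dict(xer)
    cr0 = None
    if c:
        xer["CA"] = total >> 32            # carry out of the top bit
    if o:
        exact = to_signed(a) + to_signed(b)
        xer["OV"] = int(exact != to_signed(result))
        xer["SO"] |= xer["OV"]             # SO stays set once OV has fired
    if dot:
        s = to_signed(result)
        cr0 = {"LT": int(s < 0), "GT": int(s > 0),
               "EQ": int(s == 0), "SO": xer["SO"]}
    return result, xer, cr0

clear = {"CA": 0, "OV": 0, "SO": 0}
plain = add_variant(0xFFFFFFFF, 1, clear)               # like add
carry = add_variant(0xFFFFFFFF, 1, clear, c=True)       # like addc
dotted = add_variant(0xFFFFFFFF, 1, clear, dot=True)    # like add.
```

All three calls produce the same sum; only the side effects differ, which is the point of the suffixes.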
Local Addressing modes - Load/Stores provide the only interaction with memory. We
will discuss memory addressing modes later. Otherwise, there are only two
modes available for register-based operations. Our add example demonstrated
register direct mode. There is also immediate half-word mode, which looks like
this:
addi rD,rA,SIMM or addis rD,rA,SIMM
The (i) or (is) suffix denotes immediate or immediate shifted mode,
respectively. In the manual, SIMM and UIMM denote signed and unsigned immediate
16-bit values, respectively. They are encoded directly into the instructions, which
are 32 bits in length. Note that instructions are complete entities, having no extension
words. There isn’t room for anything larger than 16-bit immediates to be encoded.
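That’s the main use of immediate shifted mode. For example, the 32-bit immediate
0xABCDEF23 might be built in r3 by the sequence:
addis r3,0,0xABCD ;load upper half-word, and zero lower half-word
ori r3,r3,0xEF23 ;OR in lower half-word (UIMM is not sign-extended)
The second instruction must name r3 as a source, or the upper half just loaded
would be discarded. An addi r3,r3,0xEF23 would not do here either: SIMM is
sign-extended, and 0xEF23 has its top bit set, so the upper half-word would come
out one too small.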
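The role sign extension plays when a constant is assembled from two half-words is easy to check with a small model. The sketch below is only an illustration: the Python functions are hypothetical stand-ins that take the value of (rA|0) directly, and ori (OR immediate, which zero-extends its 16-bit operand) is assumed from the PowerPC instruction set rather than introduced above.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Treat the low 16 bits of imm as a signed half-word."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def addis(ra_value, imm):
    """rD = (rA|0) + (SIMM << 16)."""
    return (ra_value + (sign_extend16(imm) << 16)) & MASK32

def addi(ra_value, imm):
    """rD = (rA|0) + SIMM, with SIMM sign-extended."""
    return (ra_value + sign_extend16(imm)) & MASK32

def ori(rs_value, imm):
    """rA = rS | UIMM, with UIMM zero-extended."""
    return (rs_value | (imm & 0xFFFF)) & MASK32

upper = addis(0, 0xABCD)              # 0xABCD0000
with_ori = ori(upper, 0xEF23)         # low half simply ORed in
with_addi = addi(upper, 0xEF23)       # low half treated as negative
adjusted = addi(addis(0, 0xABCE), 0xEF23)   # upper half bumped by one to compensate
print(hex(with_ori), hex(with_addi), hex(adjusted))
```

When the low half-word has its top bit clear, addi and ori give the same answer; the difference only shows up for values of 0x8000 and above.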
Notation
We’ve already met with several variations of the add instruction. It’s time to
introduce some lossless compression, that is, simplifying notation. This will require
two types of custom brackets for listing options. Some readers will love this scheme
right away. Others will find it to be like ordering a combination plate from a Chinese
menu. As familiarity grows, it makes for a much quicker-to-use reference for
everybody. Pages can be reduced to lines. Let’s try it.
< > denotes none, or any one from the list.
[ ] denotes none, or any number of options, in the order listed.
Using brackets, all 24 add variations are correctly summarized as follows:
addi < s, c, c. >
add < c, e, me, ze > [ o, . ]
To practice interpreting this, addi can generate four instructions: addi, addis,
addic or addic. - the last ends in “c-dot.” The second line first generates the five
possibilities: add, addc, adde, addme or addze. Further, each one of the five has four
versions. For example: addze, addzeo, addze., addzeo. - again, the last ends in “o-dot.”
You only need to understand “c” once, not six times! We’ll cover what these
instructions do when we get to the arithmetic section.
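If the bracket scheme still feels like the combination plate, it can be expanded mechanically. Here is a short Python sketch, with a made-up helper named expand, that treats < > as "none or exactly one" and [ ] as "any subset, in the order listed" and counts what comes out.

```python
from itertools import product

def expand(base, one_of=(), any_of=()):
    """Expand base < one_of > [ any_of ] into the full list of mnemonics."""
    # < > : the bare base, or the base plus exactly one listed suffix
    stems = [base] + [base + suffix for suffix in one_of]
    # [ ] : every subset of the listed options, kept in listed order
    tails = ["".join(opt for opt, keep in zip(any_of, mask) if keep)
             for mask in product((False, True), repeat=len(any_of))]
    return [stem + tail for stem in stems for tail in tails]

addi_forms = expand("addi", one_of=("s", "c", "c."))
add_forms = expand("add", one_of=("c", "e", "me", "ze"), any_of=("o", "."))
print(len(addi_forms) + len(add_forms))   # 4 + 20 = 24
```

The same helper works for any line written in this notation, so it doubles as a quick way to check that a table row has been read correctly.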
Latency
The issue of timing and scheduling is complicated, but we’ll include the latency
with the instruction descriptions (for your edification) in spite of the fact that it tells
only part of the timing story. Latency sometimes refers to the total processing time for
an instruction. Since most stages (except execution) take one cycle, we’ll take latency
to mean execution latency. The vast majority of instructions execute in one cycle,
although a handful do differ.
Memory Addressing Modes
To do anything useful, we have to get data into and out of memory. Memory
accesses are performed with the load and store instructions, which offer two
addressing formats. Again using the lbz instruction as our example, we have:
lbz rD,d(rA|0) ;register indirect with offset
lbzx rD,(rA|0),rB ;register indirect with index
The (optional) d signifies a 16-bit offset in bytes, encoded in the instruction
(sign extended to 32 bits before use). Quite simply, the effective address of the source
is the sum rA + d, where rA can be substituted with zero. The destination register for
the load is rD, of course. If you thrill to semantics, when d = 0, this is really register
indirect mode. With rA zero, addressing like d(0) can be called absolute mode. Register
indirect with index is also simple to understand - effective address = rA + rB. Finally,
there is the update option, which can be used with either mode, and works as follows. If
rA is neither zero, nor the same register as rD, then after the load or store, the
effective address is written to rA. That’s all there is to it.
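The address arithmetic just described fits in a few lines. The following Python sketch is only a model under stated assumptions: the register file is a plain list of 32 integers, the function names are invented for illustration, and the update rule is copied from the description above (write back only if rA is not register 0 and not rD).

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(d):
    """Treat the low 16 bits of d as a signed offset."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def base_value(regs, ra):
    """(rA|0): register number 0 stands for the constant zero, not r0."""
    return 0 if ra == 0 else regs[ra]

def ea_offset(regs, ra, d):
    """Register indirect with offset: EA = (rA|0) + d."""
    return (base_value(regs, ra) + sign_extend16(d)) & MASK32

def ea_indexed(regs, ra, rb):
    """Register indirect with index: EA = (rA|0) + rB."""
    return (base_value(regs, ra) + regs[rb]) & MASK32

def apply_update(regs, ra, rd, ea):
    """Update option for a load: write EA back to rA when permitted."""
    if ra != 0 and ra != rd:
        regs[ra] = ea

regs = [0] * 32
regs[4], regs[5] = 0x1000, 0x20
ea = ea_offset(regs, 4, 0xFFFC)     # offset of -4 from r4
apply_update(regs, 4, 3, ea)        # as in a load into r3 with update; r4 now holds EA
```

Note that the offset is sign-extended but the index register is used as is, so only the offset form can step backwards with a single encoded constant.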
By the way, I loathe semantics and all unnecessary names for things. I still
remember being confounded by a DEC VAX manual that had this description of their
multiple-precision subtraction library routine, called something like SubX(A,B,C).
“Difference C is derived from subtrahend A and minuend B.” An extensive poll proved
that neither I nor nearly 100 other scientists at the lab where I was working had any
clue which of A and B was which - not one of us! Without these precise and expressive
terms, DEC would have been unduly forced to write C = B - A. How crass indeed!
Armed with this general knowledge, we can begin looking at instruction
groupings.
Load (Memory to Destination Register rD)
Latency 2
Operand Size         Basic   Options    Operands
unsigned byte        lbz     [ u, x ]   rD,d(rA)
unsigned half-word   lhz     [ u, x ]   rD,d(rA)
signed half-word     lha     [ u, x ]   rD,d(rA)
unsigned word        lwz     [ u, x ]   rD,d(rA)
Options:
z “Load and zero,” right justify operand at low end of register, zero all higher
bits, i.e., treat as unsigned. z is mandatory for byte and word loads.
a “Algebraic,” load with sign extension to 32 bits.
u “Update,” if (rA != 0 && rA != rD) rA = effective address (after load).
x “Index,” use register indirect w/index addressing, i.e., lbzx rD,rA,rB.
General Notes:
- The default addressing mode is register indirect w/offset, unless x option is
specified, i.e., lbz rD,d(rA).
- The a and u forms may have greater latency on the 604.
- li, lis (load immediate) (see notes for ‘Addition’ section)
- la (load address) (see notes for ‘Addition’ section)
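To make the z and a options concrete, here is a hedged Python sketch of what ends up in rD for each basic load. It assumes big-endian byte order (most significant byte at the lowest address) and models memory as a bytes object; the LOADS table and the load function are hypothetical names used only for this illustration.

```python
MASK32 = 0xFFFFFFFF

# mnemonic -> (operand size in bytes, algebraic?)
LOADS = {"lbz": (1, False), "lhz": (2, False), "lha": (2, True), "lwz": (4, False)}

def load(memory, ea, mnemonic):
    """Return the 32-bit value a basic load would place in rD."""
    size, algebraic = LOADS[mnemonic]
    raw = int.from_bytes(memory[ea:ea + size], "big")   # right-justified, upper bits zero
    if algebraic and raw >> (8 * size - 1):
        raw -= 1 << (8 * size)                          # sign-extend a negative operand
    return raw & MASK32

memory = bytes([0x80, 0x01, 0xFF, 0xFE])
for name in ("lbz", "lhz", "lha", "lwz"):
    print(name, hex(load(memory, 0, name)))
```

The only row whose result differs from a plain zero-filled read is lha, and only when the half-word's top bit is set.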
Store (Source Register rS to Memory)
Latency 1
Operand Size   Basic   Options    Operands
byte           stb     [ u, x ]   rS,d(rA)
half-word      sth     [ u, x ]   rS,d(rA)
word           stw     [ u, x ]   rS,d(rA)
Options:
u “Update,” if (rA != 0) rA = effective address (after store). Unlike loads, it is
permissible to set rA = rS.
x “Index,” use register indirect w/index addressing, i.e., stbx rS,rA,rB.
General Notes:
- The default addressing mode is register indirect w/offset, unless x option is
specified, i.e., stb rS,d(rA).
- Operand ordering is different from normal. The Source is given first.
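A store with update is easiest to appreciate when rA and rS are the same register, the case the notes above single out as permissible. The Python sketch below is a model only: memory is a bytearray in big-endian order (an assumption), the register file is a list, the offset d is passed already sign-extended, and store and store_update are invented names.

```python
MASK32 = 0xFFFFFFFF

def store(memory, ea, value, size):
    """Write the low `size` bytes of value at ea, most significant byte first."""
    low = value & ((1 << (8 * size)) - 1)
    memory[ea:ea + size] = low.to_bytes(size, "big")

def store_update(memory, regs, rs, d, ra, size):
    """Model stbu/sthu/stwu rS,d(rA) for size 1, 2 or 4."""
    base = 0 if ra == 0 else regs[ra]
    ea = (base + d) & MASK32
    store(memory, ea, regs[rs], size)   # rS is read before rA is overwritten
    if ra != 0:
        regs[ra] = ea                   # the update step
    return ea

memory = bytearray(64)
regs = [0] * 32
regs[1] = 32
# Like stwu r1,-16(r1): save the old r1 at r1 - 16, then move r1 down to it.
store_update(memory, regs, 1, -16, 1, 4)
print(regs[1], memory[16:20].hex())
```

Because the old value is stored before the register is updated, a single instruction both moves the pointer and leaves a link back to where it was.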
Addition (negate, +, -)
Latency 1
Result           Basic               Options    Operands
rD = -rA         neg                 [ o, . ]   rD,rA
rD = rA + SIMM   addi < s, c, c. >   none       rD,rA,SIMM
rD = rA + rB     add < c, e >        [ o, . ]   rD,rA,rB
rD = rA - 1      addme               [ o, . ]   rD,rA
rD = rA + 0      addze               [ o, . ]   rD,rA
rD = SIMM - rA   subfic              none       rD,rA,SIMM
rD = rB - rA     subf < c, e >       [ o, . ]   rD,rA,rB
rD = -1 - rA     subfme              [ o, . ]   rD,rA
rD = 0 - rA      subfze              [ o, . ]   rD,rA
Options:
i “Immediate operand,” specified as the 16-bit value SIMM.
s “Shifted,” the given 16-bit immediate is left-shifted 16-bits to become the
high half of a word. The low half is zero.
c “Carry,” update XER(CA) bit.