Code Optimization
Volume Number: 15
Issue Number: 4
Column Tag: CodeWarrior Workshop
Code Optimization
by Andrew Downs
Code Optimization Options in CodeWarrior
Introduction
Code optimization involves the application of rules and algorithms to program code
with the goal of making it faster, smaller, more efficient, and so on. Often these types
of optimizations conflict with each other: for instance, faster code usually ends up
larger, not smaller. Optimizations can be performed at several levels (e.g. source
code, intermediate representations), and by various parties, such as the developer or
the compiler/optimizer.
This article discusses and demonstrates the code optimization options available in
CodeWarrior Pro. In spite of its ever-growing sophistication, CodeWarrior still
makes it easy to select the right level of optimization for your program. Understanding
how CodeWarrior applies optimizations will help when it is time to debug your code,
especially when using a low-level debugger or when the source code is not available.
Optimization Terms
In order to understand the process, you need to speak the language. Here are definitions
for the terms discussed in this article, plus a few extras. Keep in mind that the words
replace, substitute, etc. do not imply physical changes to your original source code,
but rather are reflected in the generated machine code, after your source code has been
parsed and optimized.
Each term includes a brief description, including why it is important. Code examples
illustrating some of these terms will be presented later in the article.
Common subexpression elimination
If the value resulting from the calculation of a subexpression is used multiple
times, perform the calculation once and substitute the result for each
individual calculation.
Constant propagation
Replace variables that rely on an unchanging value with the value itself.
Copy propagation
Replace multiple variables that use the same calculated value with one
variable.
Dead code elimination
Code that never gets executed can be removed from the object file to reduce
stored size and runtime footprint.
Dead store elimination
Variables and related store operations that are never used can be removed
from the object file to reduce stored size and runtime footprint.
Global register allocation
Variables that do not overlap in scope may be placed in registers, rather than
remaining in RAM. Accessing values stored in registers is faster than
accessing values in RAM.
Global vs. peephole scope
Global optimization allows the compiler/optimizer to look at the overall
program and determine how best to apply the desired optimization level.
Peephole provides local optimizations, which do not account for patterns or
conditions in the program as a whole. Local optimizations may include
instruction substitutions.
Inline calls
A function that is fairly small can have its machine instructions substituted at
the point of each call to the function, instead of generating an actual call. This
trades space (the size of the function code) for speed (no function call
overhead).
Instruction scheduling
Instructions for a specific processor may be generated, resulting in more
efficient code for that processor but possible compatibility or efficiency
problems on other processors. This optimization may be better applied to
embedded systems, where the CPU type is known at build time. Most desktop
software should use the Generic PowerPC option in CodeWarrior.
Lifetime analysis
A register can be reused for multiple variables, as long as those variables do
not overlap in scope.
Loop counting using CTR
The count register (CTR) on the PowerPC is intended to be used for counting,
such as in loop iterations. There are some branch instructions that
specifically use the value in the CTR.
Loop invariant expressions (code motion)
Values that do not change during execution of a loop can be moved out of the
loop, speeding up loop execution.
Loop transformations
Some loop constructs can be rewritten to execute more quickly. For example, a
for{} loop can be rewritten as a while{} loop, saving one or more instructions.
Loop unrolling
Statements within a loop that rely on sequential indices or accesses can be
repeated more than once in the body of the loop. This results in checking the
loop conditional less often.
Strength reduction
Certain operations and their corresponding machine code instructions require
more time to execute than simpler, possibly less efficient counterparts.
CodeWarrior Optimization Dialog
Figure 1 shows the dialog box pane used to specify instruction scheduling and
peephole optimization in CodeWarrior.
Figure 1. PPC processor pane.
Global optimizations are specified in the optimization pane, shown in Figure 2.
Global optimization can be performed with the intent of making the resulting code
faster or smaller, but not both.
Figure 2. Global optimization pane.
The code examples presented in this article use either peephole optimization or the
faster execution speed settings. The smaller code size setting is briefly discussed later
in the article. Profiled results for smaller code size are provided in the article
archive, but are not illustrated in code examples.
Disassembling Code
The Disassemble menu item under the Project menu in CodeWarrior allows you to
view the compiler's output resulting from application of any optimization choices.
This is extremely useful in that it gives you a chance to save an assembly listing of
your code for use during debugging sessions. That menu item was used to produce the
listings presented in this article.
Just Enough Assembly Language
Listed below are the PowerPC instructions used in this article. Following each is the
expanded mnemonic, and a short description of what the instruction does (using the
specified operands). The instructions are presented in mnemonic alphabetical order,
not the order in which they are encountered in the code. For a thorough treatment of
Power PC assembly language, refer to the references listed at the end of this article.
add r30,r30,r0
Add: store the sum of the values stored in r0 and r30 in r30.
addi r30,r30,1
Add Immediate: add 1 to the value stored in r30, and store the result in r30.
b *+12
Branch: continue execution at the location 12 bytes (3 instructions) after this
instruction.
bdnz *-80
Branch if Decremented Count Register is Not Zero: decrement the Count
Register, and if its value is not zero, continue execution 80 bytes (20
instructions) before this instruction.
bl.whileTest__Fv
Branch and Link: branch to the location specified and continue execution,
saving the address of the next instruction to be executed in this routine in the
Link Register.
blt *-12
Branch if Less Than: if the result of the previous comparison resulted in the
Condition Register LT bit being set, continue execution at the instruction 12
bytes (3 instructions) prior to this instruction.
bne *-28
Branch if Not Equal: if the result of the previous compare operation is not
zero, as recorded in the Condition Register EQ bit, continue execution at the
instruction 28 bytes (7 instructions) before this instruction.
cmplw r31,r29
Compare Logical Word: perform an unsigned comparison between r29 and
r31, and update the Condition Register as appropriate.
cmplwir31,$0000
Compare Logical Word Immediate: compare the value in r31 against 0, and set
the appropriate flags in the Condition Register.
li r30,0
Load Immediate: move the value 0 into register r30.
lwzr3,gLimit(RTOC)
Load Word and Zero: move the value of the global variable gLimit into register
r3. Global variables are referenced as offsets from the Table of Contents
register (RTOC).
mr r31,r29
Move Register: copy the contents of r29 into r31.
mtctr r0
Move To Count Register: copy the value in r0 to the Count Register (typically
used for loop iteration).
mullw r0,r29,r28
Multiply integer word, return Low Word: store the product of the values
stored in r28 and r29 in r0.
stmw r26,-24(SP)
Store Multiple Word: store registers r26-r31 beginning at the location -24
bytes off the current stack pointer location.
stw r31,-4(SP)