May 94 Cornfield
Volume Number: 10
Issue Number: 5
Column Tag: From The Corn Field
Thoughts From The Cornfield 
Provocative, perhaps inflammatory, but just say no to assembly
language on PowerPC
By Steve Kiene, MindVision Software, Lincoln, Nebraska
About the author
Steve, author of things like Stacker for Macintosh, cares about performance, code
size, performance, portability, and performance as much as anyone we know (well,
there’s always Mike Scanlin, too). Steve’s recently worked through a number of
issues about porting to the PowerPC for performance, and along the way surprised
himself with his conclusions. He’s curious about your reaction, so please let us know
if they surprise you, too. I can just see our assistant Al holding up a placard with this
in big letters: editorial@xplain.com
Writing code in assembly language instead of a high-level language to get
performance is fast becoming an anachronism. The fierce competition of the ’90s
leads to time-to-market battles that cannot be won by a company that insists on
writing large chunks of its product in assembly language.
I’ve seen plenty of code whose authors have spent an inordinate amount of time
tweaking assembly language instructions to get the most speed out of the code when the
real problem was a slow algorithm. It’s the old problem of not seeing the forest for the
trees. Careful examination of the algorithm offers more potential for improved
performance than coding a bad algorithm in tightly-tuned assembly language. This
generally holds true even when the improved algorithm is coded in C.
I took some code a friend had written; he had spent weeks hand-tuning assembly
code. I re-examined the algorithms and found a better way to do it. I coded it up in C
and the new C code ran fifty times faster on the 68K than the assembly solution had
before. Now, it not only performs, it’s portable and more maintainable. After simply
recompiling the code for Power Macintosh, its speed doubled. Converting the assembly
code would have taken at least a couple of weeks for someone proficient not only at
writing PowerPC assembly code but also at scheduling the assembly instructions
to keep the chip as busy as possible.
Now, with all that said, there are reasons for writing PowerPC assembly
language. So, if you have to write part of your program in assembly, make sure it’s
the right part and be completely sure you cannot increase the speed by improving the
algorithm. There’s little sense in writing assembly language for code that accounts
for only 4% of the execution time, yet it’s not hard to find programs that do
just that. What was gained by writing that code in assembly language rather than a
high-level language?
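The arithmetic behind that 4% figure is worth spelling out. A common way to estimate it (usually called Amdahl's law, a framing the article itself does not use) is that if a fraction f of the run time is made s times faster, the whole program speeds up by 1 / ((1 - f) + f / s). The function below is a minimal sketch of that formula; the name `overall_speedup` is ours.

```c
/* Overall speedup of a program when a fraction f of its run time
   (0 <= f <= 1) is made s times faster, per Amdahl's law:
   1 / ((1 - f) + f / s). */
static double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

For f = 0.04, doubling the speed of that code gives an overall speedup of about 1.02 (2%), and even making it infinitely fast caps the gain at 1 / 0.96, about 4.2%.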
The release of the Power Macintosh machines has sent many 68K assembly
language programmers scrambling to learn the new architecture and its assembly
language so that they can keep writing high-performance Macintosh code. However,
as they are finding out, programming in PowerPC assembly language is much harder
than programming in 68K assembly.
I’ve seen several examples of PowerPC assembly language code that looks fast at
first glance but, on careful examination, turns out to run slower than
expected. RISC processors require understanding the architecture of both the CPU and
the memory bus to get good performance, and it’s simply difficult to keep all of the
rules and constraints in your head while trying to be creative and write code.
Compilers, on the other hand, just don’t care how many rules they have to remember.
Reasons to avoid assembly language
(1) Assembly language code is not easily ported to different instruction set
architectures. There are tools which will port 68K assembly to PowerPC
assembly, but the architectures are so different that a mechanical port may not
get you the full potential of the new architecture.
(2) Code can be written in less time in a high-level language than in
assembly. People like to argue this point, claiming that bit
manipulation routines are too hard to do in C, but it’s just not true. I suspect that
if they knew C as well as they knew assembler there would be little or no
argument.
(3) It is far easier to make mistakes in assembly than in a high-level
language. High-level languages offer abstraction and structure that make many
common assembly language problems simply nonexistent.
(4) Code written in assembly is harder to maintain, both for the original
programmer and for anyone who inherits it. Because of the fine-grained
control you get with assembly language, it is not always easy to follow the flow of
the code.
(5) The development tools available for writing assembly language are not
advancing at the same rate as those for high level languages. In fact, there are
many situations where the tools are getting worse. Apple’s PowerPC Assembler
for MPW is not nearly as sophisticated as its 68K Assembler.
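To make point (2) concrete, here is a sketch of a few typical bit-manipulation routines in portable C: counting set bits, rotating a word, and extracting a bit field. The helper names are hypothetical, and the use of `uint32_t` assumes a compiler that provides `<stdint.h>`.

```c
#include <stdint.h>

/* Count the 1 bits in x. Each x &= x - 1 clears the lowest set bit,
   so the loop runs once per set bit. */
static unsigned count_bits(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Rotate x left by r bits (r is taken modulo 32). */
static uint32_t rotate_left(uint32_t x, unsigned r)
{
    r &= 31;
    if (r == 0)
        return x;          /* avoid an undefined shift by 32 */
    return (x << r) | (x >> (32 - r));
}

/* Extract a field of `width` bits starting at bit `pos` (bit 0 is the
   least significant). Assumes pos < 32 and pos + width <= 32. */
static uint32_t get_field(uint32_t x, unsigned pos, unsigned width)
{
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1);
    return (x >> pos) & mask;
}
```

Each routine is a few readable lines, and a good compiler can map the shifts and masks directly onto the processor's rotate-and-mask instructions.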
Reasons to use assembly language
(1) Highly time-critical code, such as software that interfaces with a piece
of hardware that has very specific timing dependencies. This is not very common.
(2) Code where space is at a minimum, such as embedded controllers.
Generally not applicable to the Macintosh.
(3) Code that is proven to be an unacceptable bottleneck in a specific task.
(4) Places where parameters are passed in specific locations that are not
easily accessible to a high-level language. [Between the PowerPC runtime
architecture and the protocol conversion that Mixed Mode does for you, this
problem essentially goes away on the Power Macintosh - Ed stb]
In all of these instances, there is a need for assembly only in specific places in
the code. There is no need to code large parts in assembly.
How to speed up your code - the old way
The most common way to speed up existing code is to find the parts of the program
that are slow and rewrite them in assembly. In the past that may have been a good way
to gain more speed. Today, that model is not only outdated, it can backfire. I’ve seen
people rewrite their code in PowerPC assembly language only to see it run SLOWER. Do
not assume you know more about the processor architecture than the compiler does.
Unless you fully understand the processor’s instruction scheduling, you probably
can’t outdo a good compiler.
How to speed up your code - the new way
Determine which parts of your program are used the most. If a particular feature
takes several minutes to run but is only used once a month, maybe it’s not as
important as features that take ten seconds but are used every five minutes. Watch
your customers’ usage patterns. Ask them which parts of the program are annoyingly
slow. Ask them why they think those parts are slow. Remember, slowness is
subjective. What is slow to a power user may seem perfectly fine to a novice user. Who
uses your product, the novice or the power user?
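A little arithmetic shows why frequency matters as much as duration. The sketch below estimates the total time a user spends waiting on a feature each month. The usage model (an eight-hour working day, 20 working days a month, and "several minutes" taken as five minutes) is an assumption for illustration, as are the function names.

```c
/* Hypothetical usage model: total seconds a user spends waiting on a
   feature in a month is (seconds per run) x (runs per month). */
static double monthly_wait_seconds(double seconds_per_run, double runs_per_month)
{
    return seconds_per_run * runs_per_month;
}

/* Runs per month for a feature used every `interval_min` minutes during
   an `hours_per_day` working day, `days_per_month` days a month. */
static double runs_per_month(double interval_min, double hours_per_day,
                             double days_per_month)
{
    return (hours_per_day * 60.0 / interval_min) * days_per_month;
}
```

Under those assumptions, the five-minute feature used once a month costs the user 300 seconds, while the ten-second feature used every five minutes runs 1,920 times and costs 19,200 seconds (320 minutes), 64 times as much waiting.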
Once you have identified the areas of your software that seem slow, you may want
to back up the results with scientific data. Run performance analysis tools to see
exactly where in the code things are slow when you perform the tasks that users said
were slow. THINK C and CodeWarrior include performance monitoring tools that
work well, and MPW has its own performance tools that are adequate. If your
code can’t easily be hooked up to these tools, I recommend you look at the source code
provided for the performance tools in THINK C. It is very easy to adapt this code to
monitor the performance of any piece of code.
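If you can't hook into a profiler at all, even a crude harness gives repeatable numbers. The sketch below is not the THINK C profiler; it is a hypothetical wrapper built only on the standard C `clock()` function. Because `clock()` measures processor time with coarse resolution, the harness repeats the call many times and averages.

```c
#include <time.h>

/* Hypothetical signature for the code being measured. */
typedef void (*work_fn)(void *ctx);

/* Call fn(ctx) `reps` times and return the average processor time per
   call in seconds. clock() has coarse resolution, so choose enough
   repetitions that the whole loop runs for at least a second or so. */
static double time_per_call(work_fn fn, void *ctx, long reps)
{
    clock_t start, end;
    long i;

    start = clock();
    for (i = 0; i < reps; i++)
        fn(ctx);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / (double)reps;
}

/* Example workload: sum an array of longs. */
struct sum_job {
    const long *data;
    long n;
    long result;
};

static void sum_job_run(void *ctx)
{
    struct sum_job *job = ctx;
    long i, s = 0;
    for (i = 0; i < job->n; i++)
        s += job->data[i];
    job->result = s;
}
```

Time the original code and any candidate replacement with the same inputs and the same repetition count, and keep the change only if the numbers say it is faster.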
One thing to remember is that the performance of your software may differ
greatly when comparing Power Macintosh to the 68K Macintosh. Performance may also
vary quite a bit between specific Macintosh models. Machines with a 32-bit data bus
will perform memory-intensive operations much faster than machines with a 16-bit
data bus.
Now that you have figured out which parts of your program are slow, it is time to
decide how to make them faster. The first thing to do is to examine the underlying
algorithms of the code. Is there anything fundamental that you can do to speed things
up? For example, if you are performing a text search, how do you search through the
text? Do you use Munger? Perhaps something like a Boyer-Moore algorithm would be
much faster. Remember, the key is to work smarter. Brute force is not the answer -
it’s a matter of brains over brawn.
Sometimes a small change to your existing algorithm will make things
much faster. I sped up a search algorithm I wrote years ago by a factor of three by
simply adding two lines of code. Look at your algorithm and examine how it operates
with common data that goes through it. Perhaps certain shortcuts can be taken when
the most common data runs through it.
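Here is a hypothetical example of such a shortcut; it is not the two-line change described above. A naive substring search tries every starting position, but in typical text most positions fail on the very first byte. Using the standard C `memchr()` to jump straight to the next occurrence of the pattern's first byte skips those positions, and the full comparison runs only where a match is possible.

```c
#include <stddef.h>
#include <string.h>

/* Naive substring search with a fast path for the common case:
   memchr() jumps to the next occurrence of the pattern's first byte,
   and memcmp() checks the full pattern only there.
   Returns the offset of the first match, or -1. */
static long find_with_shortcut(const char *text, size_t n,
                               const char *pat, size_t m)
{
    const char *p = text;
    const char *end;

    if (m == 0) return 0;
    if (m > n) return -1;
    end = text + (n - m);          /* last possible starting position */

    while (p <= end) {
        p = memchr(p, pat[0], (size_t)(end - p) + 1);
        if (p == NULL)
            return -1;
        if (memcmp(p, pat, m) == 0)
            return (long)(p - text);
        p++;
    }
    return -1;
}
```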
If you don’t have many books on fundamental computer algorithms, now is the
time to stock up. I am a firm believer that you cannot have enough books on algorithms.
At the end of this article I have listed several books that will help broaden and round
out your algorithm skills. I highly recommend all of them.
Once you have analyzed the specific parts of your program that are bottlenecks, it
is time to look at the architecture of your program as a whole. If your program is
rather large you may want to look at it as several modules working together.
Is the underlying architecture of your program going to be a bottleneck? Are
there time-consuming tasks that can be done in the background at idle time rather than
being done while the user must wait? Are you doing network communication
synchronously when you could do it asynchronously and give the user their machine
back? Are there tasks that need to be performed but don’t need to give immediate
feedback to the user? These kinds of tasks are good candidates for idle-time processing,
additional user feedback, modeless dialog boxes, asynchronous programming, and other
methods of helping the user feel as if they are not waiting on your program, or at least
aren’t prevented from doing something else while you get your thing done. If you keep
the user occupied or help them feel productive while your program is working, they’ll
be more patient with whatever performance you have.
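One way to move a long task into idle time is to break it into slices that each do a bounded amount of work and remember where they left off. The structure and function names below are hypothetical. On the Macintosh you might call the step function whenever your event loop receives a null event from `WaitNextEvent`, and stop calling it once it reports that the job is finished.

```c
#include <stddef.h>

/* Hypothetical background job: sums a large array a slice at a time
   so the event loop stays responsive between slices. */
struct bg_job {
    long   *data;
    size_t  count;
    size_t  next;    /* index of the next item to process */
    long    total;   /* running result */
};

/* Process at most `budget` items; return 1 once the whole job is done. */
static int bg_job_step(struct bg_job *job, size_t budget)
{
    size_t stop = job->next + budget;
    if (stop > job->count)
        stop = job->count;
    while (job->next < stop)
        job->total += job->data[job->next++];
    return job->next == job->count;
}
```

The budget controls the trade-off: a larger slice finishes sooner overall, a smaller one returns control to the user more often.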
How to write the code in Assembly Language
If, after careful examination, you have determined a bottleneck in your program,
analyzed the algorithms as best you can, rewritten them to be as efficient as possible,
and still it is not fast enough, perhaps it is time to code a small part in assembly. The
best place to start is to disassemble compiler-generated code for the routine you want
to code in assembly. Look at the code. What is inefficient about it? Are registers
constantly being reloaded? Are the registers being used efficiently? Are the
instructions scheduled for maximum pipelining? Very often you can take the
disassembled code, make a few minor modifications to it and see a very nice speed
increase.
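Sometimes the inefficiency you spot in the disassembly can be fixed without leaving C. A common cause of registers being reloaded is pointer aliasing: if a loop reads a value through one pointer and stores through another, the compiler may have to assume the store could change that value and reload it on every iteration. The sketch below (hypothetical function names) shows the usual fix of copying the value into a local variable. The two versions give the same result only when `scale` does not point into `dst`.

```c
#include <stddef.h>

/* Version A: because a store to dst[i] might modify *scale as far as
   the compiler knows, it may reload *scale on every iteration. */
static void scale_a(long *dst, const long *src, size_t n, const long *scale)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* Version B: the local copy cannot change inside the loop, so the
   compiler is free to keep it in a register. */
static void scale_b(long *dst, const long *src, size_t n, const long *scale)
{
    long k = *scale;
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Check the disassembly of both versions with your own compiler; whether the reload actually disappears depends on the compiler and its settings.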
Perform accurate timing tests on the code you are optimizing. Unless you
completely understand the PowerPC Architecture Manual and the PowerPC 601 User’s
Guide, more often than not your hand-written PowerPC code will run slower than a good
compiler’s. The bottom line is that it must run faster, not look faster.
Maintain an exact high-level equivalent of the assembly code, and keep it right
there in the same file. This way if you port your code to a different architecture,
you’ve got what you need to get up and running quickly. In many cases the bottleneck on
one machine will not be a bottleneck on another.
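A minimal sketch of that arrangement, assuming a hypothetical checksum routine: the portable C version always stays in the file, an assembly version (declared here only as an assumed external `checksum_asm`) is selected by a hypothetical `HAVE_ASM_CHECKSUM` macro, and a self-test checks that whichever version is active agrees with the reference.

```c
#include <stddef.h>

/* Portable reference version. Keep it in the same file as any
   assembly version so a port can fall back on it immediately. */
static unsigned long checksum_ref(const unsigned char *p, size_t n)
{
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

#ifdef HAVE_ASM_CHECKSUM
/* Hypothetical hand-written assembly routine with the same contract. */
extern unsigned long checksum_asm(const unsigned char *p, size_t n);
#define checksum checksum_asm
#else
#define checksum checksum_ref
#endif

/* Self-test: the active version must agree with the reference for
   every length from 0 to 256 bytes. Returns 1 on success. */
static int checksum_selftest(void)
{
    unsigned char buf[256];
    size_t i, n;
    for (i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(i * 7 + 3);
    for (n = 0; n <= sizeof buf; n++)
        if (checksum(buf, n) != checksum_ref(buf, n))
            return 0;
    return 1;
}
```

Running the self-test every time the assembly version changes catches the mistakes that item (3) above warns about.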
In Conclusion
This article has discussed some alternative methods of speeding up your program
execution that are in many ways better than the traditional methods used by many
programmers. The goal is to maximize your gain and minimize your effort. By working
smarter rather than harder, you can have a faster program in less time.
Recommended Books
[1] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to
Algorithms. MIT Press, 1990.
[2] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of
Computer Algorithms. Addison-Wesley, 1974.
[3] Saumyendra Sengupta and Paul Edwards. Data Structures in ANSI C. Academic
Press, 1991.
[4] Donald Knuth. The Art of Computer Programming, Volumes 1-3.
Addison-Wesley, 1973.
[5] Daniel H. Greene and Donald E. Knuth. Mathematics for the Analysis of
Algorithms, Third Edition. Birkhäuser, 1990.
[6] P. D. Eastman. Go, Dog, Go! Random House, 1961.
[7] Manoochehr Azmoodeh. Abstract Data Types and Algorithms, Second Edition.
Macmillan, 1990.