Parallel Addition
Volume Number: 10
Issue Number: 10
Column Tag: Programmer’s Challenge in-depth
RGB to YUV Using Parallel Addition 
Some unique approaches to optimization
By Robert Munafo, Malden, MA
Note: Source code files accompanying article are located on MacTech CD-ROM orsource code disks.
About The Author
Robert Munafo - Robert works for VideoGuide, a startup in the Boston area.
Prior to that, he developed drivers and embedded software for GCC Technologies’
printer products. Robert’s been writing free Mac software since 1984. One of the
first public-domain games for the Mac, Missile, continues to run on every new model
Apple releases! He also became well-known for his shareware effort Orion and the
free utility Icon Colorizer. He spends most of his spare time on the Mac gronking the
inner loops of the Mandelbrot Set and various compute-intensive simulations. He
awaits the day when massively parallel desktop computers will surpass the TFLOPS
(trillion floating-point operations per second) milestone. You can reach him via
e-mail at mrob@world.std.com.
Doing more in fewer cycles
This article contains the actual winning code for July’s Color Space Conversion
Programmer’s Challenge. Robert had sent in his code before the deadline but for some
reason the SANE calls he made during his RGBtoYUVInit routine caused both my Macs to
crash. I wasn’t able to identify the exact cause of the crash other than to witness that it
wasn’t his code. I suspect it had to do with Omega SANE’s backpatching (self-modifying
code) but I’m not sure. In any case, Robert was given a chance to submit a new version
of RGBtoYUVInit only (which was not part of the timings) that didn’t use SANE. He did
so and ended up being about 27% faster than the published winner Bob Noll. As you
will see, Robert’s explanation and use of parallel addition is excellent (and fast!). I
highly recommend studying it if you need to do fast matrix multiplication with constant
coefficients.
- Mike Scanlin
When I created my entry for the July 1994 Programmers’ Challenge, I used
some novel optimization techniques which are explained here. For background
material, see the contest statement in the July 1994 issue, page 44, and the results
presented in the September 1994 issue.
I will briefly restate the challenge. It involved converting a large number of
[R,G,B] values into [Y,U,V] (a color system used in JPEG and NTSC, among others)
using the formula:
Y 0.29900000 0.58700000 0.11400000 R
U = -0.16873590 -0.33126410 0.50000000 * G
V 0.50000000 -0.41868760 -0.08131241 B
Each entry consisted of an init routine that would not be timed and an RGBtoYUV
routine that would take arrays of [R,G,B] values and output arrays of [Y,U,V] values.
As always in the Programmers’ Challenge, accuracy is most important, followed by
speed, code size, and elegance.
Analysis of Rounding
The first problem to solve involved figuring out how various types of rounding
would affect computed results. The challenge required producing output that was
equivalent to the results that would be produced when infinite precision is used.
Fractions of N.5 (e.g. 2.5 or -2.5) would be rounded down, and anything else would be
rounded to nearest. The problem clearly required the use of limited-precision integer
math, so I had to figure out how much precision was necessary to produce acceptable
results.
I conducted a brute-force search and determined that out of all possible Y, U, and
V values produced by the transform matrix, the closest fraction to N.5 was N.499000
or N.501000. In other words, to distinguish an N.5 result from all other results, the
math must be precise enough to distinguish differences as small as 0.001, or 2-10.
Since 8 bits are used for the integer position of the answer, at least 18 bits are needed.
The simplest way to get the right answer is to use 19 bits (or more) to compute the
fraction, then add 0.4995, then chop off the fraction. This works because 0.4995
equals 0.5 - (0.001 / 2). Here are three examples:
2.501 + 0.4995 = 3.0005 -> 3
2.500 + 0.4995 = 2.9995 -> 2
2.499 + 0.4995 = 2.9985 -> 2