All Databases MacTech Vol 10-1994

Parallel Addition

Volume Number: 10

Issue Number: 10

Column Tag: Programmer’s Challenge in-depth

RGB to YUV Using Parallel Addition

Some unique approaches to optimization

By Robert Munafo, Malden, MA

Note: Source code files accompanying article are located on MacTech CD-ROM or

source code disks.

About The Author

Robert Munafo - Robert works for VideoGuide, a startup in the Boston area.

Prior to that, he developed drivers and embedded software for GCC Technologies’

printer products. Robert’s been writing free Mac software since 1984. One of the

first public-domain games for the Mac, Missile, continues to run on every new model

Apple releases! He also became well-known for his shareware effort Orion and the

free utility Icon Colorizer. He spends most of his spare time on the Mac gronking the

inner loops of the Mandelbrot Set and various compute-intensive simulations. He

awaits the day when massively parallel desktop computers will surpass the TFLOPS

(trillion floating-point operations per second) milestone. You can reach him via

e-mail at mrob@world.std.com.

Doing more in fewer cycles

This article contains the actual winning code for July’s Color Space Conversion

Programmer’s Challenge. Robert had sent in his code before the deadline but for some

reason the SANE calls he made during his RGBtoYUVInit routine caused both my Macs to

crash. I wasn’t able to identify the exact cause of the crash other than to witness that it

wasn’t his code. I suspect it had to do with Omega SANE’s backpatching (self-modifying

code) but I’m not sure. In any case, Robert was given a chance to submit a new version

of RGBtoYUVInit only (which was not part of the timings) that didn’t use SANE. He did

so and ended up being about 27% faster than the published winner Bob Noll. As you

will see, Robert’s explanation and use of parallel addition is excellent (and fast!). I

highly recommend studying it if you need to do fast matrix multiplication with constant

coefficients.

- Mike Scanlin

When I created my entry for the July 1994 Programmers’ Challenge, I used

some novel optimization techniques which are explained here. For background

material, see the contest statement in the July 1994 issue, page 44, and the results

presented in the September 1994 issue.

I will briefly restate the challenge. It involved converting a large number of

[R,G,B] values into [Y,U,V] (a color system used in JPEG and NTSC, among others)

using the formula:

Y 0.29900000 0.58700000 0.11400000 R

U = -0.16873590 -0.33126410 0.50000000 * G

V 0.50000000 -0.41868760 -0.08131241 B

Each entry consisted of an init routine that would not be timed and an RGBtoYUV

routine that would take arrays of [R,G,B] values and output arrays of [Y,U,V] values.

As always in the Programmers’ Challenge, accuracy is most important, followed by

speed, code size, and elegance.

Analysis of Rounding

The first problem to solve involved figuring out how various types of rounding

would affect computed results. The challenge required producing output that was

equivalent to the results that would be produced when infinite precision is used.

Fractions of N.5 (e.g. 2.5 or -2.5) would be rounded down, and anything else would be

rounded to nearest. The problem clearly required the use of limited-precision integer

math, so I had to figure out how much precision was necessary to produce acceptable

results.

I conducted a brute-force search and determined that out of all possible Y, U, and

V values produced by the transform matrix, the closest fraction to N.5 was N.499000

or N.501000. In other words, to distinguish an N.5 result from all other results, the

math must be precise enough to distinguish differences as small as 0.001, or 2-10.

Since 8 bits are used for the integer position of the answer, at least 18 bits are needed.

The simplest way to get the right answer is to use 19 bits (or more) to compute the

fraction, then add 0.4995, then chop off the fraction. This works because 0.4995

equals 0.5 - (0.001 / 2). Here are three examples:

2.501 + 0.4995 = 3.0005 -> 3

2.500 + 0.4995 = 2.9995 -> 2

2.499 + 0.4995 = 2.9985 -> 2

Referenced by (2):