develop - 1996

December 96 - Chiropractic for Your Misaligned Data

Kevin Looney and Craig Anderson

Misalignment occurs when a program accesses data at an address that isn't in sync with the processor's internal data paths. Depending on the CPU architecture, this can slow performance a little or a lot. Finding the offending code, however, can be very difficult. We'll demonstrate the cause and cost of alignment problems and then show you a couple of tools you can use to detect them in your programs.

Sometimes Macintosh application performance is limited by architectural factors that can't be remedied, like the raw speed of the I/O or memory subsystem. But the programmer does control some factors that affect the speed of the memory subsystem, and thus application performance, such as how data is aligned and accessed in memory. By default, most compilers align data appropriately for PowerPC code. However, alignment options offered for backward compatibility with the 680x0 architecture can cause significant overhead.

Misalignment is a difficult performance problem to detect. Traditional debugging and performance tools typically don't help you find misaligned accesses. On top of this, misalignment problems manifest themselves differently on different CPU architectures.

In this article, we'll define misalignment, describe how it's caused, discuss the overhead penalties for accessing misaligned data on various microprocessors, and introduce some tools designed to help detect misaligned accesses in code. These tools accompany this article on this issue's CD and develop's Web site. Armed with these tools and what you learn in this article, you can perform chiropractic adjustments on your programs to solve their data alignment problems.

WHAT IS MISALIGNMENT AND WHY SHOULD I CARE?

A piece of data is properly aligned when it resides at a memory address that the processor can access efficiently. If it doesn't reside at such an address, it's said to be misaligned. In the PowerPC architecture, 32-bit and 64-bit floating-point numbers are misaligned when they reside at addresses not divisible by 4. Whether a misaligned access causes an exception depends on the specific microprocessor.

Whether a data item is aligned depends not only on its address and the processor performing the access, but also on the size of the item. In general, data of size s is aligned if the least significant n bits of its address are 0, where n = log2(s). Hence, 1-byte items are always aligned, 2-byte items are aligned on even addresses, and 4-byte items are aligned if the address is evenly divisible by 4. This alignment policy is often called natural alignment, and it's the recommended data alignment for code to run well on all current and future PowerPC processors.
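The natural-alignment rule reduces to a single bit test: for a power-of-two size s, the low log2(s) bits of an address are all 0 exactly when the address ANDed with (s - 1) is 0. The sketch below expresses that test in C; the function name is_naturally_aligned is our own illustrative helper, not part of any Toolbox or compiler API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if an item of `size` bytes at
   address `addr` is naturally aligned, i.e. the low log2(size) bits
   of the address are all zero. `size` must be a power of two. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    /* A power of two has exactly one bit set, so size & (size - 1) == 0. */
    assert(size != 0 && (size & (size - 1)) == 0);

    /* size - 1 is a mask covering exactly the low log2(size) bits. */
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

For example, an address of 0x1002 is aligned for a 2-byte item but not for a 4-byte float. In a real program you would pass (uintptr_t)&field and sizeof field.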

Accessing misaligned data can be quite costly, depending on the microprocessor your program is running on. We'll demonstrate just how costly in a minute, but in general, misaligned memory accesses take from 2 to 80 times longer than aligned accesses on 603, 604, and future PowerPC microprocessors. A misaligned access can take more time for two reasons:

• It may require two requests to the memory system instead of just one.
• It may cause the processor to take an unaligned access exception, which carries a heavy penalty.
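The first reason can be made concrete. If the memory system moves data in fixed-width, aligned chunks, an access needs more than one request whenever its first and last bytes fall in different chunks. The sketch below assumes the chunk width is supplied by the caller (8 bytes in the examples, but the real width depends on the processor and bus); crosses_chunk_boundary is our own hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if an access of `size` bytes
   starting at `addr` spans more than one `width`-byte chunk, and so
   would need more than one memory request. `width` is an assumed
   transfer width, not a value taken from any particular PowerPC chip. */
static bool crosses_chunk_boundary(uintptr_t addr, size_t size, size_t width)
{
    assert(size > 0 && width > 0);

    uintptr_t firstChunk = addr / width;              /* chunk of first byte */
    uintptr_t lastChunk  = (addr + size - 1) / width; /* chunk of last byte  */
    return firstChunk != lastChunk;
}
```

With an 8-byte width, a 4-byte access at address 6 touches bytes 6 through 9 and crosses a boundary, while the same access at address 4 does not. Passing 4096 as the width tests whether an access straddles a 4K memory page.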

WHAT CAUSES MISALIGNMENT?

Misaligned accesses can involve variables located on the stack or in the heap. The compiler you use and its settings determine whether misaligned accesses occur. Improper structure placement and incorrect pointer arithmetic can also cause misaligned accesses. These problems can be found with the tools and techniques described below, but this article focuses mainly on alignment problems that aren't caused by programming errors.

Most compilers for the Macintosh let you choose among various alignment options. Some compilers default to 2-byte alignment so that data alignment in PowerPC code mimics alignment on the 680x0 processor. While this option means that structures written to disk in binary format can be read easily by both architectures, it also permits alignment problems on PowerPC processors. Both improper structure padding and misaligned stack parameters can result in misaligned accesses.

IMPROPER STRUCTURE PADDING

When a float field occurs in a structure, improper padding by the compiler can cause the float field to be misaligned. The example in Listing 1 uses MPW's alignment pragmas to illustrate this.

______________________________

Listing 1. An example of a poorly aligned structure

#pragma options align=mac68k

typedef struct sPoorlyAlignedStruct {
    char  fCharField;
    float fFloatField;
    char  fSecondCharField;
} sPoorlyAlignedStruct;

sPoorlyAlignedStruct gPoorlyAlignedStruct;

#pragma options align=reset

______________________________

In this example, a compiler that did no padding would place fFloatField at an offset of one byte from the structure's base address. Since compilers (and memory allocators) usually align the base of a structure on a boundary that's a multiple of four bytes, every access to fFloatField would then be misaligned. fFloatField would also be misaligned in statically or dynamically allocated arrays, since structure lengths are padded so that each array element starts on a 4-byte aligned address.

A compiler with a 2-byte padding setting would place fFloatField at an even address, but accesses would still be misaligned whenever that address isn't divisible by 4. The mac68k pragma shown in Listing 1 selects 2-byte alignment, putting fFloatField at an even but often misaligned address for PowerPC processors.

A compiler with a 4-byte padding setting would always align the field properly.
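You can check these layouts with the standard offsetof macro. Because #pragma options align=mac68k is specific to Macintosh compilers, the sketch below assumes a compiler that supports #pragma pack (GCC, Clang, and Microsoft Visual C++ do) and uses pack(2) and pack(1) to approximate the 2-byte and no-padding cases. It also assumes a 4-byte float with 4-byte natural alignment, as on common current platforms. The struct names and the float_misalignment helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The fields of sPoorlyAlignedStruct under three packing rules. */

typedef struct {            /* natural alignment: float at offset 4 */
    char  fCharField;
    float fFloatField;
    char  fSecondCharField;
} sNaturalStruct;

#pragma pack(push, 2)       /* approximates the 2-byte mac68k rule */
typedef struct {
    char  fCharField;
    float fFloatField;
    char  fSecondCharField;
} sTwoBytePackedStruct;
#pragma pack(pop)

#pragma pack(push, 1)       /* no padding at all */
typedef struct {
    char  fCharField;
    float fFloatField;
    char  fSecondCharField;
} sUnpaddedStruct;
#pragma pack(pop)

/* Hypothetical helper: offset of the float field in element `index` of
   an array whose base is 4-byte aligned, reduced modulo 4. A nonzero
   result means that element's float field is misaligned. */
static size_t float_misalignment(size_t structSize, size_t fieldOffset,
                                 size_t index)
{
    assert(structSize > 0);
    return (index * structSize + fieldOffset) % 4;
}
```

Under these assumptions the float sits at offset 4, 2, and 1 respectively, and the structures are 12, 8, and 6 bytes long. Natural alignment costs three bytes of padding after fCharField (plus trailing padding), but it keeps every fFloatField on a 4-byte boundary; with the 2-byte rule, every array element's float lands 2 bytes past a 4-byte boundary.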

MISALIGNED STACK PARAMETERS

Besides affecting the alignment of data in a structure, compiler settings can affect the way data structures are placed on the stack. Consider this function declaration:

void FunctionFoo(sPoorlyAlignedStruct firstParam, float floatParam);

In this example, the parameters are placed on the stack (even though PowerPC compilers use registers when possible). A compiler using a 680x0 2-byte padding option may place firstParam.fFloatField at an even address, but if that address isn't divisible by 4, every access to the field within FunctionFoo will be misaligned. The option won't, however, change the alignment of other parameters on the stack.

On PowerPC processors, nonstructure parameters are usually passed in registers, and there are no alignment problems when accessing registers.

THE COST OF MISALIGNMENTS

To demonstrate the cost of misalignments, we wrote the code in Listing 2, which generates both aligned and misaligned accesses over the course of a million accesses. It writes to a byte array in different ways (integers, floats, and doubles) and at different offsets. In a portion of the code not shown, accesses are confined to a single page of memory, and interrupts are turned off. Running this code enabled us to calculate the difference in performance between aligned and misaligned accesses. This code (with slight modifications for the various compilers) was compiled with the Symantec, MrC, and Metrowerks compilers. All compilers behaved similarly.

______________________________

Listing 2. Generating accesses for comparison of access time

#include <Timer.h>   // Mac OS Toolbox: Microseconds() and UnsignedWide

#define kNumAccessesPerCycle 200
#define kNumCycles 5000
// Number of total accesses = kNumAccessesPerCycle * kNumCycles
// 1000000 = 200 * 5000
#define kNumTotalAccesses ((long) kNumAccessesPerCycle * kNumCycles)
#define kTableSize 1608 // Table size needed for 200 separate aligned
                        // accesses on the largest data type (doubles),
                        // plus room for offsets of up to 7 bytes

typedef enum ECType { eLong, eFloat, eDouble } ECType;

double AlignLoop(short offset, ECType type);

int main(void)
{
    double AlignedTimeFloat = AlignLoop(0, eFloat);
    double Misaligned1TimeFloat = AlignLoop(1, eFloat);
    double Overhead1Float =
        (((Misaligned1TimeFloat - AlignedTimeFloat) * 100)
        / AlignedTimeFloat);
    double avgOverhead1Float =
        (Misaligned1TimeFloat - AlignedTimeFloat)
        / kNumTotalAccesses;
    ...
}

// The function AlignLoop measures the time of a loop of "writes" to
// a byte array. The writes are either aligned or misaligned, based
// on the offset parameter, which should be between 0 and 7. The
// type should be eLong, eFloat, or eDouble.
double AlignLoop(short offset, ECType type)
{
    UnsignedWide startTime, stopTime;
    double start, stop;
    union {
        double forceAlignment;     // asks the compiler to align the table
        char bytes[kTableSize];    // like a double, so offset 0 is aligned
    } table;
    char *bytetable = table.bytes;
    long j, k;

    // Get starting timestamp.
    Microseconds(&startTime);

    switch (type) {
        case eLong:
        {
            // volatile keeps an optimizing compiler from discarding
            // the repeated stores to the table.
            volatile long *longPtr = (volatile long *) &bytetable[offset];
            for (j = 0; j < kNumCycles; j++)
                for (k = 0; k < kNumAccessesPerCycle; k++)
                    longPtr[k] = 1;
        }
        break;

        case eFloat:
        {
            volatile float *floatPtr = (volatile float *) &bytetable[offset];
            for (j = 0; j < kNumCycles; j++)
                for (k = 0; k < kNumAccessesPerCycle; k++)
                    floatPtr[k] = 1.0f;
        }
        break;

        case eDouble:
        {
            volatile double *doublePtr =
                (volatile double *) &bytetable[offset];
            for (j = 0; j < kNumCycles; j++)
                for (k = 0; k < kNumAccessesPerCycle; k++)
                    doublePtr[k] = 1.0;
        }
        break;
    }

    // Get ending timestamp.
    Microseconds(&stopTime);

    // Move the 64-bit microsecond counts to doubles; hi holds the
    // upper 32 bits, so it is scaled by 2^32.
    start = (4294967296.0 * startTime.hi) + startTime.lo;
    stop = (4294967296.0 * stopTime.hi) + stopTime.lo;

    return stop - start;
}

______________________________

Table 1 shows the results. Overhead is calculated as the percentage difference between the times required for aligned and misaligned accesses. In our experiments, misaligned accesses at different offsets paid roughly the same penalty (excluding cases where the two accesses required to retrieve the data straddle a memory page boundary, which occurs every 4K of memory).

Table 1. Misalignment overhead for basic data types, native PowerPC code. Access times are in microseconds.

CPU and data type        Aligned total   Misaligned total   Overhead
                           access time        access time
PowerPC 601 integers            113439             119573       5.4%
PowerPC 601 floats               63234              94505      50.0%
PowerPC 601 doubles              63251             113306      79.1%
PowerPC 604 integers               687                695       1.1%
PowerPC 604 floats                 261              23753    9009.0%
PowerPC 604 doubles                262              22546    8509.5%
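The overhead arithmetic mirrors the Overhead1Float and avgOverhead1Float calculations in Listing 2's main: the extra time for misaligned accesses as a percentage of the aligned time, and that extra time spread over every access. The sketch below restates both as small C functions (the names are ours). For the PowerPC 601 integer row, (119573 - 113439) * 100 / 113439 works out to about 5.4%.

```c
#include <assert.h>

/* Percentage by which the misaligned run exceeds the aligned run,
   following the Overhead1Float calculation in Listing 2. */
static double overhead_percent(double alignedTime, double misalignedTime)
{
    assert(alignedTime > 0.0);
    return (misalignedTime - alignedTime) * 100.0 / alignedTime;
}

/* Average extra time per access, in the same units as the inputs,
   following the avgOverhead1Float calculation in Listing 2. */
static double avg_penalty_per_access(double alignedTime,
                                     double misalignedTime,
                                     long numAccesses)
{
    assert(numAccesses > 0);
    return (misalignedTime - alignedTime) / numAccesses;
}
```

Applied to the same row with Listing 2's million accesses, the second function gives about 0.006 microseconds (roughly 6 nanoseconds) of extra time per misaligned access.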
