Arrays Iterators Comparators
Volume Number: 15
Issue Number: 2
Column Tag: PowerPlant Workshop
Arrays, Iterators, and Comparators... Oh my!
by John C. Daub, Austin, Texas USA
A look at PowerPlant's Array Classes
No matter what you do in life, sooner or later you have to deal with data - it's
unavoidable. This is certainly true in software development. We need ways to read and
write data, make it persistent, add and remove, examine, and otherwise manipulate
information. Without data, there's not much point nor ability to do what we're doing.
Since the manipulation of data is so vital to software, it would stand to reason that
providing reusable means to work with data would be welcomed by software
developers.
There are many different forms of data manipulators available. There are end-user
solutions like FileMaker, database engines such as NeoAccess and OOFILE, and simpler
solutions such as the C++ Standard Library containers and PowerPlant's Array
Classes. If you haven't guessed, that's what this article is about - PowerPlant's Array
Classes. These classes provide PowerPlant and you with a Mac-savvy means of working
with lists of data. If you look through the PowerPlant sources themselves, you'll see
the Array classes used in many key places; chances are good you'll find yourself using
them throughout your own code as well. The Array Classes certainly don't fit the needs
of every situation, but in order to locate the place for them in your "Programmer's
Toolchest" read on to see what they have to offer.
Array Overview
The main purpose of the Array Classes is to provide a means for storing and working
with lists of data. The Array Classes are comprised of three components: Arrays,
Iterators, and Comparators. Arrays are an ordered collection of items. Iterators are a
means of sequentially traversing Arrays. Comparators are a means for comparing
Array elements. Used together, these components provide you with a powerful and easy
to use data management system - again, it's what PowerPlant uses itself (I'll provide
some examples of this throughout the article). Figure 1 shows the classes and class
hierarchy for the Array Classes.
Figure 1. The PowerPlant Array Classes hierarchy.
One item to note is that a couple classes listed above, LSharableArray and
LFastArrayIterator, are not found in the Array Classes folder within the PowerPlant
folder of the Metrowerks CodeWarrior Professional installation. These classes came
from the Metrowerks Constructor sourcebase, and consequently are found within the
Constructor Additions folder in the PowerPlant folder (at least as of this writing,
using CodeWarrior Pro 4). Nevertheless, they are still available for your use and
worth discussing in this article.
As you might guess, Arrays are the core of the Array Classes, and LArray is the central
base class. LArray is an ordered sequence of fixed-sized items, not unlike a regular
C/C++ array. However unlike a regular C/C++ array whose size is static and
determined at compile time, an LArray is dynamic - you can add and remove items at
runtime. This is accomplished by storing the Array data within a Handle, which is
resized as data is added or removed using only as much memory as is required at the
time. An LArray is also unlike a regular array in that an LArray is one-based and not
zero-based; that is, the first item is at index one and not index zero. Other differences
between regular arrays and LArray is that an LArray can be iterated and sorted. We'll
discuss iteration and sorting later in the article.
One additional item of note is what can be stored within an LArray. The answer is:
almost anything. When you add an item to an LArray you do not store that item itself
but rather a copy of that item (deep in the bowels of LArray the internal storage
Handle is resized and BlockMoveData is called to actually add/copy the item into the
LArray). Due to this storage mechanism, you can store just about any type of data:
built in data types like char, int, long; other simple data types like Handle, Ptr,
OSType; simple data structures like Point, Rect, SPaneInfo; and pointers from
NewPtr, malloc, or new/new[]. What cannot be stored are more complex data such as
C++ objects. Objects cannot be stored for a few reasons.
First, recall that the internal storage of an LArray is a Handle, which is a relocatable
block of memory. C++ objects have some invisible structures attached to it to take
care of housekeeping details, such as vtables. These structures contain pointers to
specific areas in memory where your object is. If you store your object within a
Handle and the Memory Manager moves that Handle, your vtables do not move and
update with the object. The vtables are now invalid and chances are good that next time
you try to access members of that object you'll crash or have some other sort of
unpredictable behavior. Second, it's generally considered evil to perform bitwise
copies of objects (Cline 1995, FAQ 208), and that is exactly what BlockMoveData
does. This is why we have copy constructors and assignment operators, but
unfortunately there is no way to invoke those within LArray's copy mechanisms. But
even if they could be invoked, or if there was a case where the bitwise copy would be
ok (there certainly are instances of this), you still have problem one to contend with.
If you need to store objects in an LArray, you should instead store pointers to the
objects. LArray's can only store plain old datatypes (PODs).
Basic Functionality
If you haven't yet, and you have the ability to do so, open a copy of LArray.cp (I'm
using the version from CodeWarrior Pro 4). You'll notice the class is fairly sizable,
but it's certainly not unmanagable. Most of the functionality can broken down into a
few logical groups, and the ones you'll use most frequently will be the manipulators
and the inspectors.
The manipulators allow you to add, remove, and change Array items. To add an item to
an Array you can use AddItem() or InsertItemsAt(). AddItem() allows you to add one
item to the Array. InsertItemsAt() allows you to insert one or more items into the
Array at a specified index; all inserted items are set to the same value, and are
inserted contiguously. The difference between AddItem() and InsertItemsAt(), aside
from the ability to insert one item versus multiple items, is if the Array is unsorted,
AddItem() adds to the end of the Array (if the Array is sorted, AddItem() actually calls
through to InsertItemsAt(), and the item is added wherever the sort places it). The
bonus of AddItem() is a speed gain as it does not have to contend with index range
checking nor the possible overhead of maintaining a sorted Array. If your Array
doesn't bother with sorting and/or adding to the end is acceptable, you'll probably want
to use AddItem() over InsertItemsAt() for the speed gain.
Since you can add items, you also need to remove items. LArray provides the
RemoveItemsAt(), RemoveAllItemsAfter(), and Remove() methods for just this
purpose. RemoveItemsAt() removes a specified number of items starting at a given
index. RemoveAllItemsAfter() is a variation on RemoveItemsAt() that will remove all
items located after the specified index. Finally, given a particular item, Remove()
removes that item from the Array.
The remaining manipulators allow you to modify existing Array items.
AssignItemsAt() allows you to assign a (new) value to the item(s) you specify.
SwapItems() allows you to exchange the values of two items within an unsorted Array.
MoveItem() moves an item within an unsorted Array from one position to another. It
is important to note that MoveItem() works as if you first removed the item and then
inserted it - the index is relative to the Array, not the items within the Array. To
illustrate, suppose you have an Array containing five items and you wish to move the
item at index two to index four:
theArray->MoveItem(2, 4);
The action is simple to do, but the results might not be quite what you expect. Before
the move your Array might have looked like this:
Index: 1 2 3 4 5
Item: A B C D E
You might expect after the move for your Array to look like this:
Index: 1 2 3 4 5
Item: A C B D E
If MoveItem() was item-based, this would be true. It would have taken the second
item, "B", and placed it where the fourth item, "D", was, bumping up those after, "D
and "E". But notice a resulting effect: "B" is not at index four but rather index three!
This could certainly lead to unexpected behavior as the result doesn't quite match the
original intent. Instead, MoveItem() is index based so after the move your Array would
actually look like this:
Index: 1 2 3 4 5
Item: A C D B E
Now item two has been moved to become item four. It was as if item two was first
removed and the Array adjusted from that removal, the itermediate step leaving the
Array looking like this:
Index: 1 2 3 4
Item: A C D E
And then the item inserted at the new index (four). MoveItem() doesn't actually
remove then insert the item, but that's the essence of how the method works.
The other group of methods you'll commonly use will be the inspectors. The inspectors
enable you to obtain information about items in the Array or the Array itself.
ValidIndex() returns if the given index is valid for the Array or not. GetItemSize()
returns the size of the requested item. GetCount() returns the number of items in the
Array. FetchIndexOf() returns the index of the given item in the Array, if the item is
in the Array. And probably the most commonly used inspector is FetchItemAt(), which
returns to you a copy of the item at the given index.
LArray also contains many other methods. The majority of these methods are protected
methods used internally by LArray to do the voodoo that it do so well, but there are a
few other public methods such as: Sort(), to sort the Array; and GetComparator() and
SetComparator(), to access the Array's Comparator object; and of course the various
constructors and destructor for the class. Although you do not need to worry yourself
with these internal methods to begin using LArray, you should eventually familiarize
yourself with how all of the methods work so you can gain a complete and thorough
understanding of how LArray operates.
LArray Subclasses
Looking back at Figure 1, you see there are five subclasses of LArray provided by
Metrowerks in the PowerPlant distribution: TArray, LVariableArray, LRunArray,
TRunArray, and LSharableArray.
TArray is a simple template wrapper class that inherits from LArray. Looking in
TArray.h, you'll notice only a handful of LArray methods are overridden, and then only
those public methods that take items/data as an argument. This is done to provide you
with a degree of typesafety in using the Array classes. Notice the LArray methods that
take items/data as an argument pass the argument as a void*; this is a necessary evil
to allow LArray to store various types of data and be a general-purpose class.
However, the use of void* not typesafe and prone to errors. By using TArray you are
able to specify the type of data stored within the Array and can avoid errors that can
stem from using mismatched types. Furthermore, since the type is known you can pass
the item/data by reference instead of by address, which some would contend is a "more
C++" way to pass arguments. Moreover, since type information is known you do not
need to worry about passing the size of the data since TArray handles that for you. Look
again at the overrides in TArray and you'll notice they're actually overloads. There is
one side-effect to TArray caused by these overloads/overrides, but it's nothing to
worry about. Strictly speaking the TArray methods are hiding the inherited virtual
functions from LArray, and if your compiler can notify you of such situations (the
Metrowerks C/C++ compilers can do this) then you will receive some compiler
warnings to that effect. There is no need to worry about these warnings as the design of
TArray will work correctly. You can suppress the warnings by turning off that
compiler warning globally (e.g. turn it off in the compiler's preference panel) or on a
case by case basis, which is what PowerPlant itself does. Look at the top of any
PowerPlant header file for a class that uses a TArray, such as LView.h. You should see
a declaration similar to this:
#pragma warn_hidevirtual off
template class TArray;
#pragma warn_hidevirtual reset
The #pragma will temporarily turn off the compiler warning as you explicitly
instantiate the template class allowing your code to compile cleanly and quietly.
A feature that TArray has that LArray does not is operator[]. operator[] allows your
Array to function, in terms of notation, just like a regular C/C++ array. Don't forget
that PowerPlant Array's are one-based, so although the notation might make it look
like a C/C++ array, it's still a PowerPlant Array. The benefit of operator[] is a huge
speed gain over inspectors like FetchItemAt(), but in exchange for speed you are
returned a pointer to the data within the internal storage buffer (instead of a copy)
and there is no range checking performed on the given index. These trade-offs can be
viewed as positives and/or negatives for using operator[]; it just depends upon the
context for use. But regardless of how you view TArray::operator[], tread carefully to
avoid any problems.
LVariableArray is similar to LArray in almost every way except instead of storing
fixed-sized items, LVariableArray stores items of varying size (but all items must
still be of the same type). Probably one of the most popular uses for LVariableArray is