Getting To Know c-tree Plus
Volume Number: 14
Issue Number: 11
Column Tag: Tools Of The Trade
by Andy Dent BSC MACS AACM
FairCom Server - the invisible database engine
Let's say you are such a fan of MacTech Magazine that you can't wait for normal
delivery and have it sent Federal Express. You just used a FairCom Server (Federal
Express incorporate the FairCom Server into their routing boxes). If you phoned in
your order through an Alcatel exchange, you probably dealt with another FairCom
Server and if (horrors) you are sitting in front of an Intel-powered computer then
right now you are also benefiting from a FairCom Server (used to run Intel's
production line). Although the FairCom Server is a staple of many Fortune 500
companies, the non-programming public rarely hears of it (or at least not as much as
Oracle, Sybase or even MS Access). From the company's start, nearly 20 years ago,
FairCom has gone about their business quietly providing embedded file management
tools to C programmers.
In this article I want to introduce you to c-tree Plus, the C language programming tool
for working with FairCom Server. I'll talk a little about the strengths and limitations
of that tool, so you can decide if it's right for your next project. After that, I want to
describe my own experiences in developing a C++ wrapper layer for c-tree Plus
called OOFILE and explain how OOFILE can work with c-tree to offer a more complete
programming solution.
The Basics of c-tree Plus
When you buy a c-tree Plus developer kit, you receive a cross-platform source code
product (Macintosh, Unix, OS/2 and that other operating system). The package allows
single and multi-user (file-sharing) deployment - all royalty-free. You also receive
a developer license to run the FairCom Server; but if you want to deploy the server
(rather than use file-sharing) you need to purchase servers for each site you deploy
to. FairCom's servers are reasonably priced by comparison with most and run on a
wide range of platforms from DOS (honestly, even DOS) to the most powerful Unix
workstations, and of course - the Macintosh. FairCom's web site lists more platforms and
includes a comment that c-tree Plus runs on over one hundred operating
system/hardware combinations.
Models of Data Management
In the last couple of years the range of c-tree Plus deployment models has expanded to
include linkable servers for combining your application code with the FairCom
Server, multi-threaded variants and the LOCLIB model which allows simultaneous use
of servers and local single-user files. The server also now has two Java interfaces,
one of which is RMI.
It is unusual for a client-server product to include a shared-file multi-user model.
The shared-file mode of deployment is certainly attractive for producing shrink-wrap
applications (both because it allows royalty-free distribution and also because it
doesn't require your customers to run a server) and you should consider it. However,
there are significant trade-offs that you need to be aware of. First, on a security basis,
if your files are visible on a file server (to allow sharing) then they are accessible
through other means. The recent version 6.8 of c-tree Plus adds encryption - but
there is still the danger that someone could access the files using the ODBC driver.
However, if you use the server, password protection on the server controls ODBC
connections.
There is also the general performance issue of client-server vs. shared file models.
The client-server model is more efficient. With complex data manipulation, using
shared files generates many network operations. Even index searches still involve
several data retrievals depending on the size and depth of the index. Batch functions in
c-tree Plus mean operations such as deletion or searching for and retrieving many
record pointers are handled by a single server call. A server will also cache records
centrally whereas the calls to the library for the shared file must immediately write
all data back to the disk, and cannot cache values because of possible changes by other
users.
Finally, the single-user and client-server models of operation provide transaction
logging and data recovery. With shared files there is no central point controlling
access to the data and so transactions are not supported. This makes it possible for a
series of operations to be halted partway, causing database inconsistency if not
outright corruption.
Understand, these issues are not c-tree specific but a general flaw with shared files,
and well-known amongst users of other databases such as Microsoft's Jet Engine (that
comes with Access and Visual BASIC). These are issues you'll need to consider when it's
time for you to choose which data management model is right for your next project.
The Pleasure of being Well-Connected
FairCom supports the Macintosh True Believer. The Macintosh servers allow you to
communicate using both TCP/IP and AppleTalk protocols simultaneously. The same
applies to the client libraries, so a Macintosh application could be interacting with
both a local server on your AppleTalk network and out onto the Internet to a server
running on Unix or Windows NT. Unlike some other servers, an application can open
connections to many FairCom servers simultaneously. Each connection has its own
security and file contexts.
Other server models support appropriate ranges of communication protocols, such as
NetBIOS on Windows networks. One of the most interesting is the shared memory
server for Unix. This provides ultra-high speed communications when the only clients
are processes running on the same machine as the server.
Which End is Up?
This is a critical question for an Australian writing to a predominantly US audience!
More seriously, if you've ever tried to store data in files moved between platforms,
you will have run into the problem that Macintosh processors are Big-Endian like
SPARC RISC chips whilst the Intel world is Little-Endian. FairCom refer to these as
HIGH_LOW and LOW_HIGH formats respectively.
Ignoring this problem is commonly known as playing Cowboys and Endians with your
integers and is usually fatal. Some (all?) PowerPC models can have their endian
orientation set but in the Macintosh world of course are used in the same orientation as
the 68K family.
The c-tree Plus libraries have two ways to solve the endian problem, depending on
which model of database you are using. If you're using the FairCom Server, the data is
stored in the native format of the server machine. If you're using the single-user or
shared-file libraries you must build your libraries with the UNIFRMAT flag which
stores data in the Intel format. In either case, the data delivered to your application
code matches the platform on which you are running.
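The idea behind a uniform on-disk format can be sketched in a few lines of portable C. This is not FairCom's code: put_le32 and get_le32 are hypothetical helpers of my own, assuming a 32-bit integer stored in LOW_HIGH (Intel) order whatever the host processor happens to be.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of c-tree Plus.
   Store a 32-bit value in LOW_HIGH (little-endian) order,
   independent of the byte order of the machine running the code. */
void put_le32(unsigned char *buf, uint32_t value)
{
    buf[0] = (unsigned char)(value & 0xFFu);          /* lowest byte first */
    buf[1] = (unsigned char)((value >> 8) & 0xFFu);
    buf[2] = (unsigned char)((value >> 16) & 0xFFu);
    buf[3] = (unsigned char)((value >> 24) & 0xFFu);  /* highest byte last */
}

/* Rebuild the native integer from its LOW_HIGH stored form. */
uint32_t get_le32(const unsigned char *buf)
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

Because the code works with shifts rather than casting pointers, the same source gives the same file bytes on a Mac and on a PC, which is the property a uniform format needs.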
Mixing models and platforms has one problem with file compatibility. If a database is
used with the FairCom Server on a Mac or SPARC server it will be in the native
format, so can't be copied to a PC. A workaround is to use the LOCLIB model to have a
PC application copy data from the server and write it out to a single-user local
database.
In contrast to server file format issues, using the UNIFRMAT model allows you to copy
documents around regardless of platform. This suits the document model of many
desktop applications.
Although c-tree Plus is described primarily as a record-oriented engine and
indifferent to the contents of your records, to manage endian conversion you need to
supply additional information to the database engine. This brings us to the Data Object
Definition Array.
Doobie, doobie DODA
There are a number of reasons why you might want to supply a record schema to
c-tree Plus, so it knows the fields within your record and not just their total length.
1. As described above, you need the library to know where binary fields such
as integers are stored, so it can swap bytes depending on the platform.
2. FairCom offers a character-oriented report-writer product, r-tree,
which allows you to specify field names in calculation scripts and design a
report by field name.
3. The ODBC driver for general access to the database requires a schema.
(Sadly, there is only a Windows version of the driver available.)
4. The functions for specifying indexes when creating the database allow you
to specify fields using the DODA - a less error-prone method than specifying
in terms of record offsets.
The DODA provides a list of field definitions that include the data type: a predefined set
of typical integer, string and floating point types. It is saved in a c-tree Resource (not
to be confused with the Macintosh variety) and so provides an embedded schema in the
data file that can be read by the products mentioned above, or your own code.
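To show why a field list matters, here is a minimal sketch of a schema table driving byte swapping. The FieldDef structure, the Student record and both functions are stand-ins I have invented for illustration; the real DODA layout is defined in FairCom's headers and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical schema entry: NOT the real c-tree DODA structure. */
typedef enum { FIELD_INT32, FIELD_FLOAT64, FIELD_TEXT } FieldType;

typedef struct {
    const char *name;    /* field name, usable by reports or index setup */
    size_t      offset;  /* where the field starts within the record */
    size_t      length;  /* size of the field in bytes */
    FieldType   type;    /* tells the engine whether bytes need swapping */
} FieldDef;

/* Example record, invented for this sketch. */
typedef struct {
    int32_t id;
    char    name[32];
    double  gpa;
} Student;

const FieldDef student_schema[] = {
    { "id",   offsetof(Student, id),   sizeof(int32_t), FIELD_INT32   },
    { "name", offsetof(Student, name), 32,              FIELD_TEXT    },
    { "gpa",  offsetof(Student, gpa),  sizeof(double),  FIELD_FLOAT64 },
};
#define STUDENT_FIELD_COUNT (sizeof student_schema / sizeof student_schema[0])

/* Look a field up by name; returns NULL if the schema has no such field. */
const FieldDef *find_field(const FieldDef *schema, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(schema[i].name, name) == 0)
            return &schema[i];
    return NULL;
}

/* Reverse the bytes of every binary field in place; text is left alone. */
void swap_binary_fields(unsigned char *record, const FieldDef *schema, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (schema[i].type == FIELD_TEXT)
            continue;
        unsigned char *lo = record + schema[i].offset;
        unsigned char *hi = lo + schema[i].length - 1;
        while (lo < hi) {
            unsigned char tmp = *lo;
            *lo++ = *hi;
            *hi-- = tmp;
        }
    }
}
```

Without the table, the swapping routine would see only an anonymous run of bytes and could not tell an integer from four characters of a name.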
What is Fixed and what is Not
Variable length text is a pain in databases. One of the file modes you can choose is to
define some or all of your record as being variable length. This saves on storage. You
can still have indexes defined on variable length text fields, so if you need to search on
such fields but expect their size to vary widely then variable length records are a good
way to go. The DODA allows you to define fields with either 2 or 4 byte leading lengths
as well as strings with delimiters.
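A leading-length field is easy to picture in code. The sketch below packs text with a 2 byte length in front of it; the function names and the choice of low-byte-first order for the length are my assumptions, not c-tree's documented layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write text into buf as a 2 byte length
   (low byte first, an assumption for this sketch) followed by the
   characters, with no terminator. Returns the bytes written, or 0
   if the text is too long for a 2 byte length or does not fit. */
size_t put_text16(unsigned char *buf, size_t capacity, const char *text)
{
    size_t len = strlen(text);
    if (len > 0xFFFFu || len + 2 > capacity)
        return 0;
    buf[0] = (unsigned char)(len & 0xFFu);
    buf[1] = (unsigned char)(len >> 8);
    memcpy(buf + 2, text, len);
    return len + 2;
}

/* Read the length back and point *text at the first character.
   The characters are not NUL-terminated. */
size_t get_text16(const unsigned char *buf, const unsigned char **text)
{
    *text = buf + 2;
    return (size_t)buf[0] | ((size_t)buf[1] << 8);
}
```

Two short names packed this way take 12 bytes rather than two fixed 32 byte slots, which is where the storage saving comes from.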
We have not used indexed variable length fields in OOFILE for our local consulting but
FairCom's implementation has a good reputation judging by the positive comments on
CompuServe.
A little, or a whole lotta Locking
One of the biggest problems with cross-platform database engines is locking.
Relational databases typically use page-locking which locks chunks of the file and can
inadvertently lock many records which should be available to other users. The
record-locking in c-tree Plus avoids this problem, and also provides for read locking
which allows many readers of a record, whilst banning any writes to change the
record. DOS and Windows don't support read locks natively in the operating system but
FairCom have a workaround. Their approach can be used to share a file on an NT server
between Mac and Windows users and have mutual locking.
The simplest use of locking in c-tree Plus is to leave locking solely to the library. You
can get away with this if write collisions are not expected - automatic locking on the
index files guarantees there won't be any corruption just because two people write
different records at the same time. More stringent locking is easily enabled with the
LockISAM call and you can choose between having locks acquired automatically for each
record accessed, or explicitly locking as you go.
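The read-lock rule (many readers, no writers while anyone is reading) can be modelled with a tiny in-memory structure. This is only a model of the semantics: RecordLock and its functions are hypothetical, nothing here calls the c-tree library, and I am assuming that a write lock also keeps readers out.

```c
#include <stdbool.h>

/* Hypothetical per-record lock state, for illustration only. */
typedef struct {
    int  readers;  /* number of read locks currently held */
    bool writer;   /* true while a write lock is held */
} RecordLock;

/* Any number of readers may share a record unless it is being written. */
bool try_read_lock(RecordLock *lock)
{
    if (lock->writer)
        return false;
    lock->readers++;
    return true;
}

/* A writer needs the record to itself: no readers and no other writer. */
bool try_write_lock(RecordLock *lock)
{
    if (lock->writer || lock->readers > 0)
        return false;
    lock->writer = true;
    return true;
}

void release_read_lock(RecordLock *lock)
{
    if (lock->readers > 0)
        lock->readers--;
}

void release_write_lock(RecordLock *lock)
{
    lock->writer = false;
}
```

Because the state is kept per record, a refused write on one record says nothing about its neighbours, which is the contrast with page locking drawn above.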
Togetherness is not always a wonderful thing
If you programmed with 4th Dimension version 1, dBase or FoxPro, you will be used
to a database consisting of multiple files. This is inconvenient for applications,
although usually much more efficient. One significant benefit is that using a single file
for each database table allows optimal use of fixed-length records. This is one reason
why FoxPro and other xBase engines are so fast and why big mainframe databases used
to have fixed-length fields in ISAM.
Structuring your database as some or all individual files, or grouping data into
superfiles is very simple in c-tree Plus. The default behaviour is a single physical .dat
and .idx file for each data file. The .idx file will therefore contain as many index trees as
there are indexes declared on the c-tree data file. However, you can choose to have
some index trees allocated to a different physical index file, if you wanted to separate a
particularly dynamic index from the others. A large and active index could even be
stored on its own disk, if you wanted to take advantage of disk-level caching in your
operating system.
A simple directory-like prefix is used to group multiple files into a superfile. Just
prefix the data file name with the superfile name, separated with a | character. For
example, if I wanted to open a superfile called 'school.db' which contained 'teachers'
and 'students' I would:
1. open the superfile 'school.db'
2. open the data file 'school.db|teachers'
3. open the data file 'school.db|students'
An implication of the simplicity of this scheme is that you can have multiple
superfiles open at once, which neatly satisfies the typical desktop application's
requirement to open multiple documents. Even though the data files may have the same
names, the prefixing keeps them unique. In desktop terms, think of using different
folders to separate out identically named files.
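Building the prefixed names is a one-liner, but a small helper keeps the | convention in one place. The function below is a hypothetical convenience of my own, not a c-tree call; it only assembles the name you would then pass to whichever open routine you use.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "superfile|member" into out.
   Returns 0 on success, or -1 if the result would not fit. */
int superfile_member_name(char *out, size_t out_size,
                          const char *superfile, const char *member)
{
    /* room for both names, the '|' separator and the trailing NUL */
    size_t needed = strlen(superfile) + 1 + strlen(member) + 1;
    if (needed > out_size)
        return -1;
    snprintf(out, out_size, "%s|%s", superfile, member);
    return 0;
}
```

Called with "school.db" and "teachers" it yields "school.db|teachers"; a second document such as "college.db" gives a distinct name for its own "teachers" file.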
Just because you create, say, twenty data files in a superfile, you are not forced to open
them every time you use that superfile. The directory idiom extends to allowing
you to access a single data file, regardless of the original number or order of creation.
The most significant downside to using superfiles is that they interweave all your data.
This means that database rebuilds are complicated, all records are inherently variable
length and so some performance advantages are lost. There is also an 18 byte header
for each record, which could be significant in very large databases with small records
(our largest user stores about 9 million records a month with separate files under
Unix - as our ISP it's in my interest to keep them happy).
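To put a number on that header, here is a back-of-envelope sketch. The 18 byte figure comes from the text above; the function names and the example record sizes are mine, and the calculation ignores any other per-file overhead.

```c
#include <stdint.h>

#define SUPERFILE_RECORD_HEADER 18u  /* bytes per record, as quoted above */

/* Total bytes spent on headers for a given number of records. */
uint64_t header_overhead_bytes(uint64_t record_count)
{
    return record_count * SUPERFILE_RECORD_HEADER;
}

/* Share of stored bytes (header plus payload) taken up by the header. */
double header_overhead_percent(uint32_t payload_bytes)
{
    return 100.0 * SUPERFILE_RECORD_HEADER
         / (SUPERFILE_RECORD_HEADER + payload_bytes);
}
```

If 9 million records a month were kept in a superfile, the headers alone would come to 162,000,000 bytes a month, and with a 32 byte payload they would account for 36% of what is stored.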
The flexibility of the c-tree Plus model means however that you can keep temporary
files out of the superfile that is your main database. Many is the time I wanted to do
that with 4D, cringing at the thought of several hundred thousand temporary records
being created and deleted and fragmenting my single database.
Other Goodies
There are many features of c-tree Plus we don't have room to cover in detail. In the
spirit of completeness, some of the more powerful are:
• key compression for trailing bytes or leading common strings
• indexing only parts of fields, and easy building of compound keys
• indexes with custom collation (FairCom have a large European user base)
or reverse sort order
• mirroring data to another file
• skipping null values in the index
• conditional indexes, with a formula defined to filter membership of the
index
• transaction histories, making use of the transaction logfiles for auditing
all operations on a record and tracing an item through the files
• a portable threading model that can be used to provide threading across all
the environments supported by c-tree Plus.
That completes our brief tour of c-tree, and should give an idea of some of what the
FairCom solution provides. Next we'll take a look at OOFILE and see how c-tree and
OOFILE can work in concert to offer a unique and powerful solution for database
programmers.
It's easy to feel old when you mention ISAM and the twenty-somethings in the audience
give you only a blank look. If I say "like dBase" that means more to many people. In a
world of relational databases the ISAM model is not taught or mentioned much in
textbooks any more.
ISAM stands for Indexed Sequential Access Method. There are many products around
that incorporate the word ISAM including Informix's C-ISAM
http://www.informix.com/informix/techbriefs/cisam/cisam.htm and there is a
formal standard defining ISAM in the Open environment