All Databases MacTech Vol 14-1998

Getting To Know c-tree Plus

Volume Number: 14

Issue Number: 11

Column Tag: Tools Of The Trade

by Andy Dent BSC MACS AACM

FairCom Server - the invisible database engine

Let's say you are such a fan of MacTech Magazine that you can't wait for normal

delivery and have it sent Federal Express. You just used a FairCom Server (Federal

Express incorporate the FairCom Server into their routing boxes). If you phoned in

your order through an Alcatel exchange, you probably dealt with another FairCom

Server and if (horrors) you are sitting in front of an Intel-powered computer then

right now you are also benefiting from a FairCom Server (used to run Intel's

production line). Although the FairCom Server is a staple of many Fortune 500

companies, the non-programming public rarely hears of it (or at least not as much as

Oracle, Sybase or even MS Access). From the company's start, nearly 20 years ago,

FairCom has gone about their business quietly providing embedded file management

tools to C programmers.

In this article I want to introduce you to c-tree Plus, the C language programming tool

for working with FairCom Server. I'll talk a little about the strengths and limitations

of that tool, so you can decide if it's right for your next project. After that, I want to

describe my own experiences in developing a C++ wrapper layer for c-tree Plus

called OOFILE and explain how OOFILE can work with c-tree to offer a more complete

programming solution.

The Basics of c-tree Plus

When you buy a c-tree Plus developer kit, you receive a cross-platform source code

product (Macintosh, Unix, OS/2 and that other operating system). The package allows

single and multi-user (file-sharing) deployment - all royalty-free. You also receive

a developer license to run the FairCom Server; but if you want to deploy the server

(rather than use file-sharing) you need to purchase servers for each site you deploy

to. FairCom's servers are reasonably priced by comparison with most and run on a

wide range of platforms from DOS (honestly, even DOS) to the most powerful Unix

workstations, and of course - the Macintosh. FairCom's web site lists more platforms and

includes a comment that c-tree Plus runs on over one hundred operating

system/hardware combinations.

Models of Data Management

In the last couple of years the range of c-tree Plus deployment models has expanded to

include linkable servers for combining your application code with the FairCom

Server, multi-threaded variants and the LOCLIB model which allows simultaneous use

of servers and local single-user files. The server also now has two Java interfaces,

one of which is RMI.

It is unusual for a client-server product to include a shared-file multi-user model.

The shared-file mode of deployment is certainly attractive for producing shrink-wrap

applications (both because it allows royalty-free distribution and also because it

doesn't require your customers to run a server) and you should consider it. However,

there are significant trade-offs that you need to be aware of. First, on a security basis,

if your files are visible on a file server (to allow sharing) then they are accessible

through other means. The recent version 6.8 of c-tree Plus adds encryption - but

there is still the danger that someone could access the files using the ODBC driver.

However, if you use the server, password protection on the server controls ODBC

connections.

There is also the general performance issue of client-server vs. shared file models.

The client-server model is more efficient. With complex data manipulation, using

shared files generates many network operations. Even index searches still involve

several data retrievals depending on the size and depth of the index. Batch functions in

c-tree Plus mean operations such as deletion or searching for and retrieving many

record pointers are handled by a single server call. A server will also cache records

centrally whereas the calls to the library for the shared file must immediately write

all data back to the disk, and cannot cache values because of possible changes by other

users.

Finally, the single-user and client-server models of operation provide transaction

logging and data recovery. With shared files there is no central point controlling

access to the data and so transactions are not supported. This makes it possible for a

series of operations to be halted partway, causing database inconsistency if not

outright corruption.

Understand, these issues are not c-tree specific but a general flaw with shared files,

and well-known amongst users of other databases such as Microsoft's Jet Engine (that

comes with Access and Visual BASIC). These are issues you'll need to consider when it's

time for you to choose which data management model is right for your next project.

The Pleasure of being Well-Connected

FairCom supports the Macintosh True Believer. The Macintosh servers allow you to

communicate using both TCP/IP and AppleTalk protocols simultaneously. The same

applies to the client libraries, so a Macintosh application could be interacting with

both a local server on your AppleTalk network and out onto the Internet to a server

running on Unix or Windows NT. Unlike some other servers, an application can open

connections to many FairCom servers simultaneously. Each connection has its own

security and file contexts.

Other server models support appropriate ranges of communication protocols, such as

NetBIOS on Windows networks. One of the most interesting is the shared memory

server for Unix. This provides ultra-high speed communications when the only clients

are processes running on the same machine as the server.

Which End is Up?

This is a critical question for an Australian writing to a predominantly US audience!

More seriously, if you've ever tried to store data in files moved between platforms,

you will have run into the problem that Macintosh processors are Big-Endian like

SPARC RISC chips whilst the Intel world is Little-Endian. FairCom refer to these as

HIGH_LOW and LOW_HIGH formats respectively.

Ignoring this problem is commonly known as playing Cowboys and Endians with your

integers and is usually fatal. Some (all?) PowerPC models can have their endian

orientation set but in the Macintosh world of course are used in the same orientation as

the 68K family.

The c-tree Plus libraries have two ways to solve the endian problem, depending on

which model of database you are using. If you're using the FairCom Server, the data is

stored in the native format of the server machine. If you're using the single-user or

shared-file libraries you must build your libraries with the UNIFRMAT flag which

stores data in the Intel format. In either case, the data delivered to your application

code matches the platform on which you are running.
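
To make the byte-order problem concrete, here is a minimal sketch in plain C of storing an integer in the Intel (LOW_HIGH) layout and reading it back on any host. The function names are my own for illustration and are not part of the c-tree Plus API; the library does the equivalent work for you internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a 32-bit value low byte first (LOW_HIGH). */
void store_low_high32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Hypothetical helper: read a LOW_HIGH value back into host order.
   It assembles the value arithmetically, so the same source works
   unchanged on big-endian and little-endian processors. */
uint32_t load_low_high32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Because the stored bytes have one fixed order, a file written this way on a Macintosh can be read on a PC and vice versa, which is the effect the UNIFRMAT build is described as giving.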

Mixing models and platforms has one problem with file compatibility. If a database is

used with the FairCom Server on a Mac or SPARC server it will be in the native

format, so can't be copied to a PC. A workaround is to use the LOCLIB model to have a

PC application copy data from the server and write it out to a single-user local

database.

In contrast to server file format issues, using the UNIFRMAT model allows you to copy

documents around regardless of platform. This suits the document model of many

desktop applications.

Although c-tree Plus is described primarily as a record-oriented engine and

indifferent to the contents of your records, to manage endian conversion you need to

supply additional information to the database engine. This brings us to the Data Object

Definition Array.

Doobie, doobie DODA

There are a number of reasons why you might want to supply a record schema to

c-tree Plus, so it knows the fields within your record and not just their total length.

1. As described above, you need the library to know where binary fields such

as integers are stored, so it can swap bytes depending on the platform.

2. FairCom offers a character-oriented report-writer product, r-tree,

which allows you to specify field names in calculation scripts and design a

report by field name.

3. The ODBC driver for general access to the database requires a schema.

(Sadly, there is only a Windows version of the driver available.)

4. The functions for specifying indexes when creating the database allow you

to specify fields using the DODA - a less error-prone method than specifying

in terms of record offsets.

The DODA provides a list of field definitions that include the data type: a predefined set

of typical integer, string and floating point types. It is saved in a c-tree Resource (not

to be confused with the Macintosh variety) and so provides an embedded schema in the

data file that can be read by the products mentioned above, or your own code.
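
The idea of a field-by-field schema can be sketched in a few lines of C. Everything named here (field_def, the FT_ constants, student_schema, swap_record) is hypothetical and is not FairCom's actual DODA layout; the sketch only shows why knowing where the binary fields sit is enough to byte-swap a record while leaving text alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field types and descriptor, for illustration only. */
enum field_type { FT_INT16, FT_INT32, FT_STRING };

struct field_def {
    const char     *name;    /* symbolic field name */
    size_t          offset;  /* position within the record */
    enum field_type type;
    size_t          length;  /* size in bytes */
};

struct student {
    int32_t id;
    int16_t year;
    char    name[26];
};

static const struct field_def student_schema[] = {
    { "id",   offsetof(struct student, id),   FT_INT32,  sizeof(int32_t) },
    { "year", offsetof(struct student, year), FT_INT16,  sizeof(int16_t) },
    { "name", offsetof(struct student, name), FT_STRING, 26 },
};

static void reverse_bytes(unsigned char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Reverse the byte order of every integer field; strings are untouched. */
void swap_record(void *record, const struct field_def *schema, size_t nfields)
{
    unsigned char *base = record;
    size_t i;
    assert(record != NULL && schema != NULL);
    for (i = 0; i < nfields; i++) {
        if (schema[i].type != FT_STRING)
            reverse_bytes(base + schema[i].offset, schema[i].length);
    }
}
```

Calling swap_record twice returns the record to its original state, and the same table of names and offsets is the kind of information a report writer or ODBC driver would need to address fields by name.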

What is Fixed and what is Not

Variable length text is a pain in databases. One of the file modes you can choose is to

define some or all of your record as being variable length. This saves on storage. You

can still have indexes defined on variable length text fields, so if you need to search on

such fields but expect their size to vary widely then variable length records are a good

way to go. The DODA allows you to define fields with either 2 or 4 byte leading lengths

as well as strings with delimiters.
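
As an illustration of a field with a 2-byte leading length, here is a small C sketch that packs and unpacks such a string. The function names and the low-byte-first order of the length are assumptions made for the example, not a description of how c-tree Plus lays the field out on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: write text preceded by a 2-byte length (low byte first).
   Returns the number of bytes written, or 0 if it does not fit. */
size_t put_varstring2(unsigned char *dst, size_t cap, const char *text)
{
    size_t len;
    assert(dst != NULL && text != NULL);
    len = strlen(text);
    if (len > 0xFFFF || len + 2 > cap)
        return 0;
    dst[0] = (unsigned char)(len & 0xFF);
    dst[1] = (unsigned char)(len >> 8);
    memcpy(dst + 2, text, len);
    return len + 2;
}

/* Hypothetical: read the field back as a C string.
   Returns the number of bytes consumed, or 0 on failure. */
size_t get_varstring2(const unsigned char *src, size_t avail,
                      char *out, size_t outcap)
{
    size_t len;
    assert(src != NULL && out != NULL);
    if (avail < 2)
        return 0;
    len = (size_t)src[0] | ((size_t)src[1] << 8);
    if (len + 2 > avail || len + 1 > outcap)
        return 0;
    memcpy(out, src + 2, len);
    out[len] = '\0';
    return len + 2;
}
```

A four-character name occupies six bytes this way rather than the full width of a fixed-length field, which is where the storage saving comes from.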

We have not used indexed variable length fields in OOFILE for our local consulting but

FairCom's implementation has a good reputation judging by the positive comments on

CompuServe.

A little, or a whole lotta Locking

One of the biggest problems with cross-platform database engines is locking.

Relational databases typically use page-locking which locks chunks of the file and can

inadvertently lock many records which should be available to other users. The

record-locking in c-tree Plus avoids this problem, and also provides for read locking

which allows many readers of a record, whilst banning any writes to change the

record. DOS and Windows don't support read locks natively in the operating system but

FairCom have a workaround. Their approach can be used to share a file on an NT server

between Mac and Windows users and have mutual locking.
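
The rule behind read and write locks is simple enough to model in a few lines of C. This is a toy, single-process lock table with names of my own choosing; it says nothing about how FairCom implement locking, and only demonstrates the rule that many readers may share a record while a writer needs it exclusively.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDS 16

/* Hypothetical per-record lock state. */
struct record_lock {
    int readers;  /* number of read locks currently held */
    int writer;   /* 1 if a write lock is held */
};

struct lock_table {
    struct record_lock rec[MAX_RECORDS];
};

/* Grant a read lock unless a writer holds the record. Returns 1 on success. */
int try_read_lock(struct lock_table *t, size_t recno)
{
    assert(t != NULL);
    if (recno >= MAX_RECORDS || t->rec[recno].writer)
        return 0;
    t->rec[recno].readers++;
    return 1;
}

/* Grant a write lock only if nobody else holds the record. */
int try_write_lock(struct lock_table *t, size_t recno)
{
    assert(t != NULL);
    if (recno >= MAX_RECORDS)
        return 0;
    if (t->rec[recno].writer || t->rec[recno].readers > 0)
        return 0;
    t->rec[recno].writer = 1;
    return 1;
}

void release_read_lock(struct lock_table *t, size_t recno)
{
    assert(t != NULL && recno < MAX_RECORDS && t->rec[recno].readers > 0);
    t->rec[recno].readers--;
}

void release_write_lock(struct lock_table *t, size_t recno)
{
    assert(t != NULL && recno < MAX_RECORDS && t->rec[recno].writer);
    t->rec[recno].writer = 0;
}
```

Note that a lock on record 3 has no effect on record 4, which is the advantage of record locking over page locking described above.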

The simplest use of locking in c-tree Plus is to leave locking solely to the library. You

can get away with this if write collisions are not expected - automatic locking on the

index files guarantees there won't be any corruption just because two people write

different records at the same time. More stringent locking is easily enabled with the

LockISAM call and you can choose between having locks acquired automatically for each

record accessed, or explicitly locking as you go.

Togetherness is not always a wonderful thing

If you programmed with 4th Dimension version 1, dBase or FoxPro, you will be used

to a database consisting of multiple files. This is inconvenient for applications,

although usually much more efficient. One significant benefit is that using a single file

for each database table allows optimal use of fixed-length records. This is one reason

why FoxPro and other xBase engines are so fast and why big mainframe databases used

to have fixed-length fields in ISAM.

Structuring your database as some or all individual files, or grouping data into

superfiles is very simple in c-tree Plus. The default behaviour is a single .dat and .idx

physical file for each data file. The .idx file will therefore contain as many index trees as

there are indexes declared on the c-tree data file. However, you can choose to have

some index trees allocated to a different physical index file, if you wanted to separate a

particularly dynamic index from the others. A large and active index could even be

stored on its own disk, if you wanted to take advantage of disk-level caching in your

operating system.

A simple directory-like prefix is used to group multiple files into a superfile. Just

prefix the data file name with the superfile name, separated with a | character. For

example, if I wanted to open a superfile called 'school.db' which contained 'teachers'

and 'students' I would:

1. open the superfile 'school.db'

2. open the data file 'school.db|teachers'

3. open the data file 'school.db|students'
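
The naming convention in the steps above can be sketched as a small C helper that builds the prefixed name. The helper name is hypothetical, and I have deliberately left out the c-tree Plus open calls themselves; only the 'superfile|member' naming is taken from the description here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "superfile|member" into out.
   Returns 0 on success, -1 if the buffer is too small. */
int superfile_member_name(char *out, size_t cap,
                          const char *superfile, const char *member)
{
    int n;
    assert(out != NULL && superfile != NULL && member != NULL);
    n = snprintf(out, cap, "%s|%s", superfile, member);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

With this, the names for steps 2 and 3 come from superfile_member_name(name, sizeof name, "school.db", "teachers") and the same call with "students".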

An implication of the simplicity of this scheme is that you can have multiple

superfiles open at once, which neatly satisfies the typical desktop application's

requirement to open multiple documents. Even though the data files may have the same

names, the prefixing keeps them unique. In desktop terms, think of using different

folders to separate out identically named files.

Just because you create say twenty data files in a superfile, you are not forced to open

them every time you use that superfile. The directory idiom extends to letting you

access just a single data file, regardless of the original number or order of creation.

The most significant downside to using superfiles is that they interweave all your data.

This means that database rebuilds are complicated, all records are inherently variable

length and so some performance advantages are lost. There is also an 18 byte header

for each record, which could be significant in very large databases with small records

(our largest user stores about 9 million records a month with separate files under

Unix - as our ISP it's in my interest to keep them happy).

The flexibility of the c-tree Plus model means however that you can keep temporary

files out of the superfile that is your main database. Many is the time I wanted to do

that with 4D, cringing at the thought of several hundred thousand temporary records

being created and deleted and fragmenting my single database.

Other Goodies

There are many features of c-tree Plus we don't have room to cover in detail. In the

spirit of completeness, some of the more powerful are:

• key compression for trailing bytes or leading common strings

• indexing only parts of fields, and easy building of compound keys

• indexes with custom collation (FairCom have a large European user base)

or reverse sort order

• mirroring data to another file

• skipping null values in the index

• conditional indexes, with a formula defined to filter membership of the

index

• transaction histories, making use of the transaction logfiles for auditing

all operations on a record and tracing an item through the files

• a portable threading model that can be used to provide threading across all

the environments supported by c-tree Plus.
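
To give a feel for the first item in that list, here is a rough C sketch of the two measurements key compression relies on: how many leading bytes a key shares with the key before it, and how long a key is once trailing padding is dropped. The function names are mine and this is only the general idea, not FairCom's index format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: count the leading bytes key shares with the previous key.
   An index need only store this count plus the remaining suffix. */
size_t common_prefix_length(const char *prev, const char *key)
{
    size_t n = 0;
    assert(prev != NULL && key != NULL);
    while (prev[n] != '\0' && prev[n] == key[n])
        n++;
    return n;
}

/* Hypothetical: length of a fixed-width key once trailing pad bytes
   are dropped. */
size_t trimmed_length(const char *key, size_t len, char pad)
{
    assert(key != NULL);
    while (len > 0 && key[len - 1] == pad)
        len--;
    return len;
}
```

In a sorted index neighbouring keys such as "Anderson" and "Andrews" share their first three bytes, so storing only the differing tail can shrink the index noticeably.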

That completes our brief tour of c-tree, and should give an idea of some of what the

FairCom solution provides. Next we'll take a look at OOFILE and see how c-tree and

OOFILE can work in concert to offer a unique and powerful solution for database

programmers.

It's easy to feel old when you mention ISAM and the twenty-somethings in the audience

give you only a blank look. If I say "like dBase" that means more to many people. In a

world of relational databases the ISAM model is not taught or mentioned much in

textbooks any more.

ISAM stands for Indexed Sequential Access Method. There are many products around

that incorporate the word ISAM including Informix's C-ISAM

http://www.informix.com/informix/techbriefs/cisam/cisam.htm and there is a

formal standard defining ISAM in the Open environment
