All Databases MacTech Vol 12-1996

Tightwads Flat File db

Volume Number: 12

Issue Number: 12

Column Tag: Tools Of The Trade

A Tightwad’s Guide to Flat File Databases

Working with relations in a flat file data base engine

by Edward Ringel, Waterville, Maine

Note: Source code files accompanying article are located on MacTech CD-ROM or

source code disks.

By Way of Introduction

One of the most naive, chutzpah-riddled tasks I ever set for myself was trying to

build an accounts receivable program for my medical practice in my spare time after

reading the Omnis 3 manual and tutorial. After spending a lot of time (and obviously

not an inconsiderable sum on this program) I gave up, got wise and bought Tess 2.0,

which we’ve used happily through to the present version of 4.4. It is interesting that

this fiasco propelled me 1) to program in a language, rather than a database, so that I

had complete control of the environment and 2) to think a bit about databases and

database design.

Programming to solve database problems using a general purpose computer

language inevitably turns one’s attention to database engines and source code libraries

to lighten the programming load. While the ads for many of these products promise all

kinds of relational muscle, a lot of serious programming tasks don’t need that kind of

horsepower. In the course of completing several programming tasks of interest to me,

I’ve developed some techniques for using “less powerful” flat file database engines that

may be of general interest. For some of you, this may be fairly elementary; for others

enlightening. Please bear with me.

Tools and Questions

When I try to solve any problem, I try to conceive the question as carefully as

possible and then look at the tools available that might be useful. For the purpose of

this discussion, let’s alter the normal thought process and start with the tools. The

high priced libraries are expensive for several good reasons.

These database engines are powerful. They can handle thousands, if not millions of

records. They can do wonderfully fast searches. They are optimized to read and write

keys and data as fast as the OS will permit. They do extensive error checking and error

correction. They permit the arbitrary changing of record size, key size, etc. They

permit the creation of relational tables and thus the creation of relational databases.

Some interface with object oriented paradigms.

Does Your Problem Really Need a Tool This Powerful?

Most of us write databases that handle hundreds or thousands of records, not

millions. Most of us write programs where very fast search, read, and write functions

are not critical because there is a user sitting there who can’t type that fast. I’m old

fashioned about error checking; I don’t feel as though I can ever turn down

responsibility for this aspect of programming. Good planning can obviate the need for

after-the-fact changes to database structure, and even better planning can allow for it.

Relational tables, while useful, are expensive in terms of efficiency and disk space,

and besides 1) a lot of really important information in our lives is flat in structure,

and 2) the data can be represented and interrelated with a simpler scheme than a

relational table. Object persistence is neat, and I don’t know how to write an object

persistence scheme; for all I know maybe you can’t come close to NeoAccess or its

cousins with what I’m going to propose. Maybe I’m way off base. However, nothing in

my thinking precludes a database of objects where constructor/initializer routines

reestablish object relationships as the objects are loaded.

Why a Flat File Database Library?

It is time inefficient, error prone, and hard to write a set of routines that

manages keys, records, and files. While it may be hard to justify the cost of some of the

high end products, it is equally hard to justify the enormous investment of time and

hassle in writing your own set of database routines. A flat file data base engine is an

excellent compromise between cost and features. This tool is reasonably efficient, and

is within most programmers’ budgets. It supplies you with the code that is the most

difficult to write. There will still be an investment of time and energy that will be

necessary, but it won’t be nearly as dear as starting from scratch. This kind of library

has three main features:

• A scheme for record allocation, deletion, and reuse of deleted record file space.

• A record indexing scheme. This is usually some sort of b-tree, but can be a hash

table or some other indexing system.

• Extensive error checking during record and key operations, and result codes that

indicate 1) whether the database engine did what you asked it to do and 2) if there

was an operating system I/O error.

These basic functions can be used very successfully to manage relatively complex

data sets. I am going to assume that anyone reading this article understands the basic

concepts of record and key operations.

I use a library called B-Tree Helper, from (m)agreeable software in Plymouth,

Minnesota (Magreeable@aol.com). The current version is 2.2. This product comes as C

source code. The function calls are straightforward. Mel Magree is delightfully old

fashioned in that he actually includes a hard copy manual for no extra charge! The

product can handle fixed and variable length records, many indices, and is very

Mac-like in that a single physical file on disk can contain multiple index and record

types; no separate index and record file is needed for each logical grouping of data

(unless you wish to do so.)

Some of the earlier incarnations of B-Tree Helper had some clumsy function

calls and too many pointers to too many temporary results. The current version is

very nice and well worth the upgrade if you are an owner of an older version. This

product is used for the demonstration application. The source code provided for the

demo will compile only if you supply the .c files to B-Tree Helper; the .h files for the

library are provided. B-Tree Helper is a robust, good product and has met my needs

nicely. There are other flat file engines available, but this article is not a product

review. The reader can easily find and compare different products.

Invoicing Revisited

To make my points, and to demonstrate some techniques, I would like to revisit

the model problem of invoicing. This is a standard database problem that some of you

may have encountered previously in examples or classes. The classical invoicing

structure, at its simplest, contains four different types of records, each contained

within its own file. Figure 1 shows the basic interrelationships of data.

Figure 1.

There is a customer, a part, an invoice, and a line item. The customer record

contains data about the customer, how much he/she owes, etc. The parts record

contains information about each item being sold. The invoice is an open-ended

structure that tells us about who the customer is, and the various items sold to the

customer at that interaction. Each line item consists of the part, the number of parts

sold, the price, etc.

When there is a sale, the program would create an invoice, n number of line

items, and relate the invoice and line items to pre-existing records that represented

customer and part information. Each line item structure contains a reference to the

part information for that line item. This can be done either through a relational table,

or it may be done simply by embedding a reference to the part information in the line

item. When a line item record is retrieved, the application is written so that the

information about the part reference is retrieved.

I solve this problem without a relational database. The customer does not need to

have a direct relation to a line item; the invoice does that for him/her. Similarly, the

customer needs no relation to the parts catalog. With some planning, and understanding

of the information problem, a series of special indices and embedded links can be

constructed that give you all the data interrelations that you will need. In any setting

where the interrelationships are fixed, and the data queries are predictable and have

been anticipated, a combination of careful programming and good data structures are a

powerful data management system.

Don’t Scare Your Customers - Think Ahead

This kind of programming requires a lot of forethought. Unlike an interface, it is

hard to create data structures on the fly. It is critical to have a clear vision of the

problem and how you are going to solve it, because you are hard-coding data

relationships. It will be difficult for you, and frightening for your user (yes,

frightening) when you twiddle an existing file to “add some stuff.” Why frightening?

Referenced by (3):