Tightwads Flat File db
Volume Number: 12
Issue Number: 12
Column Tag: Tools Of The Trade
A Tightwad’s Guide to Flat File Databases 
Working with relations in a flat file data base engine
by Edward Ringel, Waterville, Maine
Note: Source code files accompanying article are located on MacTech CD-ROM orsource code disks.
By Way of Introduction
One of the most naive, chutzpah-riddled tasks I ever set for myself was trying to
build an accounts receivable program for my medical practice in my spare time after
reading the Omnis 3 manual and tutorial. After spending a lot of time (and obviously
not an inconsiderable sum on this program) I gave up, got wise and bought Tess 2.0,
which we’ve used happily through to the present version of 4.4. It is interesting that
this fiasco propelled me 1) to program in a language, rather than a database, so that I
had complete control of the environment and 2) to think a bit about databases and
database design.
Programming to solve database problems using a general purpose computer
language inevitably turns one’s attention to database engines and source code libraries
to lighten the programming load. While the ads for many of these products promise all
kinds of relational muscle, a lot of serious programming tasks don’t need that kind of
horsepower. In the course of completing several programming tasks of interest to me,
I’ve developed some techniques for using “less powerful” flat file database engines that
may be of general interest. For some of you, this may be fairly elementary; for others
enlightening. Please bear with me.
Tools and Questions
When I try to solve any problem, I try to conceive the question as carefully as
possible and then look at the tools available that might be useful. For the purpose of
this discussion, let’s alter the normal thought process and start with the tools. The
high priced libraries are expensive for several good reasons.
These database engines are powerful. They can handle thousands, if not millions of
records. They can do wonderfully fast searches. They are optimized to read and write
keys and data as fast as the OS will permit. They do extensive error checking and error
correction. They permit the arbitrary changing of record size, key size, etc. They
permit the creation of relational tables and thus the creation of relational databases.
Some interface with object oriented paradigms.
Does Your Problem Really Need a Tool This Powerful?
Most of us write databases that handle hundreds or thousands of records, not
millions. Most of us write programs where very fast search, read, and write functions
are not critical because there is a user sitting there who can’t type that fast. I’m old
fashioned about error checking; I don’t feel as though I can ever turn down
responsibility for this aspect of programming. Good planning can obviate the need for
after-the-fact changes to database structure, and even better planning can allow for it.
Relational tables, while useful, are expensive in terms of efficiency and disk space,
and besides 1) a lot of really important information in our lives is flat in structure,
and 2) the data can be represented and interrelated with a simpler scheme than a
relational table. Object persistence is neat, and I don’t know how to write an object
persistence scheme; for all I know maybe you can’t come close to NeoAccess or its
cousins with what I’m going to propose. Maybe I’m way off base. However, nothing in
my thinking precludes a database of objects where constructor/initializer routines
reestablish object relationships as the objects are loaded.
Why a Flat File Database Library?
It is time inefficient, error prone, and hard to write a set of routines that
manages keys, records, and files. While it may be hard to justify the cost of some of the
high end products, it is equally hard to justify the enormous investment of time and
hassle in writing your own set of database routines. A flat file data base engine is an
excellent compromise between cost and features. This tool is reasonably efficient, and
is within most programmers’ budgets. It supplies you with the code that is the most
difficult to write. There will still be an investment of time and energy that will be
necessary, but it won’t be nearly as dear as starting from scratch. This kind of library
has three main features:
• A scheme for record allocation, deletion, and reuse of deleted record file space.
• A record indexing scheme. This is usually some sort of b-tree, but can be a hash
table or some other indexing system.
• Extensive error checking during record and key operations, and result codes that
indicate 1) whether the database engine did what you asked it to do and 2) if there
was an operating system I/O error.
These basic functions can be used very successfully to manage relatively complex
data sets. I am going to assume that anyone reading this article understands the basic
concepts of record and key operations.
I use a library called B-Tree Helper, from (m)agreeable software in Plymouth,
Minnesota (Magreeable@aol.com). The current version is 2.2. This product comes as C
source code. The function calls are straightforward. Mel Magree is delightfully old
fashioned in that he actually includes a hard copy manual for no extra charge! The product can handle fixed and variable length records, many indices, and is very
Mac-like in that a single physical file on disk can contain multiple index and record
types; no separate index and record file is needed for each logical grouping of data
(unless you wish to do so.)
Some of the earlier incarnations of B-Tree Helper had some clumsy function
calls and too many pointers to too many temporary results. The current version is
very nice and well worth the upgrade if you are an owner of an older version. This
product is used for the demonstration application. The source code provided for the
demo will compile only if you supply the .c files to B-Tree Helper; the .h files for the
library are provided. B-Tree Helper is a robust, good product and has met my needs
nicely. There are other flat file engines available, but this article is not a product
review. The reader can easily find and compare different products.
Invoicing Revisited
To make my points, and to demonstrate some techniques, I would like to revisit
the model problem of invoicing. This is a standard database problem that some of you
may have encountered previously in examples or classes. The classical invoicing
structure, at its simplest, contains four different types of records, each contained
within its own file. Figure 1 shows the basic interrelationships of data.
Figure 1.
There is a customer, a part, an invoice, and a line item. The customer record
contains data about the customer, how much he/she owes, etc. The parts record
contains information about each item being sold. The invoice is an open-ended
structure that tells us about who the customer is, and the various items sold to the
customer at that interaction. Each line item consists of the part, the number of parts
sold, the price, etc.
When there is a sale, the program would create an invoice, n number of line
items, and relate the invoice and line items to pre-existing records that represented
customer and part information. Each line item structure contains a reference to the
part information for that line item. This can be done either through a relational table,
or it may be done simply by embedding a reference to the part information in the line
item. When a line item record is retrieved, the application is written so that the
information about the part reference is retrieved.
I solve this problem without a relational database. The customer does not need to
have a direct relation to a line item; the invoice does that for him/her. Similarly, the
customer needs no relation to the parts catalog. With some planning, and understanding
of the information problem, a series of special indices and embedded links can be
constructed that give you all the data interrelations that you will need. In any setting
where the interrelationships are fixed, and the data queries are predictable and have
been anticipated, a combination of careful programming and good data structures are a
powerful data management system.
Don’t Scare Your Customers - Think Ahead
This kind of programming requires a lot of forethought. Unlike an interface, it is
hard to create data structures on the fly. It is critical to have a clear vision of the
problem and how you are going to solve it, because you are hard-coding data
relationships. It will be difficult for you, and frightening for your user (yes,
frightening) when you twiddle an existing file to “add some stuff.” Why frightening?