January 93 - Persistent Objects and Object-Oriented Databases for C++
Persistent Objects and Object-Oriented Databases for C++
Dirk Bartels and Jonathan Robie
Reprinted with permission from C++ Report September 1992. Copyright © 1992
SIGS Publications, 588 Broadway, NY, NY; 212/274-0640; 212/274-0646 (fax).
Many programmers first learn about object-oriented programming by buying a C++
compiler and a GUI class library. These libraries are generally well structured,
support true object-oriented design and offer a simple, powerful programming
interface. This is often enough to convince programmers to use true object-oriented
designs. Unfortunately, many of these programmers find themselves tearing their
designs apart when they integrate their programs with conventional database systems,
which provide almost no support for objects or for expressing the relationships among
objects.
Conventional databases are good at managing large amounts of data, sharing data among
programs, and fast value-based queries. They are not very good at modeling the
relationships among data, however, since everything must be represented as a series of
two-dimensional tables.
Object-oriented database systems (OODBS) are a relatively new tool for software
developers. Unlike relational and table-oriented systems, they provide full support
for the object-oriented programming model used in languages like C++ and Smalltalk.
This model is intuitive, good at modeling relationships, and very suitable for large
software projects.
An object-oriented database combines the semantics of an object-oriented
programming language with the data management and query facilities of a conventional
database system. This makes it easy to manage large amounts of data and to model the
relationships among the data. If an object-oriented database is integrated with an
object-oriented language, it should support the semantics of that language;
relationships established in the program should automatically be represented in the
database when objects are stored.
This article focuses on the advantages of object-oriented databases over conventional
table-oriented and relational databases and the integration of an OODBS into C++. For
small applications these advantages mean that your program will be less complex and
easier to understand. For large or complex applications these advantages may mean the
difference between success and failure.
LIMITS OF CONVENTIONAL DATABASE SYSTEMS
Database systems are designed for managing large amounts of data, and they provide
many important features that object-oriented programming languages do not:
permanent storage, fast queries, sharing of objects among programs, device
independent formats, and sophisticated error handling for database operations.
Relational database systems (RDBS) and table-oriented systems based on B-Tree or
Indexed Sequential Access Method (ISAM) are the standard systems currently used in
most software development. Each requires that all data be portrayed as a series of
two-dimensional tables. The relational model declares the structures, operations, and
design principles to be used with these tables.
These systems are quite appropriate for some applications and were a real
breakthrough in their time, but software developers are rapidly learning that life is
not a series of two-dimensional tables. The growing complexity of modern programs
and the increasing use of dynamic data models have pushed traditional databases to
their limit. The limited data models they support can result in significant software
development costs since they do not allow program designs that closely match the
problem domain. They are not even worth considering for some application areas like
computer-aided design (CAD), computer-aided engineering (CAE), multimedia, and
office automation.
Limited Data Types
Modern software systems often contain data types that are not easily modeled using
a fixed set of predeclared types. For example, a CAD program might have an array of shapes, or
a desktop publishing program might model a page as a series of frames which may
contain bitmaps, paragraphs, or vector drawings. We have already seen that
object-oriented programs allow us to declare new data types as needed.
Conventional databases have a fixed set of data types. The better systems include both
simple data types like INTEGER, FLOAT, or CHAR and complex data types like DATE,
TIME, or CURRENCY. New data types cannot be added by the user. If your database does
not have the data type you need then you are stuck. Aggregate data types like arrays are
rarely available. The only way to group data is to put it in a table.
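As a rough illustration of the user-declared types discussed above, the following C++ sketch models a page as a series of frames. The names Frame, Paragraph, Bitmap, Page, and makeSamplePage are hypothetical and chosen only for this example; no database is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
class Frame {
public:
    virtual ~Frame() = default;
    virtual std::string kind() const = 0;
};

class Paragraph : public Frame {
public:
    explicit Paragraph(std::string text) : text_(std::move(text)) {}
    std::string kind() const override { return "paragraph"; }
private:
    std::string text_;
};

class Bitmap : public Frame {
public:
    Bitmap(int width, int height)
        : pixels_(static_cast<std::size_t>(width) * static_cast<std::size_t>(height)) {
        assert(width > 0 && height > 0);  // a bitmap needs a non-empty area
    }
    std::string kind() const override { return "bitmap"; }
private:
    std::vector<unsigned char> pixels_;
};

// A page is an aggregate of frames of different kinds, something a fixed
// set of column types cannot express directly.
class Page {
public:
    void add(std::unique_ptr<Frame> frame) { frames_.push_back(std::move(frame)); }
    std::size_t frameCount() const { return frames_.size(); }
    const Frame& frame(std::size_t i) const { return *frames_.at(i); }
private:
    std::vector<std::unique_ptr<Frame>> frames_;
};

// Builds a page holding one paragraph and one 4x4 bitmap.
Page makeSamplePage() {
    Page page;
    page.add(std::make_unique<Paragraph>("Hello"));
    page.add(std::make_unique<Bitmap>(4, 4));
    return page;
}
```

The point of the sketch is that the program declares the types it needs, including an aggregate of mixed elements, instead of choosing from a predeclared list.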
Limited Modeling of Data Relationships
In conventional database systems, each item is represented as a row in a table. Tables
may be accessed sequentially or by searching for values. The only way to express
relationships among items is by setting values in the rows. In each table one or more
columns are chosen as the primary key; this key must be unique for each row in the
table. For instance, the primary keys for a student, a teacher, and a class might each
be represented as identification numbers.
The relational model is weak when showing many-to-many relationships, which
generally require the introduction of a new table. In our example, the only way to show
which students are taking a class is to create an "enrollment" table which has a row
for each enrollment, containing the student identification number and the class
identification number.
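The contrast can be sketched in C++. Each student may take several classes and each class has several students; the first half below mimics the table approach with an enrollment row type, and the second half holds the relationship directly between objects. All names (EnrollmentRow, studentsInClass, Student, Course, enroll) are assumptions made for this example and do not come from any particular database product.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Table style: one row per (student, class) pair, linked only by id values.
struct EnrollmentRow {
    int studentId;
    int classId;
};

// Finds the students in a class by scanning the rows for matching ids.
std::vector<int> studentsInClass(const std::vector<EnrollmentRow>& enrollment,
                                 int classId) {
    std::vector<int> ids;
    for (const EnrollmentRow& row : enrollment) {
        if (row.classId == classId) ids.push_back(row.studentId);
    }
    return ids;
}

// Object style: the relationship is held as references between objects.
struct Course;  // "class" is a C++ keyword, so the sketch uses Course

struct Student {
    int id;
    std::string name;
    std::vector<Course*> courses;
};

struct Course {
    int id;
    std::string title;
    std::vector<Student*> students;
};

// Records the relationship on both sides. Enrolling twice is treated as a
// programming error in this sketch.
void enroll(Student& s, Course& c) {
    assert(std::find(s.courses.begin(), s.courses.end(), &c) == s.courses.end());
    s.courses.push_back(&c);
    c.students.push_back(&s);
}
```

In the object version, a program that holds a Course can reach its students without searching a separate table for matching identification numbers.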
Since relational databases have no concept of hierarchy, it is difficult to model the ISA
relationship. Suppose we have a "people" table, a "students" table, and a "teachers"
table. Every student is also a person, so a student's fields are split between the
"people" table and the "students" table. To update
all of a student's information you must find the rows of each table whose identification
numbers match. Every level in the hierarchy requires a new table, and every program
using the database must update every relevant table appropriately. The hierarchy is
not explicitly represented in the database; you simply have to know why the various
tables are there.
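For comparison, here is a minimal C++ sketch of the same people, students, and teachers example in which the ISA relationship is stated explicitly through inheritance. The class and member names are hypothetical and serve only to illustrate the idea.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Fields common to every person live in one place.
class Person {
public:
    Person(int id, std::string name) : id_(id), name_(std::move(name)) {
        assert(id > 0);  // identification numbers are positive in this sketch
    }
    virtual ~Person() = default;
    int id() const { return id_; }
    const std::string& name() const { return name_; }
    void rename(std::string name) { name_ = std::move(name); }
private:
    int id_;
    std::string name_;
};

// A Student ISA Person: the hierarchy is declared, not implied by matching ids.
class Student : public Person {
public:
    Student(int id, std::string name, int year)
        : Person(id, std::move(name)), year_(year) {}
    int year() const { return year_; }
private:
    int year_;
};

// A Teacher ISA Person as well.
class Teacher : public Person {
public:
    Teacher(int id, std::string name, std::string department)
        : Person(id, std::move(name)), department_(std::move(department)) {}
    const std::string& department() const { return department_; }
private:
    std::string department_;
};

// Works unchanged for anything that ISA Person.
std::string label(const Person& p) {
    return std::to_string(p.id()) + ": " + p.name();
}
```

Updating a student's name touches a single object here, where the table design requires finding and updating matching rows in more than one table.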
No Way of Grouping Code With Data
We have already seen that object-oriented programming languages allow related code
and data to be combined to form objects. There is no way to do this in a conventional
database system. If you know the name of a table you may use it, and the system will
not prevent you from changing the wrong table. As long as you have the right password
everything in the database is globally accessible to all of your code.
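To make the contrast concrete, the following C++ sketch groups data with the only code that is allowed to change it. ClassRoster and its capacity rule are assumptions invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ClassRoster {
public:
    explicit ClassRoster(std::size_t capacity) : capacity_(capacity) {}

    // The only way to change the roster. Rejects duplicates and refuses
    // to exceed the capacity; returns true if the student was added.
    bool enroll(int studentId) {
        if (ids_.size() >= capacity_) return false;
        for (int id : ids_) {
            if (id == studentId) return false;
        }
        ids_.push_back(studentId);
        assert(ids_.size() <= capacity_);  // the rule holds after every change
        return true;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::size_t capacity_;   // not reachable from outside the class
    std::vector<int> ids_;   // likewise private
};
```

Code outside the class cannot modify the list of identification numbers directly, so the rule is enforced in one place. With a globally accessible table, every program that writes to it would have to repeat the same checks.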
Limited Manipulation of Data
Database languages are often very poor at manipulating data. SQL, for instance, does
not allow you to perform computations on your data as input to a query, nor does it
allow you to perform computations on the result of a query. A computer language
designer would say that SQL is not computationally complete even though it is
relationally complete; a normal human being might say that SQL is great for searching
but lousy for anything else. Because of this, most serious applications are written in
conventional programming languages using some kind of SQL-based interface to the
database.
Poor Integration
Since the database and the host programming language use two different models and
different data types the programmer must either perform all operations directly in
the database or constantly convert between the two systems. The first method does not
let the programmer use many features of the host language; the second means a great
deal of overhead and frustration since the relationships among data must be constantly
converted to support both programming models. Such a program has two distinct
designs, one for the program itself and one for the database.
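The conversion burden can be sketched in C++ as follows. StudentRow stands in for a fixed-width row as a table-oriented system might expose it; the layout, the 15-character name limit, and the function names are assumptions made for this example, not the interface of any real database.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical fixed-width row: the database's view of a student.
struct StudentRow {
    int  id;
    char name[16];  // 15 characters plus a terminator
};

// The program's view of the same student.
struct Student {
    int id;
    std::string name;
};

// Program to database: names longer than 15 characters are truncated.
StudentRow toRow(const Student& s) {
    StudentRow row{};  // zero-initialized, so the name stays terminated
    row.id = s.id;
    std::strncpy(row.name, s.name.c_str(), sizeof(row.name) - 1);
    assert(row.name[sizeof(row.name) - 1] == '\0');
    return row;
}

// Database to program.
Student fromRow(const StudentRow& row) {
    return Student{row.id, std::string(row.name)};
}
```

Every type that is stored needs a pair of functions like these, and each pair is a place where the two designs can drift apart, as the silent truncation in toRow suggests.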
Summary
To store data in a conventional database, it must be dissected into a series of
two-dimensional tables. Only predeclared data types are supported. Object-oriented
programming languages have a rich set of features for creating data types and
representing the relationships among data that are not supported in such databases. In
the rest of this article, we discuss features that an object-oriented database must
support. To illustrate these features, we examine POET, a commercial object-oriented
database system with which we are affiliated.
POET
Persistence
The original implementation of Smalltalk had a simple method for storing objects: the
program's entire memory image could be dumped to disk and restored when running
the program later. This scheme has some real advantages. It is very simple to
implement, requires almost no effort from the programmer, and fully implements all
aspects of the programming language (after all, the program sticks everything in
memory somewhere!). It also has some real disadvantages. The number of objects that
can be stored depends on the amount of main memory available, the programming
context must be stored and retrieved as a whole, objects may not be shared among
programs or retrieved on another kind of computer, and there is no way to implement
intelligent error recovery.