January 93 - Persistent Objects and Object-Oriented Databases for C++

Dirk Bartels and Jonathan Robie

SIGS Publications, 588 Broadway, NY, NY; 212/274-0640; 212/274-0646 (fax).

Many programmers first learn about object-oriented programming by buying a C++

compiler and a GUI class library. These libraries are generally well structured,

support true object-oriented design and offer a simple, powerful programming

interface. This is often enough to convince programmers to use true object-oriented

designs. Unfortunately, many of these programmers find themselves tearing their

designs apart when they integrate their programs with conventional database systems,

which provide almost no support for objects or for expressing the relationships among

objects.

Conventional databases are good at managing large amounts of data, sharing data among

programs, and fast value-based queries. They are not very good at modeling the

relationships among data, however, since everything must be represented as a series of

two-dimensional tables.

Object-oriented database systems (OODBS) are a relatively new tool for software

developers. Unlike relational and table-oriented systems, they provide full support

for the object-oriented programming model used in languages like C++ and Smalltalk.

This model is intuitive, good at modeling relationships, and very suitable for large

software projects.

An object-oriented database combines the semantics of an object-oriented

programming language with the data management and query facilities of a conventional

database system. This makes it easy to manage large amounts of data and to model the

relationships among the data. If an object-oriented database is integrated with an

object-oriented language, it should support the semantics of that language;

relationships established in the program should automatically be represented in the

database when objects are stored.

This article focuses on the advantages of object-oriented databases over conventional

table-oriented and relational databases and the integration of an OODBS into C++. For

small applications these advantages mean that your program will be less complex and

easier to understand. For large or complex applications these advantages may mean the

difference between success and failure.

LIMITS OF CONVENTIONAL DATABASE SYSTEMS

Database systems are designed for managing large amounts of data, and they provide

many important features that object-oriented programming languages do not:

permanent storage, fast queries, sharing of objects among programs, device

independent formats, and sophisticated error handling for database operations.

Relational database systems (RDBS) and table-oriented systems based on B-Tree or

Indexed Sequential Access Method (ISAM) are the standard systems currently used in

most software development. Each requires that all data be portrayed as a series of

two-dimensional tables. The relational model declares the structures, operations, and

design principles to be used with these tables.

These systems are quite appropriate for some applications and were a real

breakthrough in their time, but software developers are rapidly learning that life is

not a series of two-dimensional tables. The growing complexity of modern programs

and the increasing use of dynamic data models have pushed traditional databases to

their limit. The limited data models they support can result in significant software

development costs since they do not allow program designs that closely match the

problem domain. They are not even worth considering for some application areas like

computer-aided design (CAD), computer-aided engineering (CAE), multimedia, and

office automation.

Limited Data Types

Modern software systems often contain data types that are not easily modeled using

a fixed set of predeclared types. For example, a CAD program might have an array of shapes, or

a desktop publishing program might model a page as a series of frames which may

contain bitmaps, paragraphs, or vector drawings. We have already seen that

object-oriented programs allow us to declare new data types as needed.

Conventional databases have a fixed set of data types. The better systems include both

simple data types like INTEGER, FLOAT, or CHAR and complex data types like DATE,

TIME, or CURRENCY. New data types cannot be added by the user. If your database does

not have the data type you need then you are stuck. Aggregate data types like arrays are

rarely available. The only way to group data is to put it in a table.
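
As a rough illustration of the kind of user-defined types discussed above, the desktop publishing page could be sketched in C++ as follows. All class names here (Frame, Bitmap, Paragraph, VectorDrawing, Page) are hypothetical and chosen only for this example; they are not taken from POET or any other product.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for a desktop publishing program. The names are
// illustrative only and do not come from any real library.
class Frame {
public:
    virtual ~Frame() {}
    virtual std::string kind() const = 0;
};

class Bitmap : public Frame {
public:
    Bitmap(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) * height, 0) {}
    std::string kind() const override { return "bitmap"; }
    int width() const { return width_; }
    int height() const { return height_; }
private:
    int width_;
    int height_;
    std::vector<unsigned char> pixels_;  // an aggregate inside one object
};

class Paragraph : public Frame {
public:
    explicit Paragraph(const std::string& text) : text_(text) {}
    std::string kind() const override { return "paragraph"; }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

struct Point { double x; double y; };

class VectorDrawing : public Frame {
public:
    std::string kind() const override { return "vector drawing"; }
    void addPoint(double x, double y) { points_.push_back(Point{x, y}); }
    std::size_t pointCount() const { return points_.size(); }
private:
    std::vector<Point> points_;
};

// A page is a series of frames of mixed types.
class Page {
public:
    void add(std::unique_ptr<Frame> frame) { frames_.push_back(std::move(frame)); }
    std::size_t frameCount() const { return frames_.size(); }
    const Frame& frame(std::size_t i) const {
        assert(i < frames_.size());
        return *frames_[i];
    }
private:
    std::vector<std::unique_ptr<Frame>> frames_;
};
```

A Page holds bitmaps, paragraphs, and drawings in one ordered collection, something that has no direct counterpart among a fixed set of column types.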

Limited Modeling of Data Relationships

In conventional database systems, each item is represented as a row in a table. Tables

may be accessed sequentially or by searching for values. The only way to express

relationships among items is by setting values in the rows. In each table one or more

columns is chosen as the primary key; this key must be unique for each row in the

table. For instance, the primary keys for a student, a teacher, and a class might each

be represented as identification numbers.

The relational model is weak when showing many-to-many relationships, which

generally require the introduction of a new table. In our example, the only way to show

which students are taking a class is to create an "enrollment" table which has a row

for each student in each class and contains the student identification number and class

identification number in each row.
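
The enrollment example can be sketched in C++ to show the difference between the two styles. This is a minimal sketch, not a database implementation: the row structs stand in for tables, and every name (StudentRow, EnrollmentRow, studentsInClass, Student, Course) is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Table style: items are rows, and relationships exist only as
// matching key values in separate rows.
struct StudentRow    { int id; std::string name; };
struct EnrollmentRow { int studentId; int classId; };

// Finds the names of the students taking a class by matching
// identification numbers across the two "tables".
std::vector<std::string> studentsInClass(
        const std::vector<StudentRow>& students,
        const std::vector<EnrollmentRow>& enrollments,
        int classId) {
    std::vector<std::string> names;
    for (const EnrollmentRow& e : enrollments) {
        if (e.classId != classId) continue;
        for (const StudentRow& s : students) {
            if (s.id == e.studentId) names.push_back(s.name);
        }
    }
    return names;
}

// Object style: the relationship is a direct reference from the
// class to its students, so no extra table or key lookup is needed.
struct Student { std::string name; };
struct Course  { std::string title; std::vector<Student*> enrolled; };
```

In the table style the program must know that the enrollment rows exist and how to join them; in the object style the relationship is part of the Course itself.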

Since relational databases have no concept of hierarchy, it is difficult to model the IS-A

relationship. Suppose we have a "people" table, a "students" table, and a "teachers"

table. Every student is also a person, so a student's fields are split between the "people" and "students" tables. To update

all of a student's information you must find the rows of each table whose identification

numbers match. Every level in the hierarchy requires a new table, and every program

using the database must update every relevant table appropriately. The hierarchy is

not explicitly represented in the database; you simply have to know why the various

tables are there.
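
For comparison, here is how the same hierarchy might look in C++, where the IS-A relationship is stated explicitly through inheritance. The classes and the badgeText function are hypothetical and serve only to illustrate the point.

```cpp
#include <string>

class Person {
public:
    Person(int id, const std::string& name) : id_(id), name_(name) {}
    virtual ~Person() {}
    int id() const { return id_; }
    const std::string& name() const { return name_; }
    void rename(const std::string& name) { name_ = name; }
private:
    int id_;
    std::string name_;
};

// "Every student is also a person" is declared once, right here.
class Student : public Person {
public:
    Student(int id, const std::string& name, int year)
        : Person(id, name), year_(year) {}
    int year() const { return year_; }
    void promote() { ++year_; }
private:
    int year_;
};

// Code written for Person works unchanged for Student objects.
std::string badgeText(const Person& p) {
    return std::to_string(p.id()) + ": " + p.name();
}
```

Updating a student touches one object; there are no separate rows to locate and keep in step by identification number.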

No Way of Grouping Code With Data

We have already seen that object-oriented programming languages allow related code

and data to be combined to form objects. There is no way to do this in a conventional

database system. If you know the name of a table you may use it, and the system will

not prevent you from changing the wrong table. As long as you have the right password

everything in the database is globally accessible to all of your code.
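
A small C++ sketch shows what grouping code with data buys on the programming-language side. The CourseRoster class and its capacity rule are invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class CourseRoster {
public:
    explicit CourseRoster(std::size_t capacity) : capacity_(capacity) {}

    // The only way to change the roster. The capacity rule lives with
    // the data it protects, so no caller can bypass it.
    bool enroll(const std::string& studentName) {
        if (roster_.size() >= capacity_) return false;
        roster_.push_back(studentName);
        return true;
    }

    std::size_t enrolledCount() const { return roster_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::string> roster_;  // not reachable from outside
};
```

With a plain table, the equivalent rule would have to be repeated, and remembered, in every program that inserts rows.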

Limited Manipulation of Data

Database languages are often very poor at manipulating data. SQL, for instance, does

not allow you to perform computations on your data as input to a query, nor does it

allow you to perform computations on the result of a query. A computer language

designer would say that SQL is not computationally complete even though it is

relationally complete; a normal human being might say that SQL is great for searching

but lousy for anything else. Because of this, most serious applications are written in

conventional programming languages using some kind of SQL-based interface to the

database.

Poor Integration

Since the database and the host programming language use two different models and

different data types, the programmer must either perform all operations directly in

the database or constantly convert between the two systems. The first method does not

let the programmer use many features of the host language; the second means a great

deal of overhead and frustration since the relationships among data must be constantly

converted to support both programming models. Such a program has two distinct

designs, one for the program itself and one for the database.
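
The conversion overhead can be illustrated with a minimal sketch. It assumes a hypothetical fixed-width record layout (a numeric key and a 20-character name column); real database interfaces differ, and the names StudentRecord, toRecord, and fromRecord are made up for this example.

```cpp
#include <cstring>
#include <string>

// Hypothetical database-side layout with fixed-width columns.
struct StudentRecord {
    long id;
    char name[21];  // 20-character column plus terminating NUL
};

// Program-side type using the host language's own data types.
struct Student {
    int id;
    std::string name;
};

// Program -> database: names longer than the column are truncated.
StudentRecord toRecord(const Student& s) {
    StudentRecord r;
    r.id = s.id;
    std::strncpy(r.name, s.name.c_str(), sizeof r.name - 1);
    r.name[sizeof r.name - 1] = '\0';
    return r;
}

// Database -> program.
Student fromRecord(const StudentRecord& r) {
    Student s;
    s.id = static_cast<int>(r.id);
    s.name = r.name;
    return s;
}
```

Every type that crosses the boundary needs a pair of functions like these, and each pair is a place where the two designs can drift apart.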

Summary

To store data in a conventional database, it must be dissected into a series of

two-dimensional tables. Only predeclared data types are supported. Object-oriented

programming languages have a rich set of features for creating data types and

representing the relationships among data that are not supported in such databases. In

the rest of this article, we discuss features that an object-oriented database must

support. To illustrate these features, we examine POET, a commercial object-oriented

database system with which we are affiliated.

POET

Persistence

The original implementation of Smalltalk had a simple method for storing objects: the

program's entire memory image could be dumped to disk and restored when running

the program later. This scheme has some real advantages. It is very simple to

implement, requires almost no effort from the programmer, and fully implements all

aspects of the programming language (after all, the program sticks everything in

memory somewhere!). It also has some real disadvantages. The number of objects that

can be stored depends on the amount of main memory available, the programming

context must be stored and retrieved as a whole, objects may not be shared among

programs or retrieved on another kind of computer, and there is no way to implement

intelligent error recovery.
