All Databases MacTech Vol 08-1992

Math Compiler

Volume Number: 8

Issue Number: 3

Column Tag: C Workshop

A Compiler for Math Equations

We’ve seen mathematical equation interpreters, what about a compiler?

By Kevin Raner, Mt. Waverley, Victoria, Australia

I am a scientist with an interest in Macintosh programming, and I a spend a large

proportion of my spare time writing number crunching programs. I set out to write an

application to use various algorithms to fit specific mathematical functions to some of

my experimental data. I didn’t want to hard code these equations into my application as I

thought it would be nice to have one version that would handle any equation you gave it.

The first version of my program took the equation as a string of characters and stored it

away. Then whenever it evaluated the equation, it parsed the string and did the

appropriate calculations as it went along. However, this had one big disadvantage,

namely it was very slow! Since what I had written was essentially an equation

interpreter, the obvious way to improve things was to write an equation compiler.

This article describes such a compiler for mathematical equations. The code,

written in THINK C, is designed to be incorporated into a project that needs to evaluate

an equation in order to perform some useful task, e.g., optimization, least squares

curve fitting, etc. When invoked, the compiler parses a string of characters and sets

up a block of executable code in the heap. In this manner, the code for evaluating the

equation can be compiled at run time. Consequently, it runs as quickly as if it had been

hard coded in the first place. Hence an application can be developed without prior

knowledge of what equation it might have to deal with.

An Example

The following code fragment illustrates the use of the equation compiler:

/* 1 */
#include "EqnCompiler.h
extended (*eqnFn)(); /* func ptr to code */
Handle eqnCode; /* handle to code block */
main()
{
 char eqnText[80];
 int theErr, index;
 extended x, y, a, b, c, coeff[5];
 /* allocate memory */
 eqnCode = NewHandle(0);
 /* equation text to be compiled */
 strcpy(eqnText, "eqnFn(x) = a*sin(b*x+c)");
 /* compile right hand side of equation */
 index = 11;
 theErr = CompileEqn(eqnText, strlen(eqnText),
 &index, eqnCode);
 HLock(eqnCode); /* lock code block */
 eqnFn = *eqnCode; /* set function ptr */
 coeff[0] = a; /* set coefficients */
 coeff[1] = b;
 coeff[2] = c;
 y = (*eqnFn)(x, coeff); /* call function */
}

The above fragment of code allocates a block of heap space to accommodate the new

function. Next it sets up a string of characters that defines the equation as text.

CompileEqn() is then called with a pointer to the text, the length of the text, the

location of the beginning of the equation, and the handle to a relocatable block on the

heap where the new function will reside. CompileEqn() expects the code block to be

unlocked so that it may grow in size as the code is compiled. (If the equation text is

relocatable, it should be locked in memory so that the pointer remains valid.) The

third argument, &index, is the address of an index variable. When the routine is

called, index informs the routine where to start parsing (position zero indicates the

first character in the buffer). In the above example the first character to be parsed is

‘a’. The text buffer is parsed until the end is reached, a semicolon is found, or a syntax

error is encountered. The use of a semicolon to mark the end of the equation allows

comments to be appended. On the function’s return, index will have been incremented

by the number of characters that were successfully parsed. If the text cannot be

compiled due to a syntax error, the nature of the error is returned in theErr and its

position is indicated in index. The possible errors that can be returned are listed in the

file “EqnCompiler.h”. When the code block has been set up, it is locked and

dereferenced to give the function pointer. Thereafter (*eqnFn)(x, coeff) is

functionally identical to the following function:

/* 2 */
extended eqnFn(extended x, extended *coeff)
{
 extended y;
 /* eqnFn(x) = a*sin(b*x+c) */
 y = coeff[0]*sin(coeff[1]*x+coeff[2]);
 return y;
}

The equation compiler always generates code that has the prototype shown above.

The new function expects two arguments, x and a pointer to an array of coefficients.

CompileEqn() recognizes five coefficients namely a, b, c, d, and e. If these coefficients

are referenced in the equation text then their numerical values should be placed in an

array before the function is called. A pointer to this array is passed as the second

argument, NULL may be passed if you are sure that the equation text does not contain

references to the coefficients. If you require more than five coefficients, you can edit

the routine ScanFn() to handle extra coefficients.

Equation Syntax

The equation compiler recognizes the following operands: a, b, c, d, e, x, π, pi,

and numeric strings. The numeric strings may be expressed in scientific or

non-scientific format. Unary minus and the following binary operators are allowed: +

(addition), - (subtraction), * (multiplication), / (division) and ^ (exponentiation).

The compiler recognizes the following elementary functions: sin(), cos(), tan()

atan(), sqrt(), exp() and log() (natural). Custom unary functions may also be

defined, the file “CustomFn.c” implements the functions asin() and acos(). The

compiler is not case sensitive and space characters inserted between operands,

operators, and functions are ignored. Parentheses may be used to change the priority of

operations, and comments may be appended to the end of an equation using the semicolon

delimiter, e.g., “2.1*(1-exp(-a*x)); first order growth”. The relative priorities of

the operators are shown below.

Priority of Operators

( ) (parentheses)

^ (exponentiation)

- (unary minus)

* / (multiplication, division)

+ - (addition, subtraction)

How it Works

I’m not going to describe this in great detail; however, I will describe the roles

of the various routines that make up the compiler. You’ll need to be familiar with

assembly language to understand the routines that do the actual coding. The first

routine called is CompileEqn(); this initializes the global variables and coordinates the

other routines which perform specific tasks. All of these routines notify CompileEqn()

of their success by returning an integer error code. CompileEqn() parses the text

buffer by calling any of three scanning routines: ScanNum(), ScanFn() and ScanOp().

The global variable mode controls which routine is called next. There are two modes,

SCAN_OPERAND and SCAN_OPERATOR. When mode is set to SCAN_OPERAND it expects

the current location in the text to contain either a numerical substring; an operand

such as x, c, π etc.; a left hand parenthesis; or a substring denoting an external

function, e.g., sin(x). The first situation is handled by ScanNum() which returns the

number, in extended format, by address. The variable operand is then set to

CONST_OPERAND to indicate that a numerical value has been stored in the variable

num. ScanFn() deals with the other possibilities and returns an integer result, by

address. If ScanFn() finds an operand, operand is set to either X_OPERAND,

COEFF_OPERAND or PI_OPERAND to indicate what was found. If a coefficient was found,

its offset from the start of the array coeff is stored in the variable offset (e.g. c is

offset 20 bytes from the start of the array). If π was found, num is set to the value of

pi as determined by the SANE function pi(). If ScanFn() finds a unary function, e.g.,

unary minus, sin(x) etc., the integer result contains information about the unary

operation (see below).

If mode is SCAN_OPERATOR, ScanOp() is called. This scans for a binary

operator or a right hand parenthesis and returns an integer result, by address. All

operations are ranked in an order of priority and are executed in order. The ranking

depends on the priority of an operation and on how deeply it’s embedded in parentheses.

Details about the pending operations are stored in the array operation[] as long

Referenced by (2):