Math Compiler
Volume Number: 8
Issue Number: 3
Column Tag: C Workshop
A Compiler for Math Equations
We’ve seen mathematical equation interpreters, what about a compiler?
By Kevin Raner, Mt. Waverley, Victoria, Australia
I am a scientist with an interest in Macintosh programming, and I a spend a large
proportion of my spare time writing number crunching programs. I set out to write an
application to use various algorithms to fit specific mathematical functions to some of
my experimental data. I didn’t want to hard code these equations into my application as I
thought it would be nice to have one version that would handle any equation you gave it.
The first version of my program took the equation as a string of characters and stored it
away. Then whenever it evaluated the equation, it parsed the string and did the
appropriate calculations as it went along. However, this had one big disadvantage,
namely it was very slow! Since what I had written was essentially an equation
interpreter, the obvious way to improve things was to write an equation compiler.
This article describes such a compiler for mathematical equations. The code,
written in THINK C, is designed to be incorporated into a project that needs to evaluate
an equation in order to perform some useful task, e.g., optimization, least squares
curve fitting, etc. When invoked, the compiler parses a string of characters and sets
up a block of executable code in the heap. In this manner, the code for evaluating the
equation can be compiled at run time. Consequently, it runs as quickly as if it had been
hard coded in the first place. Hence an application can be developed without prior
knowledge of what equation it might have to deal with.
The following code fragment illustrates the use of the equation compiler:
/* 1 */
#include "EqnCompiler.h
extended (*eqnFn)(); /* func ptr to code */
Handle eqnCode; /* handle to code block */
main()
{
char eqnText[80];
int theErr, index;
extended x, y, a, b, c, coeff[5];
/* allocate memory */
eqnCode = NewHandle(0);
/* equation text to be compiled */
strcpy(eqnText, "eqnFn(x) = a*sin(b*x+c)");
/* compile right hand side of equation */
index = 11;
theErr = CompileEqn(eqnText, strlen(eqnText),
&index, eqnCode);
HLock(eqnCode); /* lock code block */
eqnFn = *eqnCode; /* set function ptr */
coeff[0] = a; /* set coefficients */
coeff[1] = b;
coeff[2] = c;
y = (*eqnFn)(x, coeff); /* call function */
}
The above fragment of code allocates a block of heap space to accommodate the new
function. Next it sets up a string of characters that defines the equation as text.
CompileEqn() is then called with a pointer to the text, the length of the text, the
location of the beginning of the equation, and the handle to a relocatable block on the
heap where the new function will reside. CompileEqn() expects the code block to be
unlocked so that it may grow in size as the code is compiled. (If the equation text is
relocatable, it should be locked in memory so that the pointer remains valid.) The
third argument, &index, is the address of an index variable. When the routine is
called, index informs the routine where to start parsing (position zero indicates the
first character in the buffer). In the above example the first character to be parsed is
‘a’. The text buffer is parsed until the end is reached, a semicolon is found, or a syntax
error is encountered. The use of a semicolon to mark the end of the equation allows
comments to be appended. On the function’s return, index will have been incremented
by the number of characters that were successfully parsed. If the text cannot be
compiled due to a syntax error, the nature of the error is returned in theErr and its
position is indicated in index. The possible errors that can be returned are listed in the
file “EqnCompiler.h”. When the code block has been set up, it is locked and
dereferenced to give the function pointer. Thereafter (*eqnFn)(x, coeff) is
functionally identical to the following function:
/* 2 */
extended eqnFn(extended x, extended *coeff)
{
extended y;
/* eqnFn(x) = a*sin(b*x+c) */
y = coeff[0]*sin(coeff[1]*x+coeff[2]);
return y;
}
The equation compiler always generates code that has the prototype shown above.
The new function expects two arguments, x and a pointer to an array of coefficients.
CompileEqn() recognizes five coefficients namely a, b, c, d, and e. If these coefficients
are referenced in the equation text then their numerical values should be placed in an
array before the function is called. A pointer to this array is passed as the second
argument, NULL may be passed if you are sure that the equation text does not contain
references to the coefficients. If you require more than five coefficients, you can edit
the routine ScanFn() to handle extra coefficients.
Equation Syntax
The equation compiler recognizes the following operands: a, b, c, d, e, x, π, pi,
and numeric strings. The numeric strings may be expressed in scientific or
non-scientific format. Unary minus and the following binary operators are allowed: +
(addition), - (subtraction), * (multiplication), / (division) and ^ (exponentiation).
The compiler recognizes the following elementary functions: sin(), cos(), tan()
atan(), sqrt(), exp() and log() (natural). Custom unary functions may also be
defined, the file “CustomFn.c” implements the functions asin() and acos(). The
compiler is not case sensitive and space characters inserted between operands,
operators, and functions are ignored. Parentheses may be used to change the priority of
operations, and comments may be appended to the end of an equation using the semicolon
delimiter, e.g., “2.1*(1-exp(-a*x)); first order growth”. The relative priorities of
the operators are shown below.
Priority of Operators
( ) (parentheses)
^ (exponentiation)
- (unary minus)
* / (multiplication, division)
+ - (addition, subtraction)
How it Works
I’m not going to describe this in great detail; however, I will describe the roles
of the various routines that make up the compiler. You’ll need to be familiar with
assembly language to understand the routines that do the actual coding. The first
routine called is CompileEqn(); this initializes the global variables and coordinates the
other routines which perform specific tasks. All of these routines notify CompileEqn()
of their success by returning an integer error code. CompileEqn() parses the text
buffer by calling any of three scanning routines: ScanNum(), ScanFn() and ScanOp().
The global variable mode controls which routine is called next. There are two modes,
SCAN_OPERAND and SCAN_OPERATOR. When mode is set to SCAN_OPERAND it expects
the current location in the text to contain either a numerical substring; an operand
such as x, c, π etc.; a left hand parenthesis; or a substring denoting an external
function, e.g., sin(x). The first situation is handled by ScanNum() which returns the
number, in extended format, by address. The variable operand is then set to
CONST_OPERAND to indicate that a numerical value has been stored in the variable
num. ScanFn() deals with the other possibilities and returns an integer result, by
address. If ScanFn() finds an operand, operand is set to either X_OPERAND,
COEFF_OPERAND or PI_OPERAND to indicate what was found. If a coefficient was found,
its offset from the start of the array coeff is stored in the variable offset (e.g. c is
offset 20 bytes from the start of the array). If π was found, num is set to the value of
pi as determined by the SANE function pi(). If ScanFn() finds a unary function, e.g.,
unary minus, sin(x) etc., the integer result contains information about the unary
operation (see below).
If mode is SCAN_OPERATOR, ScanOp() is called. This scans for a binary
operator or a right hand parenthesis and returns an integer result, by address. All
operations are ranked in an order of priority and are executed in order. The ranking
depends on the priority of an operation and on how deeply it’s embedded in parentheses.
Details about the pending operations are stored in the array operation[] as long