Lexical Analysis
Volume Number: 6
Issue Number: 5
Column Tag: Language Translation
Ambiguities and Lexical Analysis 
By Clifford Story, Mount Prospect, IL
Part III. Ambiguities, Etc.
A. Introduction
This is the third part in my series on Language Translation. Language translation
has three phases: lexical analysis, parsing, and code generation. The first two parts
have dealt with building parsers using YACC; this is the final installment on that
subject.
That will take up only about half the article. The remainder will begin the new
topic of lexical analysis by presenting a skeleton filter tool.
B. Parsing
I now conclude parsing with a couple of miscellaneous topics. The first is a
simple way to handle ambiguities in a grammar. Then comes the horrible question of
error detection and reporting.
B(1). Ambiguities
Parser ambiguities, you may recall, are spots in the parse table where YACC’s
generation algorithm would place a shift and a reduction, or two reductions. A naive
grammar would generate an ambiguity when considering the input string
2 + 3 * 4
since it isn’t clear whether this means (2 + 3) * 4 or 2 + (3 * 4).
We faced this problem in the first part and solved it by re-writing the grammar
to eliminate the ambiguity. YACC offers another way of handling ambiguities (also
known as shift/reduce and reduce/reduce conflicts).
B(1)(a). YACC’s Default Rule
First, something you should know: YACC will work through any conflicts that it
finds and produce a non-ambiguous grammar for you. It resolves conflicts
automatically, and will only issues warnings, not errors.
Unfortunately, it does this by making assumptions about what you intended, and
these assumptions need not be correct. You can get a grammar that simply does not do
what you meant it to do, despite the lack of error messages. In my opinion, this is a
design flaw in YACC; conflicts should be fatal errors.
Even worse, MACYACC (see the first part for a review) uses an inordinately
complicated rule for resolving conflicts. Unix YACC uses a very simple rule, so you
can guess what it’s going to do. MACYACC, no such luck.
Therefore, YOU SHOULD ALWAYS REACT TO CONFLICT WARNINGS AS IF THEY
WERE FATAL ERRORS!
B(1)(b). Operator Precedence
Remember, conflicts arise when the parser doesn’t know which operation to
perform first, as in the 2 + 3 * 4 example. We intelligent humans know that
multiplication comes before addition: multiplication has a higher precedence than
addition. The conflict is easily and correctly resolved if YACC knows the order of
precedence among all operators in the grammar.
All that need be done is tell YACC:
/* 1 */
%left ‘+’ ‘-’
%left ‘*’ ‘/’
Multiplication and division have the same precedence (because they are on the
same line), and higher precedence than addition and subtraction (because their line is
after addition and subtraction’s line). That solves the problem (read on for an
explanation of the “left” business).
But what about strings like
2 - 3 + 4
Is that 2 - (3 + 4) or (2 - 3) + 4? The above rules don’t seem to say, since
subtraction and addition have the same precedence.
B(1)(c). What is Associativity?
Good question. In a string of the form
... • ID • ...
where ‘•’ is an operator and ‘ID’ an identifier, the • to the left of the ID will have a
higher precedence than the • to the right if • is left-associative; and conversely.
Thus,
2 - 3 - 4
is conventionally -5 because subtraction is left-associative. If it were
right-associative, 2 - 3 - 4 would equal 2 - (3 - 4), or 3. Just about everything is
left-associative.
So associativity defines order of evaluation among operators on the same level of
operator precedence. And the line
%left ‘+’ ‘-’
says that addition and subtraction are left-associative among one another.
YACC also includes keywords %right, which means just what you might think,
and %nonassoc, which means constructs of the form
... • ID • ...
are illegal (think of logical operators).
B(1)(d). The Lonely Minus...
There’s one more thing to worry about: unary operators. The classic example is
the minus sign. Do you remember my asking why a hex calculator was easier to write
than a decimal calculator? The answer is that a hex calculator doesn’t have to deal
with unary minus...
The problem arises because the minus sign is used for both unary and binary
operators (negation and subtraction). When we assign it a precedence in the %left
statement, we’re thinking about subtraction, so we give it a lower precedence than
multiplication and division. But negation should have the same precedence as
multiplication and division.
The solution is to give a grammar rule a precedence. This can be done with the
%prec keyword:
expr : ‘-’ expr %prec ‘*’
gives this rule the same precedence as multiplication. Thus, unary minus will have
the right precedence, and everything is at last conflict-free.
B(1)(e). An Example
%token NUM
%left ‘+’ ‘-’
%left ‘*’ ‘/’
%left MINUS
%%
prob : expr ‘\n’
{
printf(“\t= %d\n”, $1);
return(0);
}
;
expr : expr ‘+’ expr
{
$$ = $1 + $3;
}
| expr ‘-’ expr
{
$$ = $1 - $3;
}
| expr ‘*’ expr
{
$$ = $1 * $3;
}
| expr ‘/’ expr
{
$$ = $1 / $3;
}
| ‘-’ expr %prec MINUS
{
$$ = - $2;
}
| ‘(‘ expr ‘)’
{
$$ = $2;
}
| NUM
{
$$ = $1;
}
;
%%
/***************************************/
#include “stdio.h”
#include “ctype.h”
#include “string.h”
/***************************************/
char *input;
char *token;
/***************************************/
#define yyerror(x)
{
printf(“\t%s [%s]\n”, x, token);
return(0);
}
yyparse();
int yylex();
/***************************************/
void main(int argc, char *argv[])
{
char thestring[256];
if (argc < 1)
printf(“\tImpossible error!\n”);
else if (argc > 2)
printf(“\tHey! One at a time!\n”);
else if (argc == 2)
{
input = argv[1];
yyparse();
}
else
{
printf(“? “);
while (strlen(gets(thestring)) > 2)
{
input = &thestring[2];
yyparse();
printf(“? “);
}
}
}
/***************************************/
int yylex()
{
token = strtok(input, “ “);
input = 0;
if (token == 0)
return(‘\n’);
else if (sscanf(token, “%d”, &yylval) == 1)
return(NUM);
else
return(token[0]);
}
/***************************************/
B(2). Catching Errors
You may have noticed that Pascal compilers write real error messages, while C
compilers write error messages more cryptic than C itself. There’s a reason for this:
Pascal compilers use recursive-descent parsers, while C compilers use table-driven
parsers. Recursive-descent parsers are the sort of parsers a person might naturally
write: get the next token; if it’s a ‘+’, do this, else get the next token and do that, and
so on. A location in the code of the parser corresponds to a particular grammar