Chapter 6: Examples

Now we show and explain three sample programs written using bisonc++: a reverse polish notation calculator, an algebraic (infix) notation calculator, and a multi-function calculator. All three have been tested using the Gnu C++ compiler version 5.3.1. Each example results in a usable, though limited, interactive desk-top calculator.

These examples are simple, but bisonc++ grammars for real programming languages are written the same way. You can copy the examples from this document into source files to create your own versions of these programs. In addition, the bisonc++ source archive contains the various source files ready for use.

6.1: Rpn: a Reverse Polish Notation Calculator

The first example is about a simple double-precision reverse polish notation calculator (a calculator using postfix operators). This example provides a good starting point, since operator precedence is not an issue. The second example illustrates how operator precedence is handled.

All sources for this calculator are found in the demos/rpn/ directory.

6.1.1: Declarations for the `rpn' calculator

Here are the directives that are used for the reverse polish notation calculator. As in C++, end-of-line comments may be used.
%baseclass-preinclude cmath

%token NUM
%stype double

The grammar file's first (directive) configures bisonc++ by specifying the tokens (see section The bisonc++ Declarations Section). Each terminal symbol that is not a single-character literal must be declared here (Single-character literals are normally not declared, but are represented by literal character constants). In the current example, all arithmetic operators are designated by single-character literals, so the only terminal symbol that needs to be declared is NUM, the token type for numeric constants. As bisonc++ by default uses the type int as the semantic value type, but a calculator usually uses floating point values the directive %stype double is used to indicate that the calculator uses `double' as the type for its semantic values.

6.1.2: Grammar rules for the `rpn' calculator

Here are the grammar rules for the reverse polish notation calculator.
input:    
        // empty 
| 
        input line
;

line:   
        '\n'
| 
        exp '\n'  
        { 
            std::cout << "\t" << $1 << std::endl;
        }
;

exp:      
        NUM             
| 
        exp exp '+'     
        { 
            $$ = $1 + $2;    
        }
| 
        exp exp '-'     
        { 
            $$ = $1 - $2;    
        }
| 
        exp exp '*'     
        { 
            $$ = $1 * $2;    
        }
| 
        exp exp '/'     
        { 
            $$ = $1 / $2;    
        }
|
        // Exponentiation:
        exp exp '^'     
        { 
            $$ = pow($1, $2); 
        }
|
        // Unary minus:
        exp 'n'
        { 
            $$ = -$1;        
        }
;

The rules of the rpn `language' defined here are the expression (using the name exp), the line of input (line), and the complete input transcript (input). Each of these nonterminal symbols has several alternate rules, joined by the `|' separator, separating the various production rules. The various rules are explained next.

The semantics of the language are determined by the actions taken once nonterminals have been recognized. The actions consist of C++ code that appears inside braces. See section 4.6.2.

Actions are specified using C++, but bisonc++ provides the means for passing semantic values between rules. Refer to section 4.6.2 below for an extensive coverage of the various possibilities. As a short introduction, the pseudo-variable $$ can be used in rules to represent the semantic value for the nonterminal that the production rule is defining. Assigning a value to $$ is the main task of many actions. The semantic values of the elements of production rules are referred to as $1 (the first element of a production rule), $2 (the second element), and so on.

6.1.2.1: Explanation of `input'

Consider the definition of input:

    input:    
            // empty 
    | 
            input line
    ;
        
This definition should be read as follows: A complete input is either an empty string, or a complete input followed by an input line. Notice that `complete input' is defined in terms of itself. This definition is said to be left recursive since the rule's nonterminal (input) appears as the leftmost symbol in the production rule. See section 4.4.

The first alternative is empty: there are no symbols between the colon and the first `|'; this means that input can match an empty string of input (no tokens). We write the rules this way because it is legitimate to type Ctrl-d (end of input) immediately after starting the calculator. By convention empty alternatives are provided with the comment `// empty'.

The second production rule (`input line') handles all nontrivial input. It means after reading any number of lines, read one more line. The left recursion turns this rule into a loop: it processes any number of lines.

The parser's parsing function continues to process input until a grammatical error is encountered or untilthe lexical analyzer returns the end-of-file token.

6.1.2.2: Explanation of `line'

Next consider the definition of the line nonterminal:

    line:   
            '\n'
    | 
            exp '\n'  
            { 
                cout << "\t" << $1 << endl;
            }
    ;
        
The first alternative is a newline character token; this means that rpn accepts a blank line (and ignores it, since it has no associated action). The second alternative is an expression (expr), followed by a newline. This alternative handles all expressions that are entered by the user. The semantic value of the exp nonterminal is the value of $1 because the exp nonterminal is the first symbol of the production rule. The action simply prints the expression's value.

This action is unusual because it does not assign a value to $$. As a consequence, the semantic value associated with line is not defined. This becomes a bug if that value is ever used, but we don't use it: once rpn has printed the value of the user's input line, that value is no longer needed.

6.1.2.3: Explanation of `expr'

The exp nonterminal has several rules, one for each kind of expression. The first rule handles the simplest kind of expression: just a single number. The second handles additions, which are two expressions followed by a plus-sign. The third handles subtractions, and so on.

    exp:      
            NUM             
    | 
            exp exp '+'     
            { 
                $$ = $1 + $2;    
            }
    | 
            exp exp '-'     
            { 
                $$ = $1 - $2;    
            }
    ...
        
Normally the production rule separator `|' is used to separate the various production rules, but the rules could equally well have separately been defined:

    exp:      
            NUM             
    ;

    exp:
            exp exp '+'     
            { 
                $$ = $1 + $2;    
            }
    ;

    exp:
            exp exp '-'     
            { 
                $$ = $1 - $2;    
            }
    ;

    ...
        
Most rules have actions assoicated with them computing the value of the expression using the values of the elements of production rules. For example, in the rule for addition, $1 refers to the first component exp and $2 refers to the second one. The third component, '+', has no semantic value associated with it, but if it had you could refer to it as $3. When the parser's parsing function recognizes a sum expression using this rule, the sum of the two subexpressions' values is produced as the value of the entire expression. See section 4.6.2.

You don't have to define actions for every rule. When a rule has no action, bisonc++ by default copies the value of $1 into $$. This is what happens in the first rule (the one that uses NUM).

The formatting shown here shows the recommended layout, but bisonc++ does not require it. You can alter whitespace as much as you like.

6.1.3: The Lexical Scanner used by `rpn'

The lexical analyzer's job is to convert characters or sequences of characters read from the input stream into tokens. Bisonc++'s parser obtains its tokens by calling its lexical analyzer member (lex), which is a predefined member of the parser class. See section 5.3.1.

Our calculator only needs a simple lexical analyzer. It skips blanks and tabs, then reads numbers, returning them as NUM tokens. Any other character that isn't part of a number is a separate token. Note that the token-value for such a single-character token is the character itself.

The lexical analyzer function's return value is a numeric value representing a token. If the token is a literal character, then its numeric code is the character's ASCII code; if the token is the name of a token defined by bisonc++ in a grammar specification file, then such named tokens are defined by bisonc++ in a C++ enumeration. In the current example, therefore, NUM becomes an enumeration value, returned by the lex member as the `value' NUM.

Semantic values of nonterminals are stored in the parser's data member d_val_ (comparable to the variable yylval used by, e.g., Bison). This data member has int as its default type, but by specifying %stype in the grammar file's directive section this default type is modified (to, e.g., double).

A token value of zero is returned once end-of-file is encountered (the parsing function produced by bisonc++ interprets any nonpositive token value as end-of-input).

Here is the lexical scanner's implementation:

#include "Parser.ih"

//  Lexical scanner returns a double floating point 
//  number on the stack and the token NUM, or the ASCII
//  character read if not a number.  Skips all blanks
//  and tabs, returns 0 for EOF.

int Parser::lex()
{
    char c;
                                    // get the next non-ws character
    while (std::cin.get(c) && (c == ' ' || c == '\t'))
        ;

    if (!std::cin)                  // no characters were obtained
        return 0;                   // indicate End Of Input  
    
    if (c == '.' || isdigit(c))     // if a digit char was found
    {
        std::cin.putback(c);        // return the character
        std::cin >> d_val_;        // extract a number
        return NUM;                 // return the NUM token
    }

    return c;                       // otherwise return the extracted char.
}

6.1.4: The Controlling Function `main()'

In keeping with the spirit of this example, the calcuator's main function is a very small one. It constructs a parser object and then calls its parsing function to start the calculator:
#include "main.ih"

int main()
{
    Parser parser;

    parser.parse();
}

6.1.5: The error reporting member `error()'

When parse encounters a syntax error, it calls the error reporting member function (error) to print an error message. A very basic in-line implementation is provided by bisonc++ in the parser class internal header file (see chapter 5), which can easily be redefined by the programmer. The error member's default implementation is OK for rpn.

Once error returns, bisonc++'s parser may recover from the error and continue parsing if the grammar contains a suitable error rule (see chapter 8). Otherwise, the parsing function parse returns a non-zero value. No error production rules were used for rpn, so the calculator terminates at the first syntax error.. Not very nice in a real-life calculator, but it is acceptable for this first example.

6.1.6: Running Bisonc++ to generate the Parser

Before running bisonc++, we should decide how to organize the program's source code, using one or more source files. Even though the example is simple, it is good practice to define all user-defined functions in source files of their own. For rpn this means that a source file (main.cc) is constructed holding main, and a file parser/lex.cc containing the lexical scanner's implementation. Note that I've put all the parser's files in a separate directory as well (also see section 3.7).

In rpn's parser directory the file grammar contains the grammar specification. Bisonc++ constructs a parser class and a parsing member function from this file after issuing the command:


    b() grammar
        
From this, bisonc++ produced the following files: By default, Parserbase.h and parse.cc will be re-created each time bisonc++ is re-run. Parser.h and Parser.ih may safely be modified by the programmer, e.g., to add new members to to the parser class. These two files will not be overwritten by bisonc++, unless explicitly instructed to do so.

6.1.7: Constructing and running `rpn'

Here is how to compile and run the parser file:

    # List files (recursively) in the (current) examples/rpn  directory.
    % ls -R 
    .:
    build*  main.cc  parser/  rpn.ih
    
    ./parser:
    grammar  lex.cc  Parser.ih
    
    # Create `rpn' using the `build' script:
    % ./build program
    
    # List files again, ./rpn is the constructed program
    % ls -R
    .:
    build*  main.cc  parser/  program*  rpn.ih
    
    ./parser:
    grammar  grammar.output  lex.cc  parse.cc  Parserbase.h  Parser.h  Parser.ih
        

Here is an example session using the calculator:


    % program
    4 9 +
            13
    3 7 + 3 4 5 *+-
            -13
    3 7 + 3 4 5 * + - n              Note the unary minus, `n'
            13
    5 6 / 4 n +
            -3.16667
    3 4 ^                            Exponentiation
            81
    ^D                               End-of-file indicator
    %
        

6.2: `calc': an Infix Notation Calculator

We now modify rpn to handle infix operators instead of postfix. Infix notation requires us to define operator precedence and to extend the grammar with nested expressions. Here is bisonc++'s grammar specification for calc, an infix calculator:
%baseclass-preinclude   cmath
%stype double

%token NUM
%left '-' '+'
%left '*' '/'
%left NEG     // negation--unary minus 
%right '^'    // exponentiation        

%%

input:    
        // empty 
| 
        input line
;

line:   
        '\n'
| 
        exp '\n'  
        { 
            std::cout << "\t" << $1 << '\n';
        }
;

exp:      
        NUM             
| 
        exp '+' exp 
        { 
            $$ = $1 + $3;
        }
| 
        exp '-' exp 
        { 
            $$ = $1 - $3;
        }
| 
        exp '*' exp 
        { 
            $$ = $1 * $3;
        }
| 
        exp '/' exp 
        { 
            $$ = $1 / $3;
        }
| 
        '-' exp %prec NEG
        { 
            $$ = -$2;
        }
|
        // Exponentiation:
        exp '^' exp 
        { 
            $$ = pow($1, $3);
        }
|
        '(' exp ')'
        { 
            $$ = $2;
        }
;

The functions lex, error and main are identical to the ones used with rpn.

This example illustrates several important new features:

Here is a sample run of calc:


    % calc
    4 + 4.5 - (34/(8*3+-3))
            6.88095
    -56 + 2
            -54
    3 ^ 2
            9
        

6.3: Basic Error Recovery

Up to this point, error-recovery hasn't yet been covered i.e., how to continue parsing after the parser detects a syntax error. By default parse returns after calling error. This means that an erroneous input line ends the calculator. Now we show how to recover from erroneous input.

The bisonc++ grammar-specification language includes the reserved symbol error, which can be used in production rules. In the example below it has been added as a new alternative for recognizing the line nonterminal:

line:   
        '\n'
| 
        exp '\n'  
        { 
            std::cout << "\t" << $1 << '\n';
        }
| 
        error '\n'
;

This addition to the grammar allows the calculator to recover from syntax errors. If a syntax error is encountered, the third production rule is activated, skipping all tokens until a newline token has been encounters. At that point line has been recognized, and parsing continues afresh at the next input line (the error member function is also called, printing its message). Different from bison's error implementation, bisonc++ proceeds on the assumption that whenever error is used in a production rule it is the grammar constructor's intention to have the parser continue parsing (therefore, a statement like `yyerrok', encountered in bison grammars is superfluous in bisonc++ grammars). The reserved symbol error itself causes the parsing function to skip all subsequent input until a token that can follow error has been encountered. In the above implementation that token is the newline character `\n' (see chapter 8).

This form of error recovery deals with syntax errors. There are other kinds of errors; for example, divisions by zero, which raises an exception signal that is normally fatal. A real calculator program must handle this signal and use whatever it takes to discard the rest of the current line of input and resume parsing thereafter. Handling such signals and other forms of semantic errors is not discussed here, as it is not specific to bisonc++ programs. But once a semantic error has been encountered, handling functions may call ERROR(), resulting in the same procedure as the one that's used for syntax errors: all subsequenct tokens are skipped until a token has been encountered that can follow the reserved symbol error.

6.4: `mfcalc': a Multi-Function Calculator

Now that the basics of using bisonc++ have been discussed, it is time to move on to a more advanced problem. The above calculators provided only offer the operators +, - * / and ^. It would be nice to have a calculator that allows us to use some other mathematical functions as well, such as sin, cos, etc..

It is easy to add new operators to the infix calculator as long as they are only single-character literals. The parser's member lex returns all non-number characters as tokens, so only some new grammar production rules need to be added to the grammar when such tokens must be recognized. But we want something more flexible: built-in functions that can be called like this:


    function_name (argument)
        
At the same time, we add memory to the calculator, allowing us to use variables. Here is a sample session with the multi-function calculator:

	pi = 3.141592653589
	        3.14159
	sin(pi)
	        7.93266e-13
	alpha = beta1 = 2.3
	        2.3
	alpha
	        2.3
	ln(alpha)
	        0.832909
	exp(ln(beta1))
	        2.3
        
Note that multiple assignment and nested function calls are supported.

6.4.1: The Declaration Section for `mfcalc'

The grammar specification file for the mfcalc calculator shows several new features. Here is the bisonc++ directive section for the mfcalc multi-function calculator (line numbers were added for referential purposes, they are not part of the declaraction section as used in the actual grammar file):

    1   %union
    2   {
    3       double u_val;
    4       double *u_symbol;
    5       double (*u_fun)(double);
    6   }
    7
    8   %token <u_val>  NUM         // Simple double precision number
    9   %token <u_symbol> VAR       // Variable
   10   %token <u_fun>  FNCT        // Function
   11   %type  <u_val>  exp
   12
   13   %right '='
   14   %left '-' '+'
   15   %left '*' '/'
   16   %left NEG                   // negation--unary minus 
   17   %right '^'                  // exponentiation        
        
The above specifications show two new features of bisonc++'s grammar specification language, allowing semantic values to have different data types. Finally note the right associative operator `=', defined in line 13: by making the assignment operator right-associative we can allow sequential assignments of the form a = b = c = expression.

6.4.2: Grammar Rules for `mfcalc'

Here are the grammar rules for the multi-function calculator. Most of them are copied directly from calc. Three rules, mentioning VAR or FNCT, are new:
input:    
        // empty 
| 
        input line
;

line:   
        '\n'
| 
        exp '\n'  
        { 
            cout << "\t" << $1 << endl;
        }
| 
        error '\n'
;

exp:      
        NUM             
| 
        VAR                
        { 
            $$ = *$1;
        }
| 
        VAR '=' exp        
        { 
            $$ = *$1 = $3;
        }
| 
        FNCT '(' exp ')'   
        { 
            $$ = (*$1)($3); 
        }
| 
        exp '+' exp 
        { 
            $$ = $1 + $3;
        }
| 
        exp '-' exp 
        { 
            $$ = $1 - $3;
        }
| 
        exp '*' exp 
        { 
            $$ = $1 * $3;
        }
| 
        exp '/' exp 
        { 
            $$ = $1 / $3;
        }
| 
        '-' exp %prec NEG
        { 
            $$ = -$2;
        }
|
        // Exponentiation:
        exp '^' exp 
        { 
            $$ = pow($1, $3);
        }
|
        '(' exp ')'
        { 
            $$ = $2;
        }
;

6.4.3: The `mfcalc' Symbol- and Function Tables

The multi-function calculator needs a symbol table for keeping track of the names and meanings of variables and functions. This doesn't affect the grammar rules or the calculator's directives, but it requires that the parser class defines some additional C++ types as well as several additional data members.

The symbol table itself varies in size and contents once mfcalc is used. It is defined as the data member d_symbols in the Parser's header file. In contrast, the function table has a fixed size and contents. Because of this the function table is defined as a static data member. Both tables are defined as std::unordered_map containers: their keys are std::string objects, their values, respecively, doubles and double (*)(double)s. Here is the declaration of d_symbols and s_functions as used in mfcalc's parser:


    std::unordered_map<std::string, double> d_symbols;

    static std::unordered_map<std::string, double (*)(double)> s_functions;
        
As s_functions is a static member, it can be initialized compile time from an array of pairs. To ease the definition of such an array a private typedef

    typedef std::pair<char const *, double (*)(double)> FunctionPair;
        
is added to the parser class, as well as a private array

    static FunctionPair s_funTab[];
        
These definitions allow us to initialize s_functions in a separate source file (data.cc):
#include "Parser.ih"

Parser::FunctionPair Parser::s_funTab[] =
{
    FunctionPair("sin",  sin),
    FunctionPair("cos",  cos),
    FunctionPair("atan", atan),
    FunctionPair("ln",   log),
    FunctionPair("exp",  exp),
    FunctionPair("sqrt", sqrt),
};

unordered_map<string, double (*)(double)> Parser::s_functions
(
    Parser::s_funTab, 
    Parser::s_funTab + sizeof(Parser::s_funTab) / sizeof(Parser::FunctionPair)
);



By simply editing the definition of s_funTab, additional functions can be added to the calculator.

6.4.4: The new `lex()' member

In mfcalc, the parser's member function lex() must now recognize variables, function names, numeric values, and the single-character arithmetic operators. Strings of alphanumeric characters not starting with a digit are recognized as either variables or functions depending on the table in which they are found. By arranging lex's logic such that the function table is searched first it is simple to ensure that no variable can ever have the name of a predefined function. The implementation used here, in which two different tables are used for the arithmetic functions and the variable symbols is appealing because it's simple to implement. However, it also has the drawback of being difficult to scale to more generic calculators, using, e.g., different data types and different types of functions. In such situations a single symbol table is more preferable, where the keys are the identifiers (variables, function names, predefined constants, etc.) while the values are objects describing their characteristics. A re-implementation of mfcalc using an integrated symbol table is suggested in one of the exercises of the next section 6.5.

The parser's lex member has these characteristics:

Here is the parser's lex member function:
#include "Parser.ih"

/*
    Lexical scanner returns a double floating point 
    number on the stack and the token NUM, or the ASCII
    character read if not a number.  Skips all blanks
    and tabs, returns 0 for EOF.
*/

int Parser::lex()
{
    char c;
                                    // get the next non-ws character
    while (cin.get(c) && (c == ' ' || c == '\t'))
        ;

    if (!cin)                   // no characters were obtained
        return 0;                   // indicate End Of Input  
    
    if (c == '.' || isdigit (c))    // if a digit char was found
    {
        cin.putback(c);        // return the character
        cin >> d_val_.u_val;    // extract a number
        return NUM;                 // return the NUM token
    }
    
    if (!isalpha(c))                // c doesn't start an identifier: 
        return c;                   // return a single character token.

    // in all other cases, an ident is entered. Recognize a var or function

    string word;           // store the name in a string object

    while (true)                // process all alphanumerics:
    {
        word += c;              // add 'm to `word'
        if (!cin.get(c))   // no more chars? then done here
            break;

        if (!isalnum(c))        // not an alphanumeric: put it back and done.
        {
            cin.putback(c);
            break;
        }
    }
                                // Now lookup the name as a function's name
    unordered_map<string, double (*)(double)>::iterator function = 
                                                s_functions.find(word);

                                // Got it, so return FPTR 
    if (function != s_functions.end())  
    {
        d_val_.u_fun = function->second;
        return FNCT;
    }
                                // no function, so return a VAR. Set
                                // u_symbol to the symbol's address in the
                                // d_symbol map. The map will add the
                                // symbol if not yet there.
    d_val_.u_symbol = &d_symbols[word];
    return VAR;            
}

6.4.5: Constructing `mfcalc'

In order to construct mfcalc, the following steps are suggested:

6.5: Exercises

Here are some suggestions for you to consider to improve mfcalc's implementation and operating mode: