
4. Parser C-Language Interface

The Bison parser is actually a C function named yyparse. Here we describe the interface conventions of yyparse and the other functions that it needs to use.

Keep in mind that the parser uses many C identifiers starting with `yy' and `YY' for internal purposes. If you use such an identifier (aside from those in this manual) in an action or in additional C code in the grammar file, you are likely to run into trouble.

4.1 The Parser Function yyparse  How to call yyparse and what it returns.
4.2 The Lexical Analyzer Function yylex  You must supply a function yylex which reads tokens.
4.3 The Error Reporting Function yyerror  You must supply a function yyerror.
4.4 Special Features for Use in Actions  Special features for use in actions.



4.1 The Parser Function yyparse

You call the function yyparse to cause parsing to occur. This function reads tokens, executes actions, and ultimately returns when it encounters end-of-input or an unrecoverable syntax error. You can also write an action which directs yyparse to return immediately without reading further.

The value returned by yyparse is 0 if parsing was successful (return is due to end-of-input).

The value is 1 if parsing failed (return is due to a syntax error).

In an action, you can cause immediate return from yyparse by using these macros:

YYACCEPT
Return immediately with value 0 (to report success).

YYABORT
Return immediately with value 1 (to report failure).
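
As an illustration of the return convention described above, here is a minimal sketch of how a caller might turn the value of yyparse into a process exit status. The helper name exit_status_from_parse and the wording of the message are assumptions made for this example; they are not part of Bison.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not provided by Bison: map the value
   returned by yyparse (0 for success, 1 for failure) onto the
   exit codes that the C library defines.  */
int
exit_status_from_parse (int parse_result)
{
  if (parse_result == 0)
    return EXIT_SUCCESS;

  /* Any nonzero value is treated as a failed parse.  */
  fprintf (stderr, "input rejected by parser\n");
  return EXIT_FAILURE;
}
```

In a real program, main would typically end with a statement such as `return exit_status_from_parse (yyparse ());'.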



4.2 The Lexical Analyzer Function yylex

The lexical analyzer function, yylex, recognizes tokens from the input stream and returns them to the parser. Bison does not create this function automatically; you must write it so that yyparse can call it. The function is sometimes referred to as a lexical scanner.

In simple programs, yylex is often defined at the end of the Bison grammar file. If yylex is defined in a separate source file, you need to arrange for the token-type macro definitions to be available there. To do this, use the `-d' option when you run Bison, so that it will write these macro definitions into a separate header file `name.tab.h' which you can include in the other source files that need it. See section Invoking Bison.

4.2.1 Calling Convention for yylex  How yyparse calls yylex.
4.2.2 Semantic Values of Tokens  How yylex must return the semantic value of the token it has read.
4.2.3 Textual Positions of Tokens  How yylex must return the text position (line number, etc.) of the token, if the actions want that.
4.2.4 Calling Conventions for Pure Parsers  How the calling convention differs in a pure parser (see section A Pure (Reentrant) Parser).



4.2.1 Calling Convention for yylex

The value that yylex returns must be the numeric code for the type of token it has just found, or 0 for end-of-input.

When a token is referred to in the grammar rules by a name, that name in the parser file becomes a C macro whose definition is the proper numeric code for that token type. So yylex can use the name to indicate that type. See section 3.2 Symbols, Terminal and Nonterminal.

When a token is referred to in the grammar rules by a character literal, the numeric code for that character is also the code for the token type. So yylex can simply return that character code. The null character must not be used this way, because its code is zero and that is what signifies end-of-input.

Here is an example showing these things:

 
int
yylex (void)
{
  ...
  if (c == EOF)     /* Detect end of file. */
    return 0;
  ...
  if (c == '+' || c == '-')
    return c;      /* Assume token type for `+' is '+'. */
  ...
  return INT;      /* Return the type of the token. */
  ...
}

This interface has been designed so that the output from the lex utility can be used without change as the definition of yylex.
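
For readers who want something they can compile on its own, the following is a minimal sketch of a hand-written yylex that follows the convention described in this section. It scans a string rather than a file. The value 258 given to INT is only a stand-in for the macro that Bison would define, and lex_input and lex_set_input are hypothetical names introduced for this example.

```c
#include <ctype.h>

/* Stand-in for the macro Bison defines for a token named INT.
   The value 258 is an assumption made for this sketch; a real
   lexer would take the definition from the generated file.  */
#define INT 258

/* Hypothetical input source: the string being scanned.  */
static const char *lex_input = "";

/* Hypothetical helper: point the scanner at a new string.  */
void
lex_set_input (const char *text)
{
  lex_input = text;
}

int
yylex (void)
{
  /* Skip white space between tokens.  */
  while (*lex_input == ' ' || *lex_input == '\t' || *lex_input == '\n')
    lex_input++;

  /* End of input is reported as 0.  */
  if (*lex_input == '\0')
    return 0;

  /* A run of digits forms one INT token.  */
  if (isdigit ((unsigned char) *lex_input))
    {
      while (isdigit ((unsigned char) *lex_input))
        lex_input++;
      return INT;
    }

  /* Any other character is its own token type.  */
  return (unsigned char) *lex_input++;
}
```

With the input "12 + 3", successive calls return INT, '+', INT and then 0.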

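If the grammar uses literal string tokens, there are two ways that yylex can determine the token type codes for them:

If the grammar defines symbolic token names as aliases for the literal string tokens, yylex can use these symbolic names like all others. In this case, the use of the literal string tokens in the grammar file has no effect on yylex.

Otherwise, yylex can find the multicharacter token in the yytname table. The index of the token in the table is the token type's code. The name of a multicharacter token is recorded in yytname with a double-quote, the token's characters, and another double-quote. The yytname table is generated only if you use the %token_table declaration.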



4.2.2 Semantic Values of Tokens

In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable yylval. When you are using just one data type for semantic values, yylval has that type. Thus, if the type is int (the default), you might write this in yylex:

 
  ...
  yylval = value;  /* Put value onto Bison stack. */
  return INT;      /* Return the type of the token. */
  ...

When you are using multiple data types, yylval's type is a union made from the %union declaration (see section The Collection of Value Types). So when you store a token's value, you must use the proper member of the union. If the %union declaration looks like this:

 
%union {
  int intval;
  double val;
  symrec *tptr;
}

then the code in yylex might look like this:

 
  ...
  yylval.intval = value; /* Put value onto Bison stack. */
  return INT;          /* Return the type of the token. */
  ...
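
The fragments above can be assembled into a complete function. The sketch below is one possible way to do it, under several assumptions: the union is a cut-down stand-in for the type Bison generates from %union (the tptr member is left out because symrec is not defined here), yylval is defined by hand instead of by the parser, the token codes 258 and 259 and the token name NUM are invented for the example, and lex_input and lex_set_input are hypothetical helpers.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for what Bison would generate (assumptions).  */
#define INT 258
#define NUM 259

typedef union
{
  int intval;
  double val;
} YYSTYPE;

YYSTYPE yylval;   /* Normally defined by the parser file.  */

/* Hypothetical input source and helper.  */
static const char *lex_input = "";

void
lex_set_input (const char *text)
{
  lex_input = text;
}

int
yylex (void)
{
  while (isspace ((unsigned char) *lex_input))
    lex_input++;

  if (*lex_input == '\0')
    return 0;

  if (isdigit ((unsigned char) *lex_input))
    {
      const char *p = lex_input;
      char *end;

      /* Look past the digits to see whether a fraction follows.  */
      while (isdigit ((unsigned char) *p))
        p++;

      if (*p == '.')
        {
          /* Floating-point number: use the double member.  */
          yylval.val = strtod (lex_input, &end);
          lex_input = end;
          return NUM;
        }

      /* Integer: use the int member.  */
      yylval.intval = (int) strtol (lex_input, &end, 10);
      lex_input = end;
      return INT;
    }

  return (unsigned char) *lex_input++;
}
```

The point to notice is that the member written by yylex must match the type that the grammar declares for the token being returned.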



4.2.3 Textual Positions of Tokens

If you are using the `@n'-feature (see section Special Features for Use in Actions) in actions to keep track of the textual locations of tokens and groupings, then you must provide this information in yylex. The function yyparse expects to find the textual location of a token just parsed in the global variable yylloc. So yylex must store the proper data in that variable. The value of yylloc is a structure and you need only initialize the members that are going to be used by the actions. The four members are called first_line, first_column, last_line and last_column. Note that the use of this feature makes the parser noticeably slower.

The data type of yylloc has the name YYLTYPE.
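
Here is a minimal sketch of a yylex that fills in all four members of yylloc. It makes several assumptions that are not prescribed by Bison: lines and columns are counted from 1, last_column names the last character of the token, tokens never contain a newline, and the names WORD, lex_input, lex_set_input and next_char are invented for the example. The YYLTYPE definition is a stand-in containing only the four members named above.

```c
#include <ctype.h>

/* Stand-ins for what Bison would generate (assumptions).  */
#define WORD 258

typedef struct
{
  int first_line, first_column;
  int last_line, last_column;
} YYLTYPE;

YYLTYPE yylloc;   /* Normally defined by the parser file.  */

/* Hypothetical scanner state: input cursor and current position.  */
static const char *lex_input = "";
static int lex_line = 1;
static int lex_column = 1;

void
lex_set_input (const char *text)
{
  lex_input = text;
  lex_line = 1;
  lex_column = 1;
}

/* Consume one character and keep the position up to date.  */
static int
next_char (void)
{
  int c = (unsigned char) *lex_input++;
  if (c == '\n')
    {
      lex_line++;
      lex_column = 1;
    }
  else
    lex_column++;
  return c;
}

int
yylex (void)
{
  int c;

  while (*lex_input == ' ' || *lex_input == '\t' || *lex_input == '\n')
    next_char ();

  if (*lex_input == '\0')
    return 0;

  /* The token starts at the current position.  */
  yylloc.first_line = lex_line;
  yylloc.first_column = lex_column;

  c = next_char ();
  if (isalpha (c))
    {
      while (isalpha ((unsigned char) *lex_input))
        next_char ();
      c = WORD;
    }

  /* Tokens never span lines here, so the last character of the
     token is one column behind the current position.  */
  yylloc.last_line = lex_line;
  yylloc.last_column = lex_column - 1;
  return c;
}
```

If the actions use only some of the members, the corresponding assignments for the others can be dropped.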



4.2.4 Calling Conventions for Pure Parsers

When you use the Bison declaration %pure_parser to request a pure, reentrant parser, the global communication variables yylval and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by pointers passed as arguments to yylex. You must declare them as shown here, and pass the information back by storing it through those pointers.

 
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  ...
  *lvalp = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  ...
}

If the grammar file does not use the `@' constructs to refer to textual positions, then the type YYLTYPE will not be defined. In this case, omit the second argument; yylex will be called with only one argument.
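
The following is a minimal, compilable sketch of the two-argument form. The definitions of INT, YYSTYPE and YYLTYPE are stand-ins for what Bison would generate, and lex_input, lex_column and lex_set_input are hypothetical. The sketch assumes single-line input and columns counted from 1. Note that the input cursor is still kept in static variables, so this yylex communicates with the parser in a reentrant way without being reentrant itself.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for what Bison would generate (assumptions).  */
#define INT 258
typedef int YYSTYPE;
typedef struct
{
  int first_line, first_column;
  int last_line, last_column;
} YYLTYPE;

/* Hypothetical scanner state; static, hence not reentrant.  */
static const char *lex_input = "";
static int lex_column = 1;

void
lex_set_input (const char *text)
{
  lex_input = text;
  lex_column = 1;
}

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  const char *start;

  while (*lex_input == ' ')
    {
      lex_input++;
      lex_column++;
    }

  if (*lex_input == '\0')
    return 0;

  start = lex_input;
  llocp->first_line = llocp->last_line = 1;  /* Single-line input.  */
  llocp->first_column = lex_column;

  if (isdigit ((unsigned char) *lex_input))
    {
      char *end;
      /* Store the semantic value through the pointer.  */
      *lvalp = (int) strtol (lex_input, &end, 10);
      lex_input = end;
      lex_column += (int) (lex_input - start);
      llocp->last_column = lex_column - 1;
      return INT;
    }

  lex_input++;
  lex_column++;
  llocp->last_column = lex_column - 1;
  return (unsigned char) *start;
}
```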

If you use a reentrant parser, you can optionally pass additional parameter information to it in a reentrant way. To do so, define the macro YYPARSE_PARAM as a variable name. This modifies the yyparse function to accept one argument, of type void *, with that name.

When you call yyparse, pass the address of an object, casting the address to void *. The grammar actions can refer to the contents of the object by casting the pointer value back to its proper type and then dereferencing it. Here's an example. Write this in the parser:

 
%{
struct parser_control
{
  int nastiness;
  int randomness;
};

#define YYPARSE_PARAM parm
%}

Then call the parser like this:

 
struct parser_control
{
  int nastiness;
  int randomness;
};

...

{
  struct parser_control foo;
  ...  /* Store proper data in foo.  */
  value = yyparse ((void *) &foo);
  ...
}

In the grammar actions, use expressions like this to refer to the data:

 
((struct parser_control *) parm)->randomness

If you wish to pass the additional parameter data to yylex, define the macro YYLEX_PARAM just like YYPARSE_PARAM, as shown here:

 
%{
struct parser_control
{
  int nastiness;
  int randomness;
};

#define YYPARSE_PARAM parm
#define YYLEX_PARAM parm
%}

You should then define yylex to accept one additional argument--the value of parm. (This makes either two or three arguments in total, depending on whether an argument of type YYLTYPE is passed.) You can declare the argument as a pointer to the proper object type, or you can declare it as void * and access the contents as shown above.
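
As a sketch of what such a yylex might look like, the example below keeps the input cursor in the object passed through parm, so that two scans can proceed independently. It assumes a grammar that does not use textual positions, so that no YYLTYPE argument is passed. The structure lexer_state is hypothetical and stands in for whatever object you pass to yyparse; INT and YYSTYPE are stand-ins for the definitions Bison would generate.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for what Bison would generate (assumptions).  */
#define INT 258
typedef int YYSTYPE;

/* Hypothetical object whose address is passed as parm.  */
struct lexer_state
{
  const char *cursor;   /* Next character to be scanned.  */
};

int
yylex (YYSTYPE *lvalp, void *parm)
{
  /* Cast the pointer back to its proper type.  */
  struct lexer_state *state = (struct lexer_state *) parm;

  while (isspace ((unsigned char) *state->cursor))
    state->cursor++;

  if (*state->cursor == '\0')
    return 0;

  if (isdigit ((unsigned char) *state->cursor))
    {
      char *end;
      *lvalp = (int) strtol (state->cursor, &end, 10);
      state->cursor = end;
      return INT;
    }

  return (unsigned char) *state->cursor++;
}
```

Because all scanner state lives in the object, calls made on behalf of one lexer_state do not disturb a scan in progress on another.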

You can use `%pure_parser' to request a reentrant parser without also using YYPARSE_PARAM. Then you should call yyparse with no arguments, as usual.



4.3 The Error Reporting Function yyerror

The Bison parser detects a parse error or syntax error whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the macro YYERROR (see section Special Features for Use in Actions).

The Bison parser expects to report the error by calling an error reporting function named yyerror, which you must supply. It is called by yyparse whenever a syntax error is found, and it receives one argument: a string describing the error. For a parse error, the string is normally "parse error".

If you define the macro YYERROR_VERBOSE in the Bison declarations section (see section The Bison Declarations Section), then Bison provides a more verbose and specific error message string instead of just plain "parse error". It doesn't matter what definition you use for YYERROR_VERBOSE, just whether you define it.

The parser can detect one other kind of error: stack overflow. This happens when the input contains constructions that are very deeply nested. It isn't likely you will encounter this, since the Bison parser extends its stack automatically up to a very large limit. But if overflow happens, yyparse calls yyerror in the usual fashion, except that the argument string is "parser stack overflow".

The following definition suffices in simple programs:

 
#include <stdio.h>

int
yyerror (char *s)
{
  fprintf (stderr, "%s\n", s);
  return 0;   /* The value is not used by yyparse. */
}

After yyerror returns to yyparse, the latter will attempt error recovery if you have written suitable error recovery grammar rules (see section 6. Error Recovery). If recovery is impossible, yyparse will immediately return 1.

The variable yynerrs contains the number of syntax errors encountered so far. Normally this variable is global; but if you request a pure parser (see section A Pure (Reentrant) Parser) then it is a local variable which only the actions can access.
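
A program that wants to examine the error after yyparse has returned can have yyerror record what it was told. The following is a minimal sketch of that idea; the variables last_error_message and saw_stack_overflow are hypothetical, and the comparison relies on the message string "parser stack overflow" quoted earlier in this section.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical state, not part of Bison: a copy of the most
   recent message and a flag for the stack overflow case.  */
static char last_error_message[128] = "";
static int saw_stack_overflow = 0;

int
yyerror (const char *s)
{
  /* Keep a possibly truncated copy for the caller of yyparse.  */
  strncpy (last_error_message, s, sizeof last_error_message - 1);
  last_error_message[sizeof last_error_message - 1] = '\0';

  /* Remember whether the parser ran out of stack.  */
  if (strcmp (s, "parser stack overflow") == 0)
    saw_stack_overflow = 1;

  fprintf (stderr, "%s\n", s);
  return 0;
}
```

Keeping this state in static variables is adequate for an ordinary parser; a pure parser would more naturally keep it in an object of its own.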



4.4 Special Features for Use in Actions

Here is a table of Bison constructs, variables and macros that are useful in actions.

`$$'
Acts like a variable that contains the semantic value for the grouping made by the current rule. See section 3.5.3 Actions.

`$n'
Acts like a variable that contains the semantic value for the nth component of the current rule. See section 3.5.3 Actions.

`$<typealt>$'
Like $$ but specifies alternative typealt in the union specified by the %union declaration. See section Data Types of Values in Actions.

`$<typealt>n'
Like $n but specifies alternative typealt in the union specified by the %union declaration. See section Data Types of Values in Actions.

`YYABORT;'
Return immediately from yyparse, indicating failure. See section The Parser Function yyparse.

`YYACCEPT;'
Return immediately from yyparse, indicating success. See section The Parser Function yyparse.

`YYBACKUP (token, value);'
Unshift a token. This macro is allowed only for rules that reduce a single value, and only when there is no look-ahead token. It installs a look-ahead token with token type token and semantic value value; then it discards the value that was going to be reduced by this rule.

If the macro is used when it is not valid, such as when there is a look-ahead token already, then it reports a syntax error with a message `cannot back up' and performs ordinary error recovery.

In either case, the rest of the action is not executed.

`YYEMPTY'
Value stored in yychar when there is no look-ahead token.

`YYERROR;'
Cause an immediate syntax error. This statement initiates error recovery just as if the parser itself had detected an error; however, it does not call yyerror, and does not print any message. If you want to print an error message, call yyerror explicitly before the `YYERROR;' statement. See section 6. Error Recovery.

`YYRECOVERING'
This macro stands for an expression that has the value 1 when the parser is recovering from a syntax error, and 0 the rest of the time. See section 6. Error Recovery.

`yychar'
Variable containing the current look-ahead token. (In a pure parser, this is actually a local variable within yyparse.) When there is no look-ahead token, the value YYEMPTY is stored in the variable. See section Look-Ahead Tokens.

`yyclearin;'
Discard the current look-ahead token. This is useful primarily in error rules. See section 6. Error Recovery.

`yyerrok;'
Resume generating error messages immediately for subsequent syntax errors. This is useful primarily in error rules. See section 6. Error Recovery.

`@n'
Acts like a structure variable containing information on the line numbers and column numbers of the nth component of the current rule. The structure has four members, like this:

 
struct {
  int first_line, last_line;
  int first_column, last_column;
};

Thus, to get the starting line number of the third component, you would use `@3.first_line'.

In order for the members of this structure to contain valid information, you must make yylex supply this information about each token. If you need only certain members, then yylex need only fill in those members.

The use of this feature makes the parser noticeably slower.
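
Location structures of this kind are mostly useful for diagnostics. The sketch below shows one way an action might turn a location into text. The function format_location and the "line.column" notation it produces are assumptions made for this example, and the YYLTYPE definition is a stand-in that mirrors the four members shown above.

```c
#include <stdio.h>

/* Stand-in for the type Bison would define (assumption).  */
typedef struct
{
  int first_line, last_line;
  int first_column, last_column;
} YYLTYPE;

/* Hypothetical helper: write a description of LOC into BUF,
   which has room for SIZE characters.  Returns what snprintf
   returns.  */
int
format_location (YYLTYPE loc, char *buf, size_t size)
{
  if (loc.first_line == loc.last_line)
    /* Single line: LINE.FIRSTCOL-LASTCOL.  */
    return snprintf (buf, size, "%d.%d-%d",
                     loc.first_line, loc.first_column, loc.last_column);

  /* Several lines: LINE.COL-LINE.COL.  */
  return snprintf (buf, size, "%d.%d-%d.%d",
                   loc.first_line, loc.first_column,
                   loc.last_line, loc.last_column);
}
```

An action could then pass a location such as `@3' to this function, assuming that yylex has filled in all four members.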



This document was generated by Frank B. Brokken on January, 28 2005 using texi2html