The C++ class generated by bisonc++ does not require virtual members. (When polymorphic semantic values are used (cf. section 4.6.1) a polymorphic base class is used, but that class has no further implications for the parser class generated by bisonc++: in this case polymorphism is merely used internally, inaccessible to bisonc++'s user, to define a common interface for the various polymorphic data types.)
The generated parser class's essential member (the member parse) is generated from the grammar specification, so the software engineer will hardly ever feel the need to modify that function. All but a few of the remaining predefined members have very clear definitions and meanings as well, making it unlikely that they should ever require overriding.
It is likely that members like lex and/or error need dedicated definitions with different parsers generated by bisonc++. But then again: defining the associated support members is a natural extension of defining the grammar and can be realized in parallel with it, in practice not requiring any virtual members. By not requiring virtual members the parser's class organization is simplified, and calling non-virtual members will be just a trifle faster than calling these member functions as virtual functions.
In this chapter all available members and features of the generated parser class are discussed. Having read this chapter you should be able to use the generated parser class in your program (using its public members) and to use its facilities in the actions defined for the various production rules and/or use these facilities in additional class members that you might have defined yourself.
In the remainder of this chapter the class's public members are discussed first, followed by the class's private members. While constructing the grammar all private members are available in the action parts of the grammar's production rules. Furthermore, any member (and so not just the action blocks) may generate errors (thus initiating error recovery procedures) and may flag the (un)successful parsing of the information given to the parser (terminating the parsing function parse).
Symbols defined in the generated parser and parser base class usually end in an underscore character. Such symbols should not be masked or redefined. Some members have names not ending in an underscore character. Those names have either historically been used (like parse and ERROR) or they can be redefined by the user (like int lex() and void print()). Their specific requirements are documented below.
Parser::
) prefixes were omitted):
enum are used to configure the type of debug information that will be displayed (assuming that the debug option/directive was specified when bisonc++ generated the parser's code). It has three values:
OFF: no debug information is displayed when the generated parser's parse function is called;
ON: extensive debug information about the parsing process is displayed when the generated parser's parse function is called;
ACTIONCASES: just before executing the grammar's action blocks the action block number is written to the standard output stream. These action block numbers refer to case labels of the switch that is defined in the parser's executeAction function. It is commonly used to find the action block where a fatal semantic value mismatch was observed.
The bit_or operator can be used to combine ON and ACTIONCASES (see the member function setDebug(DebugMode_ mode) below).
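How such a combination can work is sketched below with a stand-alone enumeration. The enumeration's name, the numeric values, and the helper active are assumptions made for illustration; they are not taken from the code generated by bisonc++.

```cpp
// Hypothetical stand-in for the parser's debug-mode enumeration.
// The bit values are assumed; generated code may use other values.
enum DebugMode
{
    OFF         = 0,
    ON          = 1 << 0,
    ACTIONCASES = 1 << 1
};

// Bitwise-or of two modes, yielding a mode in which both are active.
constexpr DebugMode operator|(DebugMode lhs, DebugMode rhs)
{
    return static_cast<DebugMode>(
                static_cast<int>(lhs) | static_cast<int>(rhs));
}

// True if `flag' is active in `mode'.
constexpr bool active(DebugMode mode, DebugMode flag)
{
    return (static_cast<int>(mode) & static_cast<int>(flag)) != 0;
}
```

With these definitions active(ON | ACTIONCASES, ON) and active(ON | ACTIONCASES, ACTIONCASES) both hold, while active(OFF, ON) does not.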
%lsp-needed, %ltype or %locationstruct has been declared.
return Parser::IDENT should be used rather than return IDENT.
debug directive option was specified when bisonc++ generated the parse function. If so, it is not active by default; to activate the debug output call setDebug(true), to suppress the debug output call setDebug(false).
setDebug(bool) it is always defined but only operational if the debug directive option was specified when bisonc++ generated the parse function. If so, it is not active by default; to activate, call setDebug(Parser::ON), setDebug(Parser::ACTIONCASES), or setDebug(Parser::ON | Parser::ACTIONCASES). To suppress the debug code output call setDebug(Parser::OFF) or simply setDebug(false).
When the %polymorphic directive is used:
Meta_ namespace. The Meta_ namespace itself is nested under the namespace that may have been declared by the %namespace directive.
enum class Tag_ contains all the tag-identifiers specified by the %polymorphic directive. It is declared outside of the Parser's class, but within the namespace that may have been declared by the %namespace directive.
UNEXPECTED_TOKEN_: When the parsing process throws UNEXPECTED_TOKEN_ the recovery procedure is started (i.e., it is started whenever a syntactic error is encountered or ERROR() is called).
The recovery procedure consists of (1) looking for the first state on the state-stack having an error-production, followed by (2) handling all state transitions that are possible without retrieving a terminal token. Then, in the state requiring a terminal token and starting with the initial unexpected token (3) all subsequent terminal tokens are ignored until a token is retrieved which is a continuation token in that state.
If the error recovery procedure fails (i.e., if no acceptable token is ever encountered) error recovery falls back to the default recovery mode: the parsing process terminates.
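Step (3) of the recovery procedure can be sketched as a simple scan over the remaining input. This is a simplified model, not the generated code: the function name, the token vector, and the set of continuation tokens are assumptions made for illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Simplified model of step (3): starting at the unexpected token, skip
// tokens until one is found with which the recovery state can continue.
// Returns the index of that token, or tokens.size() if no such token
// exists (the case in which the parsing process would terminate).
std::size_t skipToContinuation(std::vector<int> const &tokens,
                               std::size_t unexpectedIdx,
                               std::set<int> const &continuationTokens)
{
    std::size_t idx = unexpectedIdx;

    while (idx < tokens.size() && continuationTokens.count(tokens[idx]) == 0)
        ++idx;

    return idx;
}
```

For the token sequence a b ; c with ; as the only continuation token, starting at index 0 yields index 2; starting at index 3 yields the sequence's size, modeling the fallback to the default recovery mode.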
PARSE_ACCEPT = 0, PARSE_ABORT = 1 (which are also used as the parse function's return values).
When the %polymorphic directive is used:
sizeofTag_ defines the number of tags that were defined for polymorphic semantic values.
Base::
they are actually protected members inherited from the parser's base class. These members are shown below. Following the description of those members several more are listed: those members are used during the parsing process, and should not be modified or masked by user-defined code.
YYABORT defined by bison++ in that YYABORT could not be called from outside of the parsing member function.
YYACCEPT defined by bison++ in that YYACCEPT could not be called from outside of the parsing member function.
YYERROR defined by bison++ in that YYERROR could not be called from outside of the parsing member function.
parser.ih internal header file, it writes a simple message to the standard error stream. It is called when a syntactic error is encountered, and its default implementation may safely be altered.
parser.ih internal header file. It consists of a mere throw statement, rethrowing a caught exception.
The parse member function's body essentially consists of a while statement, in which the next token is obtained via the parser's lex member. This token is then processed according to the current state of the parsing process. This may result in executing actions over which the parsing process has no control and which may result in exceptions being thrown.
Such exceptions do not necessarily have to terminate the parsing process: they could be thrown by code, linked to the parser, that simply checks for semantic errors (like divisions by zero) throwing exceptions if such errors are observed.
The member exceptionHandler receives and may handle such exceptions without necessarily ending the parsing process. It receives any std::exception thrown by the parser's actions, as though the action block itself was surrounded by a try ... catch statement.
It is of course still possible to use an explicit try ... catch statement within action blocks. However, exceptionHandler can be used to factor out code that is common to various action blocks.
The next example shows an explicit implementation of exceptionHandler: any std::exception thrown by the parser's action blocks is caught, showing the exception's message, and increasing the parser's error count. After this parsing continues as if no exception had been thrown:
    void Parser::exceptionHandler(std::exception const &exc)
    {
        std::cout << exc.what() << '\n';
        ++d_nErrors_;
    }
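The dispatch itself ("as though the action block itself was surrounded by a try ... catch statement") can be modeled in isolation. The class MiniParser below and its member run are hypothetical; they merely illustrate the control flow and are not the code bisonc++ generates.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical miniature of the dispatch described above.
class MiniParser
{
    int d_nErrors = 0;

    // Counts the error; processing continues afterwards.
    void exceptionHandler(std::exception const &)
    {
        ++d_nErrors;
    }

    public:
        // Runs every action. A std::exception leaving an action is passed
        // to exceptionHandler, after which the next action is executed.
        // Returns the number of handled exceptions.
        int run(std::vector<std::function<void()>> const &actions)
        {
            for (auto const &action: actions)
            {
                try
                {
                    action();
                }
                catch (std::exception const &exc)
                {
                    exceptionHandler(exc);
                }
            }
            return d_nErrors;
        }
};
```

When the second of three actions throws a std::runtime_error, run still executes the third action and returns an error count of 1.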
parser.ih internal header file, it can be pre-implemented by bisonc++ using the scanner option or directive (see above); alternatively it must be implemented by the programmer. It interfaces to the lexical scanner, and should return the next token produced by the lexical scanner, either as a plain character or as one of the symbolic tokens defined in the Parser::Tokens_ enumeration. Zero or negative token values are interpreted as `end of input'.
parser.ih internal header file, this member calls print_ to display the last received token and corresponding matched text. The print_ member is only implemented if the --print-tokens option or %print-tokens directive was used when the parsing function was generated. Calling print_ from print is unconditional, but can easily be controlled by the program using the parser, e.g., by defining a command-line option.
true while recovering from a syntax error.
The following members are required during the parsing process. They should not be modified or masked by user-defined code:
%lsp-needed, %ltype or %locationstruct was specified).
int lex() private member function is called by the parse() member to obtain the next lexical token. By default it is not implemented, but the %scanner directive (see section 4.5.21) may be used to pre-implement a standard interface to a lexical analyzer.
The lex() member function interfaces to the lexical scanner, and it is expected to return the next token produced by the lexical scanner. This token may either be a plain character or it may be one of the symbolic tokens defined in the Parser::Tokens_ enumeration. Any zero or negative token value is interpreted as `end of input', causing parse() to return.
The lex() member function may be implemented in various ways:
If the --scanner option or %scanner directive is provided, bisonc++ assumes that it should interface to the scanner generated by flexc++(1). In this case, the scanner token function is called as d_scanner.lex() and the scanner's matched text function is called as d_scanner.matched()
lex() may itself implement a lexical analyzer (a scanner). This may actually be a useful option when the input offered to the program using bisonc++'s parser class is not overly complex. This approach was used when implementing the earlier examples (see sections 6.1.3 and 6.4.4).
lex() may call an external function or a member function of a class implementing a lexical scanner, and return the information offered by this external function. When using a class, an object of that class could also be defined as an additional data member of the parser (see the next alternative). This approach can be followed when generating a lexical scanner from a lexical scanner generating tool like lex(1) or flex++(1). The latter program allows its users to generate a scanner class.
--flex option or %flex directive can be used in combination with the --scanner option or %scanner directive. In this case the scanner token function is called as d_scanner.yylex() and the scanner's matched text function is called as d_scanner.YYText()
parse() function or since the last detected syntactic error. It is initialized to d_requiredTokens_ to allow an early error to be detected as well.
executeAction member are displayed to the standard output stream.
    d_scanner.setSLoc(&d_loc_);
Subsequently, the lexical scanner may assign a value to the parser's d_loc_ variable through the pointer to d_loc_ stored inside the lexical scanner.
parse(). It is initialized by the parser's base class initializer, and is updated while parse() executes. When parse() has returned it contains the total number of errors counted by parse(). Errors are not counted if suppressed (i.e., if d_acceptedTokens_ is less than d_requiredTokens_).
parse() function must have processed before a syntactic error can be generated.
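The interplay between the accepted-token count, the required-token count and the error count can be modeled as shown below. The class ErrorCounter is hypothetical; its data members mirror the names used in the text, but resetting the accepted-token count at each error is an assumption derived from the phrase `since the last detected syntactic error'.

```cpp
#include <cstddef>

// Hypothetical model of the bookkeeping described above.
class ErrorCounter
{
    std::size_t d_requiredTokens;
    std::size_t d_acceptedTokens;
    std::size_t d_nErrors = 0;

    public:
        // Starting at d_requiredTokens allows an early error to be counted.
        explicit ErrorCounter(std::size_t requiredTokens)
        :
            d_requiredTokens(requiredTokens),
            d_acceptedTokens(requiredTokens)
        {}

        void tokenAccepted()
        {
            ++d_acceptedTokens;
        }

        // An error is only counted if enough tokens were accepted since
        // the previous error; in either case counting tokens restarts.
        void syntaxError()
        {
            if (d_acceptedTokens >= d_requiredTokens)
                ++d_nErrors;

            d_acceptedTokens = 0;
        }

        std::size_t nErrors() const
        {
            return d_nErrors;
        }
};
```

With three required tokens an error at the very first token is counted, an error following one accepted token is suppressed, and an error following three accepted tokens is counted again.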
    d_scanner.setSval(&d_val_);
Subsequently, the lexical scanner may assign a value to the parser's d_val_ variable through the pointer to d_val_ stored inside the lexical scanner.
Note that in some cases this approach must be used to make available the correct semantic value to the parser. In particular, when a grammar state defines multiple reductions, depending on the next token, the reduction's action only takes place following the retrieval of the next token, thus losing the initially matched token text. As an example, consider the following little grammar:
    expr:
        name
    |
        ident '(' ')'
    |
        NR
    ;
    name:
        IDENT
    ;
    ident:
        IDENT
    ;
Having recognized IDENT two reductions are possible: to name and to ident. The reduction to ident is appropriate when the next token is (, otherwise the reduction to name is performed. So, the parser asks for the next token, thereby destroying the text matching IDENT before ident or name's actions are able to save the text themselves. To ensure the availability of the text matching IDENT in situations like these the scanner must assign the proper semantic value when it recognizes a token. Consequently the parser's d_val_ data member must be made available to the scanner.
If STYPE_ is a wrapper type for polymorphic semantic values, then direct assignment of values to d_val_ is only possible from values of the defined polymorphic data types. More complex assignments are also possible, using tagged assignments.
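The pointer hand-off between parser and scanner can be sketched with two small classes. Both classes are hypothetical and the semantic value type is assumed to be std::string; only the member name setSval is taken from the text.

```cpp
#include <string>

// Assumed semantic value type for this sketch.
using STYPE = std::string;

// Hypothetical scanner storing a pointer to the parser's semantic value.
class ValueScanner
{
    STYPE *d_sval = nullptr;

    public:
        void setSval(STYPE *sval)
        {
            d_sval = sval;
        }

        // Called when a token's text has been matched: the text is stored
        // at once, before it can be replaced by the next token's text.
        void tokenMatched(std::string const &text)
        {
            if (d_sval != nullptr)
                *d_sval = text;
        }
};

// Hypothetical parser making its semantic value available to its scanner.
class ValueParser
{
    STYPE d_val;
    ValueScanner d_scanner;

    public:
        ValueParser()
        {
            d_scanner.setSval(&d_val);
        }

        ValueScanner &scanner()
        {
            return d_scanner;
        }

        STYPE const &value() const
        {
            return d_val;
        }
};
```

After the scanner has matched the text count, the parser's value member returns that text without the parser having to ask the scanner for it.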
parse function the following types and variables are defined in the anonymous namespace. These are mentioned here for the sake of completeness, and are not normally accessible to other parts of the parser.
    UNDETERMINED_   = -2,
    EOF_            = -1,
    errTok_         = 256,
These tokens are used by the parser to determine whether another token should be requested from the lexical scanner, and to handle error-conditions.
    NORMAL,
    ERR_ITEM,
    REQ_TOKEN,
    ERR_REQ,        // ERR_ITEM | REQ_TOKEN
    DEF_RED,        // state having default reduction
    ERR_DEF,        // ERR_ITEM | DEF_RED
    REQ_DEF,        // REQ_TOKEN | DEF_RED
    ERR_REQ_DEF     // ERR_ITEM | REQ_TOKEN | DEF_RED
These tokens are used by the parser to define the types of the various states of the analyzed grammar.
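The comments of the combined values can be verified at compile time. The sketch assumes that the enumerators use default numbering (starting at 0, in the order shown) and that the enumeration is called StateType; the helper has is hypothetical.

```cpp
// Enumerators in the order listed above, assuming default numbering.
enum StateType
{
    NORMAL,
    ERR_ITEM,
    REQ_TOKEN,
    ERR_REQ,
    DEF_RED,
    ERR_DEF,
    REQ_DEF,
    ERR_REQ_DEF
};

// Each combined value equals the bitwise-or of its parts.
static_assert(ERR_REQ == (ERR_ITEM | REQ_TOKEN), "ERR_REQ");
static_assert(ERR_DEF == (ERR_ITEM | DEF_RED), "ERR_DEF");
static_assert(REQ_DEF == (REQ_TOKEN | DEF_RED), "REQ_DEF");
static_assert(ERR_REQ_DEF == (ERR_ITEM | REQ_TOKEN | DEF_RED),
              "ERR_REQ_DEF");

// True if a state of type `type' has the property `property'.
constexpr bool has(StateType type, StateType property)
{
    return (type & property) != 0;
}
```

Under these assumptions a state's properties can be tested individually, e.g., has(ERR_REQ_DEF, DEF_RED) holds, whereas has(REQ_TOKEN, ERR_ITEM) does not.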
ACCEPT_, which is used in the state transition tables to indicate that the accepting state has been reached.
struct provides information about production rules. It has two fields: d_nonTerm is the identification number of the production's nonterminal, d_size represents the number of elements of the production rule.
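How these two fields may be used when reducing by a production rule is sketched below. The field names follow the text; the struct's name, the field types, and the function reduce are assumptions, and the state stack is reduced to a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Field names follow the text; the struct's name and types are assumed.
struct ProductionInfo
{
    std::size_t d_nonTerm;  // identification number of the nonterminal
    std::size_t d_size;     // number of elements of the production rule
};

// Simplified reduction: one stack element is popped for each element of
// the production rule; the rule's nonterminal is returned as the symbol
// with which parsing continues.
std::size_t reduce(std::vector<std::size_t> &stateStack,
                   ProductionInfo const &rule)
{
    assert(stateStack.size() > rule.d_size);    // start state must remain

    stateStack.resize(stateStack.size() - rule.d_size);
    return rule.d_nonTerm;
}
```

Reducing a stack holding four states by a rule having three elements leaves only the bottom state on the stack.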
struct provides the shift/reduce information for the various grammatic states. SR_ values are collected in arrays, one array per grammatic state. These arrays, named s_<nr>, where <nr> is a state number, are defined in the anonymous namespace as well. The SR_ elements consist of two unions, defining fields that are applicable to, respectively, the first, intermediate and the last array elements:
the first element of each array consists of (1st field) a StateType and (2nd field) the index of the last array element;
intermediate elements consist of (1st field) a symbol value and (2nd field) (if negative) the production rule number reducing to the indicated symbol value or (if positive) the next state when the symbol given in the 1st field is the current token;
the last element of each array consists of (1st field) a placeholder for the current token and (2nd field) the (negative) rule number to reduce to by default or the (positive) number of an error-state to go to when an erroneous token has been retrieved. If the 2nd field is zero, no error or default action has been defined for the state, and error-recovery is attempted.
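A lookup in such an array can be modeled as follows. This is a simplified sketch: the struct Entry uses two plain int fields instead of unions, and the function action is hypothetical; only the layout (first, intermediate and last elements) follows the description above.

```cpp
// Simplified stand-in for an SR_ element: two plain ints, no unions.
struct Entry
{
    int first;      // StateType, symbol value, or placeholder
    int second;     // last index, action, or default action
};

// Looks up `token' in the array of one state and returns its action:
// a negative value means `reduce by rule -value', a positive value
// means `continue in state value'. If no intermediate element matches,
// the 2nd field of the last element is returned (zero: no default or
// error action, so error recovery would be attempted).
int action(Entry const *state, int token)
{
    int const last = state[0].second;       // index of the last element

    for (int idx = 1; idx < last; ++idx)    // intermediate elements
    {
        if (state[idx].first == token)
            return state[idx].second;
    }

    return state[last].second;              // default or error action
}
```

For an array { {2, 3}, {'+', 7}, {'*', -4}, {0, -2} } the token + results in a transition to state 7, the token * in a reduction by rule 4, and any other token in the default reduction by rule 2.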
<nr> is a numerical value representing a state number. Used internally by the parsing function.
$$ and $i notations represent semantic values of the nonterminal defined by production rules and semantic values of components of production rules. Different `dollar-notations' are available for different types (single, union, or polymorphic) of semantic values. Refer to section 4.6.2 for a complete description.
@@ and @n: Usually these represent plain old data (a C-type structure) containing information about line numbers and column numbers that is associated with, respectively, the rule's nonterminal and the production rule's nth component. The default structure is defined like this (see also section 4.5.11):
    struct LTYPE_
    {
        int timestamp;
        int first_line;
        int first_column;
        int last_line;
        int last_column;
        char *text;
    };
Thus, to get the starting line number of the third component, you would use @3.first_line.
In order for the members of this structure to contain valid information, you must make sure the lexical scanner supplies this information about each token. If you need only certain fields, then the lexical scanner only has to provide those fields.
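As an illustration of a scanner supplying only some of the fields, the sketch below fills the line and column fields and leaves the remaining fields zero-initialized. The struct is the default structure shown above; the functions locate and merge are hypothetical helpers, and merge merely shows one way in which a rule's location could be derived from its first and last components.

```cpp
// The default location structure.
struct LTYPE_
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};

// Hypothetical scanner-side helper: location of a token of `length'
// characters starting at (line, column) and not spanning multiple lines.
LTYPE_ locate(int line, int column, int length)
{
    LTYPE_ loc{};                   // all fields zero-initialized

    loc.first_line = line;
    loc.last_line = line;
    loc.first_column = column;
    loc.last_column = column + length - 1;
    return loc;
}

// Hypothetical helper: the location running from the begin of `first'
// to the end of `last'.
LTYPE_ merge(LTYPE_ const &first, LTYPE_ const &last)
{
    LTYPE_ loc = first;

    loc.last_line = last.last_line;
    loc.last_column = last.last_column;
    return loc;
}
```

A token of four characters starting at line 3, column 5 thus ends at column 8; merging its location with that of a later token yields a location starting at line 3 and ending where the later token ends.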
Be advised that using this or corresponding (custom-defined, see sections 4.5.12 and 4.5.10) constructions may somewhat slow down the parsing process.