flex is a rewrite of the AT&T Unix lex tool (the two
implementations do not share any code, though), with some extensions and
incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to both implementations. flex is fully
compliant with the POSIX lex specification, except that when
using %pointer (the default), a call to unput() destroys
the contents of yytext, which is counter to the POSIX
specification. In this section we discuss all of the known areas of
incompatibility between flex, AT&T lex, and the POSIX
specification. flex’s ‘-l’ option turns on maximum
compatibility with the original AT&T lex implementation, at the
cost of a major loss in the generated scanner’s performance. We note
below which incompatibilities can be overcome using the ‘-l’
option.
flex is fully compatible with lex with the
following exceptions:
The lex scanner internal variable yylineno is
not supported unless ‘-l’ or %option yylineno is used.
yylineno should be maintained on a per-buffer basis, rather than
a per-scanner (single global variable) basis.
yylineno is not part of the POSIX specification.
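As a minimal sketch of the first point, the flex-only specification below turns on line tracking with %option yylineno and reports the line on which each word appears. The rules and the main() driver are illustrative assumptions, not taken from the text above.

```lex
%{
#include <stdio.h>
%}
%option noyywrap yylineno
%%
[A-Za-z]+   { printf("line %d: %s\n", yylineno, yytext); }
.|\n        { /* skip everything else; flex itself counts the newlines */ }
%%
int main(void)
{
    yylex();
    return 0;
}
```

Because %option is a flex extension, a scanner that must also build with lex cannot rely on this and would have to count newlines in its own rules.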
The input() routine is not redefinable, though it may be called
to read characters following whatever has been matched by a rule. If
input() encounters an end-of-file the normal yywrap()
processing is done. A “real” end-of-file is returned by
input() as EOF.
Input is instead controlled by defining the YY_INPUT() macro.
The flex restriction that input() cannot be redefined is
in accordance with the POSIX specification, which simply does not
specify any way of controlling the scanner’s input other than by making
an initial assignment to yyin.
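To illustrate controlling input through YY_INPUT() rather than by redefining input(), here is a small sketch that feeds the scanner one character at a time from getchar(). The echo rule and the main() driver are assumptions added for completeness.

```lex
%{
#include <stdio.h>
/* Read one character per call; YY_NULL tells the scanner input is exhausted. */
#define YY_INPUT(buf, result, max_size) \
    { \
        int c = getchar(); \
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    }
%}
%option noyywrap
%%
.|\n    ECHO;
%%
int main(void)
{
    return yylex();
}
```

Reading a single character per call is slow; a real scanner would normally fill up to max_size bytes of buf in one go.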
The unput() routine is not redefinable. This restriction is in
accordance with POSIX.
flex scanners are not as reentrant as lex scanners. In
particular, if you have an interactive scanner and an interrupt handler
which long-jumps out of the scanner, and the scanner is subsequently
called again, you may get the following message:
fatal flex scanner internal error--end of buffer missed
To reenter the scanner, first use:
yyrestart( yyin );
Note that this call will throw away any buffered input; usually this
isn’t a problem with an interactive scanner. See Reentrant C Scanners, for
flex’s reentrant API.
flex C++ scanner classes are reentrant, so if using C++ is an option
for you, you should use them instead. See Generating C++ Scanners, and
Reentrant C Scanners for details.
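The recovery step for a non-reentrant scanner (calling yyrestart() before re-entering a scanner that was abandoned by a long jump) might look roughly like the sketch below. It assumes a POSIX system that provides sigsetjmp() and siglongjmp(); the names resume_point and on_interrupt are hypothetical, and jumping out of a signal handler is shown only because it is the scenario described above, not as a recommended design.

```lex
%{
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf resume_point;    /* hypothetical name */

static void on_interrupt(int sig)  /* hypothetical handler */
{
    (void)sig;
    siglongjmp(resume_point, 1);   /* abandons yylex() in mid-scan */
}
%}
%option noyywrap interactive
%%
[^\n]+  { printf("got: %s\n", yytext); }
\n      { /* ignore line ends */ }
%%
int main(void)
{
    yyin = stdin;
    signal(SIGINT, on_interrupt);
    if (sigsetjmp(resume_point, 1) != 0) {
        /* We jumped out of the scanner: reset its buffer before re-entry. */
        yyrestart(yyin);
    }
    yylex();
    return 0;
}
```

Any input that was buffered when the interrupt arrived is discarded by yyrestart(), as noted above.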
output() is not supported. Output from the ECHO macro is
done to the file-pointer yyout (default stdout).
output() is not part of the POSIX specification.
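Since ECHO writes to yyout, output can be redirected by assigning to that file pointer before scanning starts. The following sketch, with assumed rules and an assumed main() driver, sends every run of digits to stderr instead of stdout.

```lex
%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+    ECHO;
.|\n      ;
%%
int main(void)
{
    yyout = stderr;   /* ECHO now writes here instead of stdout */
    return yylex();
}
```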
lex does not support exclusive start conditions (%x), though they
are in the POSIX specification.
When definitions are expanded, flex encloses them in parentheses.
With lex, the following:
NAME [A-Z][A-Z0-9]*
%%
foo{NAME}? printf( "Found it\n" );
%%
will not match the string ‘foo’ because when the macro is expanded
the rule is equivalent to ‘foo[A-Z][A-Z0-9]*?’ and the precedence
is such that the ‘?’ is associated with ‘[A-Z0-9]*’. With
flex, the rule will be expanded to ‘foo([A-Z][A-Z0-9]*)?’
and so the string ‘foo’ will match.
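Note that if the definition begins with ‘^’ or ends with ‘$’ then it is
not expanded with parentheses, to allow these operators to appear in
definitions without losing their special meanings. But the ‘<s>’, ‘/’,
and <<EOF>> operators
cannot be used in a flex definition.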
Using ‘-l’ results in the lex behavior of no parentheses
around the definition.
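One way to sidestep the difference in definition expansion is to write the parentheses into the definition yourself, so that both tools expand it to an equivalent pattern. The fragment below is a sketch of that idea, based on the NAME example above. Like that example it is not a complete program: it still needs main() and yywrap() from the lex or flex library.

```lex
NAME    ([A-Z][A-Z0-9]*)
%%
foo{NAME}?    printf( "Found it\n" );
%%
```

With the explicit group, the ‘?’ applies to the whole definition under either tool, so the string ‘foo’ matches in both cases.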
Some implementations of lex allow a rule’s action to begin on a
separate line, if the rule’s pattern has trailing whitespace:
%%
foo|bar<space here>
{ foobar_action();}
flex does not support this feature.
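A form that flex does accept keeps the opening brace on the pattern line and lets the action body continue on the following lines. This fragment reuses the foobar_action() call from the example above and is otherwise only a sketch.

```lex
%%
foo|bar    {
               foobar_action();
           }
```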
lex %r (generate a Ratfor scanner) option is not
supported. It is not part of the POSIX specification.
After a call to unput(), yytext is undefined until the
next token is matched, unless the scanner was built using %array.
This is not the case with lex or the POSIX specification. The
‘-l’ option does away with this incompatibility.
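When yytext is still needed after pushing characters back, the usual precaution is to copy it first. The sketch below uses assumed rules: a token such as ‘@abc’ is matched, copied, and everything after the ‘@’ is pushed back so that the next rule sees the bare word.

```lex
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
%}
%option noyywrap
%%
@[a-z]+   {
              /* Copy the match first: unput() may clobber yytext. */
              int   len  = yyleng;
              char *copy = malloc(len + 1);
              int   i;
              if (copy == NULL)
                  return 1;
              memcpy(copy, yytext, len + 1);
              /* Push back everything after the '@', last character first. */
              for (i = len - 1; i >= 1; --i)
                  unput(copy[i]);
              free(copy);
          }
[a-z]+    { printf("word: %s\n", yytext); }
.|\n      { /* ignore */ }
%%
int main(void)
{
    return yylex();
}
```

Characters are pushed back in reverse order because each unput() places its character at the front of the remaining input.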
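The precedence of the ‘{,}’ (numeric range) operator is different. The
AT&T and POSIX specifications of lex
interpret ‘abc{1,3}’ as “match one, two,
or three occurrences of ‘abc’”, whereas flex interprets it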
as “match ‘ab’ followed by one, two, or three occurrences of
‘c’”. The ‘-l’ and ‘--posix’ options do away with this
incompatibility.
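Explicit grouping removes the ambiguity in the range operator regardless of which tool or option is in use. The fragment below is a sketch that spells out each reading; the printf actions are placeholders.

```lex
%%
(abc){1,3}    printf( "one to three copies of abc\n" );
ab(c){1,3}    printf( "ab followed by one to three copies of c\n" );
```

The first rule expresses the lex and POSIX reading, the second the default flex reading.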
The precedence of the ‘^’ operator is different. lex
interprets ‘^foo|bar’ as “match either ‘foo’ at the beginning of a
line, or ‘bar’ anywhere”, whereas flex interprets it as “match
either ‘foo’ or ‘bar’ if they come at the beginning of a
line”. The latter is in agreement with the POSIX specification.
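If the lex reading is the one you want, it can be written so that it does not depend on the precedence of ‘^’ at all, by splitting the alternation into two rules that share an action through ‘|’. The fragment is a sketch with a placeholder action.

```lex
%%
^foo    |
bar     printf( "foo at line start, or bar anywhere\n" );
```

The other reading can be spelled out in the same spirit as ‘^(foo|bar)’.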
The special table-size declarations such as %a supported by
lex are not required by flex scanners. flex
ignores them.
The name FLEX_SCANNER is #define’d so scanners may be
written for use with either flex or lex. Scanners also
include YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION
and YY_FLEX_SUBMINOR_VERSION
indicating which version of flex generated the scanner. For
example, for the 2.5.22 release, these defines would be 2, 5 and 22
respectively. If the version of flex being used is a beta
version, then the symbol FLEX_BETA is defined.
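A scanner source meant for both tools can test FLEX_SCANNER with the preprocessor. The sketch below only prints which generator produced it. The rule, main(), and the user-supplied yywrap() are assumptions; some lex implementations may provide yywrap() differently, in which case that definition would need to be dropped.

```lex
%{
#include <stdio.h>
%}
%%
.|\n    ;
%%
int main(void)
{
#ifdef FLEX_SCANNER
    printf("generated by flex %d.%d.%d\n",
           YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION,
           YY_FLEX_SUBMINOR_VERSION);
#else
    printf("generated by some other lex\n");
#endif
    return yylex();
}

int yywrap(void)
{
    return 1;   /* no further input files */
}
```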
The following flex features are not included in lex or the
POSIX specification:
multiple actions on a line
almost all of the flex command-line options
The feature “multiple actions on a line”
refers to the fact that with flex you can put multiple actions on
the same line, separated with semi-colons, while with lex, the
following:
foo handle_foo(); ++num_foos_seen;
is (rather surprisingly) truncated to
foo handle_foo();
flex does not truncate the action. Actions that are not enclosed
in braces are simply terminated at the end of the line.
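Enclosing the statements in braces avoids relying on how either tool treats multiple actions on a line. This fragment reuses the names from the example above and is only a sketch.

```lex
%%
foo    { handle_foo(); ++num_foos_seen; }
```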