flex is a rewrite of the AT&T Unix lex tool (the two implementations do not share any code, though), with some extensions and incompatibilities, both of which are of concern to those who wish to write scanners acceptable to both implementations. flex is fully compliant with the POSIX lex specification, except that when using %pointer (the default), a call to unput() destroys the contents of yytext, which is counter to the POSIX specification. In this section we discuss all of the known areas of incompatibility between flex, AT&T lex, and the POSIX specification. flex’s ‘-l’ option turns on maximum compatibility with the original AT&T lex implementation, at the cost of a major loss in the generated scanner’s performance. We note below which incompatibilities can be overcome using the ‘-l’ option. flex is fully compatible with lex with the following exceptions:
The lex scanner internal variable yylineno is not supported unless ‘-l’ or %option yylineno is used.
yylineno should be maintained on a per-buffer basis, rather than a per-scanner (single global variable) basis.
yylineno is not part of the POSIX specification.
The input() routine is not redefinable, though it may be called to read characters following whatever has been matched by a rule. If input() encounters an end-of-file, the normal yywrap() processing is done. A “real” end-of-file is returned by input() as EOF.
Input is instead controlled by defining the YY_INPUT() macro.
The flex restriction that input() cannot be redefined is in accordance with the POSIX specification, which simply does not specify any way of controlling the scanner’s input other than by making an initial assignment to yyin.
The unput() routine is not redefinable. This restriction is in accordance with POSIX.
flex scanners are not as reentrant as lex scanners. In particular, if you have an interactive scanner and an interrupt handler which long-jumps out of the scanner, and the scanner is subsequently called again, you may get the following message:
fatal flex scanner internal error--end of buffer missed
To reenter the scanner, first use:
yyrestart( yyin );
Note that this call will throw away any buffered input; usually this isn’t a problem with an interactive scanner. See Reentrant C Scanners for flex’s reentrant API.
Also note that flex C++ scanner classes are reentrant, so if using C++ is an option for you, you should use them instead. See Generating C++ Scanners and Reentrant C Scanners for details.
output() is not supported. Output from the ECHO macro is done to the file-pointer yyout (default stdout).
output() is not part of the POSIX specification.
lex does not support exclusive start conditions (%x), though they are in the POSIX specification.
When definitions are expanded, flex encloses them in parentheses. With lex, the following:
NAME    [A-Z][A-Z0-9]*
%%
foo{NAME}?      printf( "Found it\n" );
%%
will not match the string ‘foo’ because when the macro is expanded the rule is equivalent to ‘foo[A-Z][A-Z0-9]*?’ and the precedence is such that the ‘?’ is associated with ‘[A-Z0-9]*’. With flex, the rule will be expanded to ‘foo([A-Z][A-Z0-9]*)?’ and so the string ‘foo’ will match.
The ‘<s>’, ‘/’, and <<EOF>> operators cannot be used in a flex definition.
Using ‘-l’ results in the lex behavior of no parentheses around the definition.
Some implementations of lex allow a rule’s action to begin on a separate line, if the rule’s pattern has trailing whitespace:
%%
foo|bar<space here>
{ foobar_action();}
flex does not support this feature.
The lex %r (generate a Ratfor scanner) option is not supported. It is not part of the POSIX specification.
After a call to unput(), yytext is undefined until the next token is matched, unless the scanner was built using %array. This is not the case with lex or the POSIX specification. The ‘-l’ option does away with this incompatibility.
The AT&T and POSIX specifications of lex interpret ‘abc{1,3}’ as “match one, two, or three occurrences of ‘abc’”, whereas flex interprets it as “match ‘ab’ followed by one, two, or three occurrences of ‘c’”. The ‘-l’ and ‘--posix’ options do away with this incompatibility.
lex interprets ‘^foo|bar’ as “match either ‘foo’ at the beginning of a line, or ‘bar’ anywhere”, whereas flex interprets it as “match either ‘foo’ or ‘bar’ if they come at the beginning of a line”. The latter is in agreement with the POSIX specification.
The special table-size declarations such as %a supported by lex are not required by flex scanners. flex ignores them.
The name FLEX_SCANNER is #define’d so scanners may be written for use with either flex or lex. Scanners also include YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION and YY_FLEX_SUBMINOR_VERSION indicating which version of flex generated the scanner. For example, for the 2.5.22 release, these defines would be 2, 5 and 22 respectively. If the version of flex being used is a beta version, then the symbol FLEX_BETA is defined.
The following flex features are not included in lex or the POSIX specification:
flex command-line options
The feature “multiple actions on a line” refers to the fact that with flex you can put multiple actions on the same line, separated with semi-colons, while with lex, the following:
foo handle_foo(); ++num_foos_seen;
is (rather surprisingly) truncated to
foo handle_foo();
flex does not truncate the action. Actions that are not enclosed in braces are simply terminated at the end of the line.