Libgda implements a generic SQL parser which creates GdaStatement objects from an SQL string. If the generic parser does not handle a database's specific SQL syntax, the database provider can implement its own parser by following the instructions in this chapter. Otherwise, any code related to the parser can be removed from the provider's sources.
This section describes how the generic SQL parser and a provider-specific parser are built, and which files and programs are involved.
The GdaSqlParser object can parse an SQL string of any SQL dialect, and always identifies the variables (which use a Libgda-specific syntax) in the string. If the parser recognizes a structure in the SQL string, it internally builds a GdaSqlStatement structure of the corresponding type. If it does not, it simply delimits parts of the SQL string to identify the variables, and builds a GdaSqlStatement structure of type GDA_SQL_STATEMENT_UNKNOWN. If the string cannot be delimited and its variables identified, the parser returns an error (usually caused by mismatched quotes within the SQL string).
The parser can fail to identify a known structure in the SQL string for several reasons:
the SQL string is not one of the known types of statements (see GdaSqlStatementType)
the SQL uses some database specific extensions
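The fallback "delimiter" behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use Libgda's API: the function name, the status codes and the assumption that a variable starts with "##" (as in "##id::gint") are illustrative only. The sketch skips quoted sections, counts variable markers found outside of them, and reports an error when a quote is left open.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch (not part of Libgda's API). */
typedef enum {
    SKETCH_DELIMIT_OK = 0,
    SKETCH_DELIMIT_QUOTE_MISMATCH = -1
} SketchDelimitStatus;

/*
 * Scan an SQL string without understanding its grammar: skip over quoted
 * sections and count the variable markers found outside of them.
 * Assumption: a variable starts with "##".
 */
SketchDelimitStatus
sketch_delimit (const char *sql, size_t *n_variables)
{
    char quote = 0;      /* current quote character, or 0 when outside quotes */
    size_t count = 0;
    const char *p;

    for (p = sql; *p; p++) {
        if (quote) {
            if (*p == quote) {
                if (p[1] == quote)
                    p++;         /* doubled quote: escaped, still inside */
                else
                    quote = 0;   /* closing quote */
            }
        }
        else if (*p == '\'' || *p == '"')
            quote = *p;          /* opening quote */
        else if (p[0] == '#' && p[1] == '#') {
            count++;
            p++;                 /* skip the second '#' */
        }
    }

    if (quote)
        return SKETCH_DELIMIT_QUOTE_MISMATCH;   /* a quote was never closed */

    *n_variables = count;
    return SKETCH_DELIMIT_OK;
}
```

With this sketch, a string such as "FROBNICATE t WITH ##a::string" is not a known statement type but can still be delimited and its single variable found, whereas "SELECT 'unterminated" is rejected because of the quotes mismatch.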
The generic SQL parser implementation has its source files in the libgda/sql-parser directory; the files which actually implement the parser itself are the parser.y, delimiter.y and parser_tokens.h files:
The parser.y file contains the grammar used by the parser
The delimiter.y file contains the grammar used by the parser when it is operating as a delimiter
The parser_tokens.h file defines some hard-coded tokens
The parser grammar files use the syntax of the Lemon parser generator, an LALR parser generator similar to YACC or Bison. The lexer part, however, is not generated by LEX: it is a custom one integrated in the gda-sql-parser.c file (this allows a better integration between the lexer and parser parts).
The following figure illustrates the files involved and how they are produced and used to create the generic SQL parser.
A white background indicates source files (part of Libgda's distribution)
A blue background indicates files that are generated during the build
A pink background indicates programs that are compiled and then used during the build to generate files. These programs are:
lemon: the Lemon parser generator itself
gen_def: generates the "converters" arrays (see below)
Note that none of these programs gets installed (and when cross compiling, they are compiled to run on the host which performs the compilation).
A green background identifies components which are reused when implementing provider-specific parsers
The tokenizer (AKA lexer) generates values identified by the "L_" prefix (for example "L_BEGIN"). Because the GdaSqlParser object uses the same lexer with at least two different parsers (namely the parser and delimiter mentioned earlier), and because the Lemon parser generator generates its own value identifiers for tokens, there is a conversion step (the "converter" block in the diagram) which converts the "L_" prefixed tokens to the ones usable by each parser (both converters are defined as arrays in the token_types.h file).
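The conversion step can be pictured as one lookup array per parser, indexed by the lexer's "L_" value. The following is a minimal sketch and not the contents of token_types.h: apart from L_BEGIN, every token name, every numeric value and the sketch_convert_token() helper are hypothetical, and the choice to map keywords to a single raw-string token in the delimiter is only an illustration.

```c
#include <stddef.h>

/* Lexer-side values, shared by all parsers.
 * Only L_BEGIN is named in the text; the others are illustrative. */
typedef enum {
    L_ILLEGAL = 0,
    L_SEMI,
    L_BEGIN,
    L_SELECT,
    L_LAST_TOKEN    /* number of lexer values, not a real token */
} SketchLexerToken;

/* Hypothetical identifiers, as Lemon might number them for each grammar. */
enum { PARSER_T_SEMI = 1, PARSER_T_BEGIN = 7, PARSER_T_SELECT = 12 };
enum { DELIM_T_SEMI = 1, DELIM_T_RAWSTRING = 2 };

/* Converter for the parser grammar: one entry per lexer value. */
const int sketch_parser_converter[L_LAST_TOKEN] = {
    0,                  /* L_ILLEGAL: no parser token */
    PARSER_T_SEMI,      /* L_SEMI */
    PARSER_T_BEGIN,     /* L_BEGIN */
    PARSER_T_SELECT     /* L_SELECT */
};

/* Converter for the delimiter grammar (illustrative mapping). */
const int sketch_delimiter_converter[L_LAST_TOKEN] = {
    0,                  /* L_ILLEGAL: no delimiter token */
    DELIM_T_SEMI,       /* L_SEMI */
    DELIM_T_RAWSTRING,  /* L_BEGIN */
    DELIM_T_RAWSTRING   /* L_SELECT */
};

/* Translate a lexer value into the identifier expected by one parser.
 * Returns 0 for values outside the converter's range. */
int
sketch_convert_token (const int *converter, SketchLexerToken token)
{
    if ((int) token < 0 || token >= L_LAST_TOKEN)
        return 0;
    return converter[token];
}
```

The lexer only ever produces "L_" values; the caller picks the converter matching the grammar currently in use, for example sketch_convert_token (sketch_parser_converter, L_BEGIN) when feeding the parser.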
A database-specific SQL parser is worth writing when:
the SQL understood by the database differs from the generic SQL. For example, PostgreSQL assigns priorities within compound statements differently from the generic SQL. In this case it is strongly recommended to write a custom SQL parser
the SQL understood by the database has specific extensions
Using the same background color conventions as the previous diagram, the following diagram illustrates the files involved and how they are produced and used to create a provider specific SQL parser:
The key differences are:
The delimiter part of the GdaSqlParser object is the same as for the generic SQL parser implementation
While the lemon program is the same as for the generic SQL parser, the gen_def program is different, and takes as its input the ".h" file produced by the lemon program and the libgda/sql-parser/token_types.h file.
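To give an idea of the kind of work such a gen_def program performs, here is a rough sketch which reads the "#define NAME number" lines of a Lemon-generated header and fills a converter array from them. It is not the actual gen_def source: the function name, the "T_" token prefix and the list of lexer names are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Fill `converter` (indexed by lexer value) from the text of a
 * Lemon-generated header made of "#define <prefix><NAME> <number>" lines.
 * `lexer_names[i]` is the un-prefixed name of lexer value i.
 * Returns the number of lexer values which could be mapped.
 * Hypothetical helper: the real gen_def may work differently.
 */
int
sketch_build_converter (const char *header_text, const char *prefix,
                        const char *const *lexer_names, size_t n_lexer,
                        int *converter)
{
    size_t prefix_len = strlen (prefix);
    int n_mapped = 0;
    const char *line = header_text;

    while (line && *line) {
        char name[64];
        int value;

        /* Only lines of the form "#define NAME number" are of interest. */
        if (sscanf (line, "#define %63s %d", name, &value) == 2 &&
            strncmp (name, prefix, prefix_len) == 0) {
            size_t i;
            for (i = 0; i < n_lexer; i++) {
                if (strcmp (name + prefix_len, lexer_names[i]) == 0) {
                    converter[i] = value;
                    n_mapped++;
                    break;
                }
            }
        }

        /* Move to the start of the next line, if any. */
        line = strchr (line, '\n');
        if (line)
            line++;
    }
    return n_mapped;
}
```

Lexer values with no matching "#define" keep whatever value the caller initialized them with (typically 0), which leaves room for tokens that a provider's grammar does not use.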