SQL parser

Implementation overview
Generic SQL parser
Provider specific SQL parser
Tips to write a custom parser
GdaSqlStatement — SQL statement

Libgda implements a generic SQL parser which creates GdaStatement objects from an SQL string. If a database provider needs its own parser because the generic one does not handle the database specific SQL syntax, it can implement one by following the instructions in this chapter. Otherwise, all parser-related code can be removed from the provider's sources.

Implementation overview

This section describes how the generic SQL parser and a provider specific parser are built, and which files and programs are involved.

Generic SQL parser

The GdaSqlParser object can parse any SQL string of any SQL dialect, and it always identifies the variables (which use a Libgda specific syntax) in the string. If the parser recognizes a structure it understands in the SQL string, it internally builds a GdaSqlStatement structure of the corresponding type. If it does not, it simply delimits the parts of the SQL string to identify the variables, and still builds a GdaSqlStatement structure, but of type GDA_SQL_STATEMENT_UNKNOWN. If the string cannot even be delimited and its variables identified, the parser returns an error (usually caused by mismatched quotes within the SQL string).
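To make the delimiting fallback more concrete, here is a toy sketch of what a delimiter pass has to do: skip quoted sections, spot variables outside them, and fail on an unclosed quote. It is not Libgda's implementation. It assumes variables are written as `##name::type`, treats both `'` and `"` as quote characters, and treats a doubled quote as an escape; `toy_delimit` and `ToyDelimResult` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Result of the toy delimiter pass (hypothetical, not a Libgda API). */
typedef struct {
    int ok;          /* 1 if every quote was closed, 0 on a quote mismatch */
    int n_variables; /* number of "##..." variables found outside quotes */
} ToyDelimResult;

/* Scan an SQL string the way a delimiter would: skip quoted sections,
 * count "##name::type" variables outside of them (assumed syntax), and
 * report an error if a quote is still open at the end of the string. */
static ToyDelimResult toy_delimit(const char *sql)
{
    ToyDelimResult res = {1, 0};
    char quote = 0; /* currently open quote character, or 0 if none */

    assert(sql != NULL);
    for (const char *p = sql; *p; p++) {
        if (quote) {
            if (*p == quote) {
                if (p[1] == quote)
                    p++;        /* doubled quote: escaped, string continues */
                else
                    quote = 0;  /* closing quote */
            }
        } else if (*p == '\'' || *p == '"') {
            quote = *p;         /* opening quote */
        } else if (p[0] == '#' && p[1] == '#') {
            res.n_variables++;
            p++;                /* now on the second '#' */
            /* consume the variable name and its optional "::type" part */
            while (isalnum((unsigned char)p[1]) || p[1] == '_' || p[1] == ':')
                p++;
        }
    }
    if (quote)
        res.ok = 0;
    return res;
}
```

For example, `toy_delimit("SELECT '##x' FROM t WHERE a = ##a::int")` reports one variable (the quoted `##x` is ignored), while `toy_delimit("SELECT 'oops FROM t")` reports a quote mismatch.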

The parser may fail to identify a known structure in the SQL string for several reasons:

  • the SQL string is not one of the known types of statements (see GdaSqlStatementType)

  • the SQL uses some database specific extensions
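The fallback to an "unknown" statement type can be sketched as follows. This is a deliberately simplified illustration that looks only at the first keyword, whereas the real parser relies on a full grammar (so a known keyword followed by unsupported, database specific syntax would also fall back). `ToyStmtType` and `toy_classify` are hypothetical stand-ins for GdaSqlStatementType and the parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statement types; the real parser uses GdaSqlStatementType. */
typedef enum {
    TOY_STMT_SELECT,
    TOY_STMT_INSERT,
    TOY_STMT_UPDATE,
    TOY_STMT_DELETE,
    TOY_STMT_UNKNOWN
} ToyStmtType;

/* Case-insensitive comparison of the len-character word s with keyword kw
 * (kw must be upper case). */
static int toy_keyword_eq(const char *s, size_t len, const char *kw)
{
    if (strlen(kw) != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (toupper((unsigned char)s[i]) != (unsigned char)kw[i])
            return 0;
    return 1;
}

/* Classify a statement by its first keyword, falling back to
 * TOY_STMT_UNKNOWN for anything this toy "grammar" does not know. */
static ToyStmtType toy_classify(const char *sql)
{
    static const struct { const char *kw; ToyStmtType type; } known[] = {
        {"SELECT", TOY_STMT_SELECT},
        {"INSERT", TOY_STMT_INSERT},
        {"UPDATE", TOY_STMT_UPDATE},
        {"DELETE", TOY_STMT_DELETE},
    };
    size_t len = 0;

    assert(sql != NULL);
    while (isspace((unsigned char)*sql))
        sql++;
    while (isalpha((unsigned char)sql[len]))
        len++;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (toy_keyword_eq(sql, len, known[i].kw))
            return known[i].type;
    return TOY_STMT_UNKNOWN;
}
```

With this sketch, `"VACUUM ANALYZE t"` yields `TOY_STMT_UNKNOWN`, mirroring how an unrecognized statement still produces a GdaSqlStatement, only of the unknown type.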

The source files of the generic SQL parser are in the libgda/sql-parser directory; the parser itself is implemented by the parser.y, delimiter.y and parser_tokens.h files:

  • The parser.y file contains the grammar used by the parser

  • The delimiter.y file contains the grammar used by the parser when it is operating as a delimiter

  • The parser_tokens.h file defines some hard coded tokens

The grammar files use the syntax of the Lemon parser generator, which generates LALR(1) parsers in the same way as YACC or Bison. The lexer, however, is not generated by LEX: it is a custom lexer integrated in the gda-sql-parser.c file, which allows a tighter integration between the lexer and parser parts.
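Lemon generates push parsers: the caller hands tokens to the generated parse function one at a time, then passes token code 0 to signal the end of input. The sketch below shows how a hand-written lexer can drive such a parser directly. `ToyParser` and `toy_parser_push` are hypothetical stand-ins that merely record the tokens; they are not the functions Lemon generates, and the token set is invented for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lexer token codes, using the "L_" prefix convention.
 * L_END is 0, matching Lemon's convention for end of input. */
enum { L_END = 0, L_SELECT, L_FROM, L_ID, L_STAR, L_ILLEGAL };

/* Stand-in for a Lemon-generated parser: it only records the tokens it
 * receives. A real Lemon parser is fed the same way, one call per token. */
typedef struct {
    int tokens[64];
    size_t n;
} ToyParser;

static void toy_parser_push(ToyParser *p, int token)
{
    if (p->n < sizeof p->tokens / sizeof p->tokens[0])
        p->tokens[p->n++] = token;
}

/* Hand-written lexer feeding each token straight to the parser.
 * Keyword matching is case-sensitive here, for brevity. */
static void toy_lex_and_parse(const char *sql, ToyParser *parser)
{
    const char *p = sql;

    assert(sql != NULL && parser != NULL);
    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (*p == '*') { toy_parser_push(parser, L_STAR); p++; continue; }
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            size_t len = (size_t)(p - start);
            if (len == 6 && strncmp(start, "SELECT", 6) == 0)
                toy_parser_push(parser, L_SELECT);
            else if (len == 4 && strncmp(start, "FROM", 4) == 0)
                toy_parser_push(parser, L_FROM);
            else
                toy_parser_push(parser, L_ID);
            continue;
        }
        toy_parser_push(parser, L_ILLEGAL); /* unknown character */
        p++;
    }
    toy_parser_push(parser, L_END); /* end of input */
}
```

Because the lexer calls the parser itself, no intermediate token buffer is needed. This is one reason a custom lexer can integrate more closely with the parser than a separate LEX-generated one.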

The following figure illustrates the files involved and how they are produced and used to create the generic SQL parser.

Generic SQL parser's implementation

  • A white background indicates source files (part of Libgda's distribution)

  • A blue background indicates files that are generated during the build

  • A pink background indicates programs that are compiled and then run during the build to generate files. These programs are:

    • lemon: the Lemon parser generator itself

    • gen_def: generates the "converter" arrays (see below)

    Note that neither of these programs is installed. When cross compiling, they are built to run on the build host (the machine performing the compilation), not on the target.

  • A green background identifies components which are reused when implementing provider specific parsers

The tokenizer (AKA lexer) generates values identified by the "L_" prefix (for example "L_BEGIN"). The GdaSqlParser object uses the same lexer with at least two different parsers (namely the parser and the delimiter mentioned earlier), and the Lemon parser generator assigns its own values to the tokens of each grammar. Therefore a conversion step (the "converter" block in the diagram) converts the "L_" prefixed tokens into the ones usable by each parser. Both converters are defined as arrays in the token_types.h file.
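The converter idea can be sketched as two lookup arrays indexed by the lexer's "L_" value. All token names and numeric codes below are hypothetical; the real arrays live in the generated token_types.h and their contents are not reproduced here. In this sketch the delimiter maps every keyword and identifier to one generic "word" token, on the assumption that it only needs statement boundaries and variables.

```c
#include <assert.h>

/* Hypothetical lexer tokens, as produced by the shared tokenizer. */
enum { L_SELECT, L_FROM, L_ID, L_SEMI, L_NB_TOKENS };

/* Hypothetical codes that Lemon might assign to the same tokens in each
 * grammar. Lemon numbers tokens independently per grammar, so they differ. */
enum { PARSER_SEMI = 1, PARSER_SELECT = 3, PARSER_FROM = 7, PARSER_ID = 12 };
enum { DELIM_SEMI = 1, DELIM_WORD = 2 };

/* Converter for the full parser, indexed by the "L_" value of a token. */
static const int parser_converter[L_NB_TOKENS] = {
    [L_SELECT] = PARSER_SELECT,
    [L_FROM]   = PARSER_FROM,
    [L_ID]     = PARSER_ID,
    [L_SEMI]   = PARSER_SEMI,
};

/* Converter for the delimiter: keywords and identifiers collapse into a
 * single generic "word" token in this sketch. */
static const int delimiter_converter[L_NB_TOKENS] = {
    [L_SELECT] = DELIM_WORD,
    [L_FROM]   = DELIM_WORD,
    [L_ID]     = DELIM_WORD,
    [L_SEMI]   = DELIM_SEMI,
};

/* Translate a lexer token into the code expected by one parser. */
static int convert_token(const int converter[L_NB_TOKENS], int l_token)
{
    assert(l_token >= 0 && l_token < L_NB_TOKENS);
    return converter[l_token];
}
```

The lexer stays grammar-agnostic: it always emits `L_` values, and whichever parser is active picks its own array to translate them.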

Provider specific SQL parser

Writing a database specific SQL parser is worthwhile when:

  • the SQL understood by the database differs from the generic SQL. For example, PostgreSQL assigns precedence to compound statements differently from the generic SQL. In this case it is strongly recommended to write a custom SQL parser

  • the SQL understood by the database has specific extensions
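As an illustration of why compound-statement precedence matters, the sketch below groups `A op1 B op2 C` under two policies. PostgreSQL documents that INTERSECT binds more tightly than UNION and EXCEPT. The contrasting policy (all operators equal, grouped left to right) is an assumption chosen for comparison, not a description of Libgda's generic grammar. `CompoundOp`, `op_prec` and `group3` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h> /* strcmp, used when checking the results */

typedef enum { OP_UNION, OP_EXCEPT, OP_INTERSECT } CompoundOp;

static const char *op_name(CompoundOp op)
{
    switch (op) {
    case OP_UNION:     return "UNION";
    case OP_EXCEPT:    return "EXCEPT";
    case OP_INTERSECT: return "INTERSECT";
    }
    return "?";
}

/* Precedence under a given policy: either all operators are equal, or
 * INTERSECT binds tighter than UNION and EXCEPT. */
static int op_prec(CompoundOp op, int intersect_binds_tighter)
{
    return (intersect_binds_tighter && op == OP_INTERSECT) ? 2 : 1;
}

/* Write the grouping of "a op1 b op2 c" into out: left-associative unless
 * op2 has a strictly higher precedence than op1. */
static void group3(char *out, size_t size,
                   const char *a, CompoundOp op1, const char *b,
                   CompoundOp op2, const char *c, int intersect_binds_tighter)
{
    assert(out != NULL && size > 0);
    if (op_prec(op2, intersect_binds_tighter) > op_prec(op1, intersect_binds_tighter))
        snprintf(out, size, "(%s %s (%s %s %s))", a, op_name(op1), b, op_name(op2), c);
    else
        snprintf(out, size, "((%s %s %s) %s %s)", a, op_name(op1), b, op_name(op2), c);
}
```

The same text `A UNION B INTERSECT C` yields `((A UNION B) INTERSECT C)` under the equal-precedence policy and `(A UNION (B INTERSECT C))` under the PostgreSQL rule, which can change the result set. A grammar can only encode one of these, hence the need for a provider specific parser.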

Using the same background color conventions as the previous diagram, the following diagram illustrates the files involved and how they are produced and used to create a provider specific SQL parser:

Provider specific SQL parser's implementation

The key differences are:

  • The delimiter part of the GdaSqlParser object is the same as for the generic SQL parser implementation

  • While the lemon program is the same as for the generic SQL parser, the gen_def program is different: it takes as input the ".h" file produced by the lemon program and the libgda/sql-parser/token_types.h file.
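The job of the provider specific gen_def can be pictured as joining two token lists by name: the lexer's tokens and the codes that lemon assigned in the provider grammar. The sketch below assumes that matching is done by token name, uses in-memory tables instead of parsing the actual header files, and emits a C array in an invented format. gen_def's real input handling and output format are not shown. All names and codes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A token name together with the code a grammar assigned to it. */
typedef struct {
    const char *name;
    int code;
} TokenDef;

/* Hypothetical provider tokens, as a lemon-generated ".h" file would
 * define them (e.g. "#define SELECT 3"). */
static const TokenDef provider_tokens[] = {
    {"SELECT", 3}, {"FROM", 7}, {"ILIKE", 9}, {"ID", 12},
};

/* Hypothetical lexer token names, as in token_types.h minus the "L_" prefix. */
static const char *const lexer_tokens[] = {"SELECT", "FROM", "ID", "ILIKE", "RETURNING"};

#define N_LEXER (sizeof lexer_tokens / sizeof lexer_tokens[0])
#define N_PROVIDER (sizeof provider_tokens / sizeof provider_tokens[0])
#define NO_TOKEN (-1)

/* converter[i] receives the provider code for lexer token i, matched by
 * name, or NO_TOKEN if the provider grammar does not define that token. */
static void build_converter(int *converter)
{
    assert(converter != NULL);
    for (size_t i = 0; i < N_LEXER; i++) {
        converter[i] = NO_TOKEN;
        for (size_t j = 0; j < N_PROVIDER; j++) {
            if (strcmp(lexer_tokens[i], provider_tokens[j].name) == 0) {
                converter[i] = provider_tokens[j].code;
                break;
            }
        }
    }
}

/* Write the converter out as C source, the way a generator program would
 * produce a header to be compiled into the provider. */
static void emit_converter(FILE *out, const char *array_name, const int *converter)
{
    fprintf(out, "static const int %s[] = {\n", array_name);
    for (size_t i = 0; i < N_LEXER; i++)
        fprintf(out, "    %d, /* L_%s */\n", converter[i], lexer_tokens[i]);
    fprintf(out, "};\n");
}
```

Usage: fill an `int conv[N_LEXER]` with `build_converter(conv)`, then call `emit_converter(stdout, "provider_converter", conv)` to print the generated array. Lexer tokens that the provider grammar never uses (here `RETURNING`) map to `NO_TOKEN`.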