OQL to SQL translator
Reference: The Java OQL to SQL translator News Status Introduction Overview Syntax Type and validity checking SQL Generation OQL FAQ Summary Examples Java class files Limit Clause
News
Release 0.9.6:
|
- | Added support for LIMIT clause for MS SQL Server. |
- | In the case a RDBMS does not support LIMIT/OFFSET clauses, a
SyntaxNotSupportedException is thrown. |
- | Added support for a limit clause and an offset clause. Currently, only
HSQL, mySQL and postgreSQL are supported. |
- | Added an example section. |
Status
The Castor OQL implementation is currently in phase 3 of development.
NOTE: This documentation is not yet finished
Introduction
This document describes an OQL to SQL translator to be added to the
Castor JDO Java object Persistence API. The translator will accept OQL
queries passed as strings, and generate a parse tree of the OQL. It will
then traverse the tree creating the appropriate SQL. The user will then
be able to bind parameters to parameterized queries. Type checking will
be performed on the bound parameters. When the user executes the query,
the system will submit the query to the SQL database, and then
postprocess the SQL resultset to create the appropriate result as a Java
Object or literal. The current
mapping and
persistence
packages will be used for metadata and RDBMS communication.
Four of the (now defunct) SourceXchange milestones for this project call for java
source code. These milestones will be referred to here as phase 1, 2, 3,
and 4. There are many possible OQL features that can be supported, but
weren't discussed in the proposal or RFP. Many of these are probably
unwanted. These additional features are specified as phase 5, which is
out of the scope of this SourceXChange project.
Overview
The parser will construct a parse tree as output from an OQL query
string given as input. The OQL syntax is a subset of the syntax
described in the ODMG 3.0 specification section 4.12, with some
additional constructs. Following is a description of the supported OQL
syntax, and its SQL equivalent.
Certain features of OQL may not be directly translatable to SQL, but
may still be supported, by post processing the query. For example, the
first() and last() collection functions supported in OQL
are not directly translatable to standard SQL, but a resultset can be
post-processed to return the appropriate values. Features requiring
post-processing of SQL resultsets will be documented as such
below.
Currently the OQLQuery checks for correct syntax at the same time as
it does type checking and other types of error checking. The new code
will involve a multiple pass strategy, with the following passes:
-
Parse the String query checking for syntax errors, and return
a parse tree.
-
Traverse the parse tree checking for correct types, valid member
and method identifiers, and use of features which are unsupported.
This pass may also generate some data necessary for creating the SQL.
-
Traverse the tree one final time, creating the equivalent SQL
statement to the OQL Query originally passed.
Syntax
This section describes the first pass which will be done by the
parser. The parser will create a StringTokenizer like this:
StringTokenizer tokenizer
= new StringTokenizer(oql,
"\n\r\t,.()[]+-*/<>=:|$", true);
This will create a StringTokenizer with the delimiter
characters listed in the second argument, and it will return delimeters
as well as tokens. The parser will also create a Vector to be used as a
token buffer. As tokens are returned from the StringTokenizer they will be
added to the Vector. Older tokens will be removed from the Vector when
it reaches a certain size. The Vector will also be modified when the
StringTokenizer returns multi character operators as seperate tokens, for
example the -> method invocation operator.
The parser will consume tokens from the StringTokenizer, generating a
ParseTree. Each ParseTree node will have a nodeType corresponding to its
symbol in the OQL syntax. After each node is created it will look at the
next token and act acordingly, either modifying its properties (i.e.
for DISTINCT property of selectExpr), creating a new child node or
returning an error. If the error travels up to the root node of the
ParseTree, there is a syntax error in the OQL submitted. At the end of this
pass, the ParseTree will contain an apropriate representation of the query,
which will be analyzed, and used to create SQL. Below is the modified
EBNF which will be the Castor OQL syntax.
query ::= selectExpr
| expr
selectExpr ::= select [distinct]
projectionAttributes
fromClause
[whereClause]
[groupClause]
[orderClause]
[limitClause [offsetClause]]
projectionAttributes ::= projectionList
| *
projectionList ::= projection {, projection }
projection ::= field
| expr [as identifier]
fromClause ::= from iteratorDef {, iteratorDef}
iteratorDef ::= identifier [ [as ] identifier ]
| identifier in identifier
whereClause ::= where expr
groupClause ::= group by fieldList {havingClause}
havingClause ::= having expr
orderClause ::= order by sortCriteria
limitClause ::= limit queryParam
offsetClause ::= offset queryParam
sortCriteria ::= sortCriterion {, sortCriterion }
sortCriterion ::= expr [ (asc | desc) ]
expr ::= castExpr
castExpr ::= orExpr
| ( type ) castExpr
orExpr ::= andExpr {or andExpr}
andExpr ::= quantifierExpr {and quantifierExpr}
quantifierExpr ::= equalityExpr
| for all inClause : equalityExpr
| exists inClause : equalityExpr
inClause ::= identifier in expr
equalityExpr ::= relationalExpr
{(= | !=)
[compositePredicate] relationalexpr }
| relationalExpr {like relationalExpr}
relationalExpr ::= additiveExpr
{(< | <=
| > | >= )
[ compositePredicate ] additiveExpr }
| additiveExpr between
additiveExpr and additiveExpr
compositePredicate ::= some | any | all
additiveExpr ::= multiplicativeExpr
{+ multiplicativeExpr}
| multiplicativeExpr
{- multiplicativeExpr}
| multiplicativeExpr
{union multiplicativeExpr}
| multiplicativeExpr
{except multiplicativeExpr}
| multiplicativeExpr
{|| multiplicativeExpr}
multiplicativeExpr ::= inExpr {* inExpr}
| inExpr {/ inExpr}
| inExpr {mod inExpr}
| inExpr {intersect inExpr}
inExpr ::= unaryExpr {in unaryExpr}
unaryExpr ::= + unaryExpr
| - unaryExpr
| abs unaryExpr
| not unaryExpr
| postfixExpr
postfixExpr ::= primaryExpr{[ index ]}
| primaryExpr
{(. | ->)identifier[arglist]}
index ::= expr {, expr}
| expr : expr
argList ::= ([ valueList ])
primaryExpr ::= conversionExpr
| collectionExpr
| aggregateExpr
| undefinedExpr
| collectionConstruction
| identifier[ arglist ]
| queryParam
| literal
| ( query )
conversionExpr ::= listtoset( query )
| element( query )
| distinct( query )
| flatten( query )
collectionExpr ::= first( query )
| last( query )
| unique( query )
| exists( query )
aggregateExpr ::= sum( query )
| min( query )
| max( query )
| avg( query )
| count(( query | * ))
undefinedExpr ::= is_undefined( query )
| is_defined( query )
fieldList ::= field {, field}
field ::= identifier: expr
collectionConstruction ::= array([valueList])
| set([valueList])
| bag([valueList])
| list([valueList])
| list(listRange)
valueList ::= expr {, expr}
listRange ::= expr..expr
queryParam ::= $[(type)]longLiteral
type ::= [unsigned] short
| [unsigned] long
| long long
| float
| double
| char
| string
| boolean
| octet
| enum [identifier.]identifier
| date
| time
| interval
| timestamp
| set <type>
| bag <type>
| list <type>
| array <type>
| dictionary <type, type>
| identifier
identifier ::= letter{letter| digit| _}
literal ::= booleanLiteral
| longLiteral
| doubleLiteral
| charLiteral
| stringLiteral
| dateLiteral
| timeLiteral
| timestampLiteral
| nil
| undefined
booleanLiteral ::= true
| false
longLiteral ::= digit{digit}
doubleLiteral ::= digit{digit}.digit{digit}
[(E | e)[+|-]digit{digit}]
charLiteral ::= 'character'
stringLiteral ::= "{character}"
dateLiteral ::= date
'longliteral-longliteral-longliteral'
timeLiteral ::= time
'longliteral:longLiteral:floatLiteral'
timestampLiteral ::= timestamp
'longLiteral-longLiteral-longLiteral
longliteral:longLiteral:floatLiteral'
floatLiteral ::= digit{digit}.digit{digit}
character ::= letter
| digit
| special-character
letter ::= A|B|...|Z|
a|b|...|z
digit ::= 0|1|...|9
special-character ::= ?|_|*|%|\
The following symbols were removed from the standard OQL Syntax for
the following reasons:
|
- |
andthen: Cannot be implemented in
a single SQL query. |
- |
orelse: Same as above. |
- |
import: This is advanced
functionality which may be added later. This phase will use the
castor mapping mechanism to define the namespace. |
- |
Defined Queries: This is another feature
which can be added later. It is unclear where the queries would be
stored, and what their scope would be seeing as how this project
is an OQL to SQL translator, and not an ODBMS. |
- |
iteratorDef was changed so that all
instances of expr were replaced by identifier. This means that the
from clause can only contain extent names (class names), rather
than any expression. This is the most common case and others could
create complicated SQL sub-queries or post-processing
requirements. |
- |
objectConstruction and
structConstruction were removed. What is the scope of the
constructed object or struct, and how is a struct defined in Java? |
The following symbols were added or modified.
|
- |
between added to relationalExpr. |
- |
Optional type specification added to
queryParam. |
The rest of the standard OQL
syntax remains unchanged. Certain syntactically correct queries may not
be supported in Castor. For example, top level expressions which do not
contain a selectExpr anywhere in the query may not be supported. This
will be discussed further in the next section.
Type and validity checking
The first pass over the ParseTree will do type checking, and create some
structures used in the SQL generation pass. It will also check whether
the identifiers used are valid, and whether the query uses unsupported
features. The following table describes each type of node in the ParseTree, and
how it will be processed in the first pass.
expr |
| - |
A query whose top level element is an expr, rather than a
selectExpr will not be supported within the scope of this project.
These queries can either be stated as a selectExpr, like
aggregateExpr's, or they would require post-processing of the SQL
results, like element()first() and last(). |
|
Phase 5 |
projectionAttributes |
| - |
select * will return a Collection of Arrays of Objects. |
|
Phase 5 |
projectionList |
|
- |
Selecting multiple fields will
return a Collection of Arrays of Objects. |
- |
When there are multiple fields
selected, a list of field names and aliases will be kept for
checking validity of expr's in the whereClause, groupClause,
and orderClause. |
|
Phase 5 |
projection |
|
- |
Alias identifier will be stored. |
- |
expr in projection may only be identifier, without an arglist. |
|
Phase 1 |
projection |
|
- |
expr in projection may only be
identifier (with optional argList), aggregateExpr,
undefinedExpr, and postfixExpr (for selecting fields and
accessors). |
- |
The subquery in aggregateExpr and
undefinedExpr can be identifier (with optional arglist), or
postfixExpr for applying these functions to fields and
accessors. |
- |
If an identifier before the
. or -> contains an arglist, it will be
considered a SQL function, and passed through to the
RDBMS. |
- |
If the postfixExpr contains one of
the above operators, the mapping mechanism will be used to
determine if the path expression is valid and to generate a
list of required join tables, using the manyKey and
manyTable from the JDOFieldDescriptor. |
|
Phase 2 |
fromClause |
| - |
The class of the extent being selected from will be stored,
and ClassDescriptor objects will be instantiated. |
|
Phase 1 |
whereClause |
|
- |
expr in whereClause may only
contain orExpr, andExpr, equalityExpr (without
compositePredicate), relationalExpr, additiveExpr (without
set operators union and except),
multiplicativeExpr (without set operator intersect),
unaryExpr, postFixExpr (must be only primaryExpr, no array
or property reference or method calls). |
- |
primaryExpr may only contain
identifier (without an argList), literal and queryParam.
Identifier will be checked against object name and alias in
projectionList. |
- |
For equalityExpr, relationalExpr,
aditiveExpr, multiplicativeExpr, the left side and right
side expr's must evaluate to comparable types. |
- |
For unaryExpr, simple type
checking for numerical or character based types will be
performed. |
- |
If the operands for any of the
relational, equality, additive, multiplicative, or unary
operators is a query parameter, an expected type will be
determined. If the parameter included a specified type which
is incompatible with the system determined type, an error
will be generated. |
|
Phase 1 |
whereClause |
|
- |
Support for built in OQL functions
will be added to the whereClause: is_defined,
is_undefined. |
- |
inExpr will be supported in whereClause. |
- |
inExpr will only allow collectionConstruction for
the right side argument to in. No subQueries
will be allowed. |
|
Phase 2 |
whereClause |
|
- |
identifiers will be able to
contain an optional arglist. If the arglist is before a
. or -> the identifier will be considered a
SQL function and will be passed through to the DBMS.
Otherwise, the identifier will be for an accessor method, or
a property name. |
- |
Accessor methods and property
references will cause a check through the ClassDescriptor
and FieldDescriptors for the object type, and the required
join tables. |
|
Phase 3 |
whereClause |
|
- |
compositePredicate will be
suported in equalityExpr. |
- |
exists(query) will be supported. |
- |
quantifierExpr will support for
all and exists. |
- |
Subqueries will be supported on the right side of
the in operator |
|
Phase 4 |
groupClause, havingClause |
|
- |
Will identify appropriate fields
in SQL schema for each expr. |
- |
aggregateExpr will be supported. |
- |
Only expr's which translate to SQL
columns whcih are already being selected will be
supported. |
|
Phase 4 |
orderClause |
| - |
May only contain expr's which translate into SQL columns which
are already being selected. |
|
Phase 3 |
SQL Generation
After the first pass, the ParseTree is free of errors, and ready for
the SQL generation step. The existing implementation of the OQLParser
uses the persistence API for SQL generation. This API lacks the
necessary features to generate SQL from any OQL. The SQLEngine class
which implements Persistence is used to create a JDBCQueryExpression.
The SQL is derived from the finder, which is a JDBCQueryExpression
produced by the SQLEngine. The problem is that the SQLEngine only
supports single objects. It cannot generate SQL for path expressions
like this:
select p.address from Person p
This query requires a SQL statement like this:
select address.* from person, address
where person.address_id = address.address_id
The buildFinder method should not be used to generate a
queryExpression. The SQLEngine should be used to get a ClassDescriptor,
and to create a new QueryExpression. The OQLParser should use the
methods in the QueryExpression to generate the SQL. The
JDBCQueryExpression which is an implementation of QueryExpression is
also lacking in necessary features. This class should continue to be
used, but the following features will need to be added:
- addColumn(String)
- For adding something to select without specifying the tablename,
for use with functions (i.e. select count(*))
- addTable(String)
- For when the table has to be added manually.
- addCondition(String)
- Add a condition created outside the class, for nested expressions,
and other expressions that are not of the form table.column op
table.column.
- setDistinct(boolean)
- Used for select distinct.
- addOrderColumn(String tableName, String columnName, boolean desc)
- Used for order by
- addGroupExpr(String)
- Used for group by
- addHavingExpr(String)
- Used for having.
The following table lists each type of
tree node, and how it will be processed in the SQL generation pass.
selectExpr |
| - |
distinct in the selectExpr will result in a call to
setDistinct(true) in the queryExpr. |
|
Phase 2 |
projection |
| - |
The queryExpr will be populated with the columns and tables
necessary to retrieve the object. This will use code similar to
SQLEngine.addLoadSql(...). |
|
Phase 1 |
projection |
|
- |
aggregateExpr and SQL functions
will be passed to addColumn. |
- |
undefinedExpr will be translated
to is null and is not null |
- |
postfixExpr (for selecting fields
and accessors) will result in a different group of select
expressions and "from tables" being
generated. |
|
Phase 2 |
whereClause |
|
- |
Entire expr in where clause will be translated, and then added
to the QueryExpr, using a single call to addCondition(String), and
multiple calls to addTable(String). |
|
Phase 1 |
whereClause |
|
- |
is_defined() will translate
into is not null and is_undefined() will
translate into is null. |
- |
inExpr will translate directly,
with the collectionConstruction
removed. |
|
Phase 2 |
whereClause |
|
- |
compositePredicate and
exists(query) translate directly to SQL. |
- |
For quantifierExpr, exists
will translate into an exists() SQL subquery. for all
will translate into the contrapositive(?) exists query, for
example:
for all x in teachers:
x.name = 'Nis'
translates to:
not exists (select * from teachers
where name != 'Nis')
|
|
Phase 4 |
OQL FAQ
Please see the OQL section of the JDO FAQ.
Summary
The Parser and ParseTree classes will be improved through the phases
of this project. The top level of OQL to SQL translation will look very
simple, like this:
OQLParser parser = new OQLParser(query);
ParseTree pt = parser.getParseTree();
pt.checkPass();
//the SQL generation pass
_expr = pt.getQueryExpr();
These methods will have some additional parameters passed
for storing and retrieving data relevant to the query. Following is a
table containing a list of what will be introduced in each coding phase
of the project.
Phase 1 |
|
- |
New parser structure which
generates symbol tree |
- |
Parser still supports only limited OQL |
- |
selected field aliases |
- |
whereClause supports or, and,
equality, additive, multiplicative, and unary
Operators. |
- |
support for specifying parameter types |
- |
check specified parameter type
against system determined type |
- |
specifying ordered parameters. |
- |
JDBCQueryExpression must support
addCondition(String condition) |
|
Phase 2 |
|
- |
Distinct keyword will be supported
in selectExpr |
- |
aggregateExpr and undefinedExpr
supported in projection (Select statement) |
- |
isDefined, isUndefined in whereClause |
- |
inExpr in whereClause |
- |
type checking/conversion in bind() |
- |
fields and accessors in the
projection. Mapping mechanism may need some additional
features. |
- |
SQL functions in the projection |
- |
order by |
|
Phase 3 |
|
- |
fields and accessors in whereClause |
- |
SQL functions in the where clause |
|
Phase 4 |
|
- |
sub queries |
- |
exists() |
- |
compositePredicate |
- |
quantifierExpr's: for all and exists |
- |
group by |
- |
having |
|
Phase 5: |
|
- |
expr as top level symbol |
- |
queries selecting multiple fields or as
comma separated list or * |
|
Examples
Please find below various examples of OQL queries using the Java class files
as outlined below.
Java class files
The following fragment shows the Java class declaration for the Product class:
package myapp;
public class Product
{
private int _id;
private String _name;
private float _price;
private ProductGroup _group;
public int getId() { ... }
public void setId( int anId ) { ... }
public String getName() { ... }
public void setName( String aName ) { ... }
public float getPrice() { ... }
public void setPrice( float aPrice ) { ... }
public ProductGroup getProductGroup() { ... }
public void setProductGroup( ProductGroup aProductGroup ) { ... }
}
|
The following fragment shows the Java class declaration for the ProductGroup class:
public class ProductGroup
{
private int _id;
private String _name;
public int getId() { ... }
public void setId( int id ) { ... }
public String getName() { ... }
public void setName( String name ) { ... }
}
|
Limit Clause
On a selected number of RDBMS, Castor JDO now supports the use of LIMIT/OFFSET
clauses.
As per this release, the following RDBMS have full/partial support for this
feature.
RDBMS | LIMIT | OFFSET |
postgreSQL | Yes | Yes |
mySQL | Yes | Yes |
Oracle - 1)2) | Yes | Yes |
HSQL | Yes | Yes |
MS SQL | Yes | - |
DB2 | Yes | - |
|
1) Oracle has full support for LIMIT/OFFSET clauses for release 8.1.6 and
later.
2) For the LIMIT/OFFSET clauses to work properly the OQL query is required
to include a ORDER BY clause.
The following code fragment shows an OQL query that uses the LIMIT keyword
to select the first 10 ProductGroup instances.
query = db.getOQLQuery("select p from ProductGroup as p LIMIT $1");
query.bind(10);
|
Below is the same OQL query again, restricting the number of ProductGroup instances
returned to 10, though this time it is specified that the ProductGroup
instances 11 to 20 should be returned.
query = db.getOQLQuery ("select p from ProductGroup as p LIMIT $1 OFFSET $2");
query.bind(10);
|
In the case a RDBMS does not support LIMIT/OFFSET clauses, a SyntaxNotSupportedException
will be thrown.
|