Castor JDO - Best practice

Introduction
General suggestions
Further optimization

Introduction

There are many users of Castor JDO who use (or want to use) it in high-volume applications. To fine-tune Castor for such an environment, it is necessary to understand many of the product's features in detail and to balance their use according to the application's needs. Even though many of these features are detailed in various places, people have frequently asked for a 'best practice' document: one that brings these technical topics together in one place and presents them as a set of easy-to-use recipes.

Please be aware that this document is still under construction. Even in this early phase, we believe it provides valuable information to users of Castor JDO.

General suggestions

Let's start with some general suggestions. Please don't be put off if some of them seem basic; some readers of this document may not be aware of them.

Switch to version 0.9.9 of Castor, as we have fixed more than 100 bugs, some of which may be causing your problems.

Sidenote: Performance has generally improved in recent releases. If you're not seeing performance improvements, it's worth spending some time thinking about why.

Initialize your JDOManager instance once and reuse it throughout your application. Don't reuse Database instances. Creating them is inexpensive, and the JDBC rule is one thread per JDBC connection. Do not multithread inside a Database instance; as a corollary, do not multithread on a single JDBC connection.

Use a DataSource instead of a Driver configuration, as a DataSource enables connection pooling, which gives you a great performance improvement.

We highly recommend DBCP here, together with its prepared statement caching.

If you are running on a system where read performance is critical, feel free to take the SQL generated by Castor (it is dumped to the logs in debug output while the mapping is loaded) and turn it into stored procedures that you then invoke via SQL CALL to perform those loads. Personally, however, I find that stored procedures offer only a minimal improvement over the DBCP prepared statement cache; your mileage may vary. db.load() has performance benefits that are worth keeping, IMO, and the pleasure of having pretty stored procedures in your database is far outweighed by the nightmare of change management.

Have a look at the HTML docs for Jakarta DBCP, which has details about how to use and configure DBCP with Castor and Tomcat.

Note: 'Prepared statement caching' refers to the fact that DBCP is a JDBC 3.0-compliant product, and as such has to support caching of prepared statements. This basically allows the JDBC driver to maintain a pool of prepared statements across all connections, a feature that was only added to the JDBC specification with release 3.0.

DBCP setup is generally outside the scope of this document, but here's my two-cent description:

- Use Tomcat 5.5, because mucking about in server.xml sucks. For those of you working with Tomcat 4.1.x, there's no need to muck about in server.xml either: as far as I know, a web app can be deployed using a web app descriptor copied into $TOMCAT_HOME/webapps, which is the place to define anything specific to a web app context. Details can vary, of course.

- Create a META-INF directory in your WAR deploy scripts, and put a context.xml in it.

- In that context.xml, describe all of the things you want to be made available via JNDI to your application. These include things like UserTransaction and TransactionManager (for those of us using JOTM), all your database connection pools as datasources, etc. You can also add your JDO factory here, should you choose to do so.

- Configure Castor to load those JNDI names to retrieve connections.

Hit the deploy button, and Bob's your uncle.

Always commit or roll back your transactions and close your Database instances properly, even in failure situations.

Note: This is just the general rule for Java objects that hold resources: don't wait for the VM to finalize your objects when you could have released critical resources at the appropriate point in the codebase.

Keep your transactions as short as possible. If an open transaction holds a write lock on an object, no other transaction can get a write lock on the same object, which will lead to a LockNotGrantedException.
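The commit/rollback/close rule above can be sketched as a small helper. This is only a minimal illustration, not Castor code: the nested Database interface is a hypothetical stand-in that borrows the names begin(), commit(), rollback(), close() and isActive() so the sketch compiles without Castor on the classpath, and Work, inTransaction and RecordingDatabase are made-up names. The real Castor methods throw checked exceptions, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

final class TransactionCleanupSketch {

    /** Hypothetical stand-in for the Castor Database calls named in this document. */
    interface Database {
        void begin();
        void commit();
        void rollback();
        void close();
        boolean isActive();
    }

    /** Hypothetical callback holding the work done inside the transaction. */
    interface Work {
        void run(Database db);
    }

    /** Runs the work in one transaction; rolls back on failure and always closes. */
    static void inTransaction(Database db, Work work) {
        try {
            db.begin();
            work.run(db);
            db.commit();
        } finally {
            try {
                // Still active here means begin() succeeded but commit() did not complete.
                if (db.isActive()) {
                    db.rollback();
                }
            } finally {
                // Release the underlying resources no matter what happened above.
                db.close();
            }
        }
    }

    /** Test double that records which calls were made, in order. */
    static final class RecordingDatabase implements Database {
        final List<String> calls = new ArrayList<>();
        private boolean active;

        public void begin() { calls.add("begin"); active = true; }
        public void commit() { calls.add("commit"); active = false; }
        public void rollback() { calls.add("rollback"); active = false; }
        public void close() { calls.add("close"); }
        public boolean isActive() { return active; }
    }
}
```

With the recording stand-in, a successful run records begin, commit, close, while a run whose work throws records begin, rollback, close and then rethrows the original exception.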

public void execute() throws PersistenceException {
   // first short transaction: read the data
   Database db = jdo.getDatabase();
   db.begin();
   // query objects from database with read only
   db.commit();
   db.close();

   // do some time consuming processing with the data
   // (no transaction is open and no locks are held here)

   // second short transaction: apply the changes
   db = jdo.getDatabase();
   db.begin();
   // use db.load() to load the objects you need to change again
   // create, update or delete some objects
   db.commit();
   db.close();
}

It doesn't make sense to create a separate transaction for every change you make to an object, as this will slow down your application. On the other hand, if you have transactions that involve lots of objects and take a considerable amount of time, you may consider splitting them to reduce the time an object is locked.

Also keep in mind that with a lock mode of DBLocked, FOR UPDATE calls are issued on everything read while the transaction is open; if you're using DBLocked mode, be aware of how your application does things. In the other modes, locks happen inside Castor, and it's your responsibility to always use the right access mode when accessing content.

If, for example, you can decide at the API layer whether an operation will ever need to modify an object, and you know that you will only ever use an instance in read-only mode, load the object with access mode read-only rather than shared.

Limit use of read-write objects to situations in which it is likely you will need to perform updates.

Imagine, for a moment, that these transactions were in DBLocked mode - transactions which translate directly into locks on the database.

If you're opening something up for modification on the DB - marking it as SELECT ... FOR UPDATE - then that row will be locked until you commit. The database would prevent any other transaction that wants to touch that row from doing anything to it, and that transaction would block on yours - lock contention, and potentially deadlock, at the SQL level.

Castor does the same things internally for its own access modes - Shared and Exclusive. Each has different locking semantics; having good performance means understanding those locking semantics.

For example - read only transactions (should be) cheap. So there's no issue with holding those transactions open a long time; because they only translate, for an instant, into a lock. The lock is released the moment the load is completed and the object is dropped into read-only state within your transaction; read only operations therefore operate, pretty much, without locking.

The lock is of course acquired because you might also have it in SHARED or EXCLUSIVE mode on another thread - and that read-only operation isn't safe until those transactions close.

Once the lock is released, you're lock-free again, so the transaction basically has nothing in it that needs anything doing.

That's not to say that holding transactions open is good practice - but transactions should always be thought of as cheap to create and destroy and expensive to hold on to - never do heavy computation inside of one, unless you're willing to live with the consequences that arise from holding transactions on object sets that others might need to access.

Query or load your objects read-only whenever possible. Even if Castor creates a lock on them, this does not prevent other threads from reading or writing them. Read-only queries are also about 7 times faster than queries in the default shared mode.

For queries:

String oql = "select o from FooBar o";
Query query = db.getOQLQuery(oql);
QueryResults results = query.execute(Database.ReadOnly);

To load an object by its identity:

Integer id = Integer.valueOf(7);
Foo foo = (Foo) db.load(Foo.class, id, Database.ReadOnly);

The access mode is determined as follows:

- if specified, Castor uses the access mode passed to db.load() or query.execute(),

- if none is passed, it takes the access mode specified in the class mapping,

- if nothing is specified in the mapping, it defaults to shared.
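The precedence described in the list above can be written down as a tiny function. This is a sketch of the rule only, not Castor's implementation: the Mode enum and the resolve method are hypothetical names introduced for illustration, and null stands for "not specified".

```java
final class AccessModeResolutionSketch {

    /** Hypothetical stand-in for the access modes named in this document. */
    enum Mode { READ_ONLY, SHARED, EXCLUSIVE, DB_LOCKED }

    /**
     * Picks the effective access mode.
     *
     * @param callMode    mode passed to db.load() or query.execute(), or null if none
     * @param mappingMode mode declared in the class mapping, or null if none
     */
    static Mode resolve(Mode callMode, Mode mappingMode) {
        if (callMode != null) {
            return callMode;      // 1. the explicit per-call mode wins
        }
        if (mappingMode != null) {
            return mappingMode;   // 2. otherwise the class mapping decides
        }
        return Mode.SHARED;       // 3. otherwise fall back to shared
    }
}
```

One consequence worth noting: a class mapped as read-only can still be loaded for writing by passing an explicit mode on the call, and an unmapped, unspecified load ends up shared, which is exactly the case the surrounding text warns about.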

One cannot stress enough how important this is: If 99% of your application never writes an object, and you as a programmer know it won't, then do something about it. If you're in a situation where you want the object to be read-only most of the time, and only want a writable instance every now and then, do so just-in-time by performing a load-modify-store operation in a single transaction for the object you want to change.

In other words: Don't use read-write objects unless you know you're likely to want to write them.

Where possible, prefer Database.load(Class, Object) over Query.execute(). load() first tries to load the requested object from the cache and only retrieves it from the database when it is not available there. When executing queries with Query.execute(), the object will always be loaded from the database without looking at the cache. You may gain an improvement by a factor of 10 or more when changing from Query.execute() to Database.load().
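To make the difference concrete, here is a toy model of the two paths as described above. It is not how Castor's cache is implemented: CacheFirstLoadSketch, the databaseFetch function and the hit counter are all hypothetical, and the query path deliberately leaves the cache untouched as a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class CacheFirstLoadSketch<K, V> {

    private final Map<K, V> cache = new HashMap<>();
    // Hypothetical stand-in for a SQL round trip to the database.
    private final Function<K, V> databaseFetch;
    private int databaseHits;

    CacheFirstLoadSketch(Function<K, V> databaseFetch) {
        this.databaseFetch = databaseFetch;
    }

    /** Models the described load() path: cache first, database only on a miss. */
    V load(K id) {
        V cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        V fetched = fetchFromDatabase(id);
        if (fetched != null) {
            cache.put(id, fetched);
        }
        return fetched;
    }

    /** Models the described query path: always a database round trip. */
    V query(K id) {
        return fetchFromDatabase(id);
    }

    /** Number of simulated database round trips so far. */
    int databaseHits() {
        return databaseHits;
    }

    private V fetchFromDatabase(K id) {
        databaseHits++;
        return databaseFetch.apply(id);
    }
}
```

In this model, loading the same identity twice costs one round trip, while running the query twice costs two. The actual gain in a real application depends on cache configuration and hit rate.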

Further optimization

We hope the above suggestions help you resolve the problems you have. If you still need more performance, there are areas of improvement that are more difficult to address. For further ideas on improving your application's performance, take a look at our performance test suite (PTF), which you can find in Castor's source distribution under src/tests/ptf/jdo.

Now, there's lots left to do - there is still the issue, for example, of dependent objects being slightly sub-optimal in performance both in terms of the SQL that gets generated and the way it gets managed - but there will be improvements over time to the way that this and other operations are performed.

But performance should be good right now. If it isn't, you'll need to think about whether you are using the optimal set of operations. No environment can predict your requirements - hinting to the system when objects can be safely assumed to be read-only is vital to a high-performance implementation.

Copyright © 1999-2005 ExoLab Group, Intalio Inc., and Contributors. All rights reserved.