Castor Persistence Architecture
OIDs and Locks
Service Providers (SPI)
Enterprise JavaBeans CMP
The Castor persistence engine is based on a layer architecture allowing different APIs to be plugged on top of the system, and different persistence engines to be combined in a single environment.
At the top level are the application level APIs. These are industry standard APIs that allow an application to be ported in and to other environments. These APIs consist of interfaces as well as semantics that make them suitable for a particular type of applications.
At the medium level is the Castor persistence mechanism. The persistence mechanism exposes itself to the application through the application level APIs. These typically have a one to one mapping with the persistence mechanism API. The persistence mechanism takes care of object caching and rollback, locking and deadlock detection, transactional integrity, and two phase commit.
At the bottom level are the Castor service providers. SPIs provide the persistence and query support into a variety of persistence mechanism. This version of Castor is bundled with an SQL 92 and LDAP persistence SPIs. Additional SPIs can be added, for example, alternative engines streamlined for Oracle, Sybase, DB2 and other databases.
This document will describe the persistence mechanism API and SPI to allow for those interested in adding new top level APIs or service providers.
The persistence mechanism is responsible for object caching and rollback, locking and deadlock detection, transaction integrity and two phase commit. All data access goes through the persistence mechanism. All operations are performed in the context of a transaction, even if the underlying SPI does not support transactions (e.g. LDAP).
The persistence API is defined in the package
All operations are performed through the context of a transaction. A transaction is
required in order to properly manage locking and caching, and to automatically commit or
rollback objects at transaction termination (write-at-commit). Persistence operations are
performed through the
The actual implementation of a transaction context is specific to each application API
and set of SPIs. One is created from an
Every persistence operation is performed within the context of a transaction. This allows changes to objects to be saved when the transaction commits and to be rolled back when the transaction aborts. Using a transactional API relieves the application developer from worrying about the commit/rollback phase. In addition it allows distributed transactions to be managed by a transactional environment, such as an EJB server.
Each time an object is retrieved or created the operation is performed in the context of a transaction and the object is recorded with the transaction and locked. When the transaction completes, the modified object is persisted automatically. If not all objects can be persisted, the transaction rolls back. The transaction context implements full two phase commit.
Each transaction sees it's own view of the objects it retrieves from persistent storage. Until the transaction commit, these changes are viewable only within that transaction. If the transaction rolled back, the objects are automatically reverted to their state in persistent storage.
The transaction context (
A transaction context is not created directly, but through a derived class that implements the proper mechanism for obtaining a new connection, committing and rolling back the connection. Note that commit and rollback operations are only required in a non-JTA environment. When running inside a JTA environment (e.g. an EJB server), the container is responsible to commit/rollback the underlying connection.
Application level APIs implement data sources that can be enlisted directly with the
transaction monitor through the JTA XAResource interface. A data source can
be implemented using
OIDs and Locks
Each object participating in a transaction is associated with an object identifier, or OID (org.exolab.castor.persist.OID). The OID identifies the object through its type and identity value. The identity object must be unique across all OIDs for the same object type in the same persistence engine.
Each object is also associated with a lock (org.exolab.castor.persist.ObjectLock). An object lock supports read and write locks with deadlock detection. Any number of transactions may acquire a read lock on the object. Read lock allows the transaction to retrieve the object, but not to delete or store it. Prior to deleting or storing the object, the transaction must acquire a write lock. Only one transaction may acquire a write lock, and a write lock will not be granted if there is any read lock on the object.
If an object is loaded read-only, a read lock is acquired at the begin of the load operation and realeased when the load is finished. Someone now could ask why do you acquire a read lock at all if it only lasts during the load operation. For an explanation we have to take a look on what happens if an object is loaded. Loading one object from database may cause castor to load a whole tree of objects with relations to each other. In the background castor may performs various queries to load all related objects. For this object tree to be consistent and reflect the persistent state of all the objects in the database at one point in time we need to lock all objects involved in all load operations to prevent any involved object to be write locked and changed by another transaction. If the load operation is finished these read locks are not required anymore. On the other hand, read locks are acquired to prevent an object already locked in the write mode from getting the read lock.
Write locks are acquired at the begin of the load operation similar then read locks. But in contrast to read locks, write locks are hold until the transaction is commited or rolled back. Holding the write lock until the end of the transaction is required as the changes to the objects, that could happen anytime between begin and end of the transaction, are only persisted if the transaction successfully commits.
If a transaction requires a read lock on an object which is write locked by another
transaction, or requires a write lock on an object which is read or write locked by another
transaction, the transaction will block until the lock is released, or the lock timeout has
elapsed. The lock timeout is a property of the transaction and is specified in seconds. A
This locking mechanism can lead to the possibility of a deadlock. The object lock mechanism provides automatic deadlock detection by tracking blocked transactions, without depending on a lock wait to timeout.
Write locks and exclusive locks are always delegated down to the persistence storage. In a distributed environment the database server itself provides the distributed locking mechanism. This approach assures proper concurrency control in a distributed environments where multiple application servers access the same database server.
The concurrency engine includes a layer that acts as a cache engine. This layer is
particularly useful for implementing optimistic locking and reducing synchronization
with the database layer. It is also used to perform dirty checks and object rollback.
The cache engine is implemented in org.exolab.castor.persist.CacheEngine and
exposed to the application through the
When an object is retrieved from persistent storage it is placed in the cache engine. Subsequent requests to retrieve the same object will return the cached copy (with the exception of pessimistic locking, more below). When the transaction commits, the cached copy will be updated with the modified object. When the transaction rolls back, the object will be reverted to its previous state from the cache engine.
In the event of any error or doubt, the cached copy will be dumped from the cache engine. The least recently used objects will be cleared from the cache periodically.
The cache engine is associated with a single persistence mechanism, e.g. a database source, and LDAP directory. Proper configuration is the only way to assure that all access to persistent storage goes through the same cache engine.
The concurrency engine works in two locking modes, based on the type of access requested by the application (typically through the API). Pessimistic locking are used in read-write access mode, optimistic locking are used in exclusive access mode.
In optimistic locking mode it is assumed that concurrent access to the same object is rare, or that objects are seldom modified. Therefore objects are retrieved with a read lock and are cached in memory across transactions.
When an object is retrieved for read/write access, if a copy of the object exists in the cache, that copy will be used. A read lock will be acquired in the cache engine, preventing other Castor transactions from deleting or modifying the object. However, no lock is acquired in persistent storage, allowing other applications to delete or modify the object while the Castor transaction is in progress.
To prevent inconsistency, Castor will perform dirty checking prior to storing the object, detecting whether the object has been modified in persistent storage. If the object has been modified outside the transaction, the transaction will rollback. The application must be ready for that possibility, or resort to using pessimistic locking.
In pessimistic locking mode it is assumed that concurrent access to the same object is the general case and that objects are often modified. Therefore objects are retrieved with a write lock and are always synchronized against the persistence storage. When talking to a database server, a request to load an object in exclusive mode will always load the object (unless already loaded in the same transaction) using a SELECT .. FOR UPDATE which assures no other application can change the object through direct access to the database server.
The locking mode is a property of the chosen access mode. The two access modes as well as read-only access can be combined in a single transaction, as a property of the query or object lookup. However, it is not possible to combine access modes for the same object, in certain cases this will lead to a conflict.
The pessimistic locking mode is not supported in LDAP and similar non-transactional servers. LDAP does not provide a mechanism to lock records and prevent concurrent access while they are being used in a transaction. Although all Castor access to the LDAP server is properly synchronized, it is possible that an external application will modify or delete a record while that record is being used in a Castor transaction.
Service Providers (SPI)
Castor supports a service provider architecture that allows different persistence storage providers to be plugged in. The default implementation includes an SQL 92 provider and an and an LDAP provider. Additional providers will be available optimized for a particular database server or other storage mechanisms.
The service provider is defined through two interfaces,
The object's identity is an object that unique identifies the object within persistent storage. Typically this would be the primary key on a table, or an RDN for LDAP. The identity object may be a simple type (e.g. integer, string) or a complex type.
The service provider may support the stamp mechanism for efficiently tracking dirty objects. The stamp mechanism is a unique identifier of the persistent object that changes when the object is modified in persistent storage. For example, a RAWID in Oracle or TIMESTAMP in Sybase. If a stamp is return by certain operations it will be stored with the object's OID and passed along to the store method.
The create method is called to create a new object in persistent storage. This method is called when the created object's identity is known. If the object's identity is not know when the object is made persistent, this method will be called only when the transaction commits. This method must check for duplicate identity with an already persistent object, create the object in persistent storage, such that successful completion of the transaction will result in durable storage, and retain a write lock on that object for the duration of the transaction.
The load method is called to load an object from persistent storage. An object is passed with the requested identity. If the object is found in persistent storage, it's values should be copied into the object passed as argument. If the lock flag is set, this method must create a write lock on the object at the same time it loads it.
The load method is called in two cases, when an object is first loaded or when an object is synchronized with the database (reloaded) in exclusive access mode. In the second case this method will be called with an object that is already set with values that are not considered valued, and must reset these values.
The store method is called to store an object into persistent storage. The store method is called for an object that was loaded and modified during a transaction when the transaction commits, as well as for an object that was created during the transaction. This method must update the object in persistent storage and retain a write lock on that object.
This method might be given two views of an object, one that is being saved and one that was originally loaded. If the original view is provided as well, this method should attempt to perform dirty check prior to storing the object. Dirty check entails a comparison of the original object against the copy in persistent storage, to determine whether the object has changed in persistent storage since it was originally loaded. The class descriptor will indicate whether the object is interested in dirty checking and which of its fields should be checked.
The delete method is called to delete an object from persistent storage. The delete method is called when the transaction commits and expects the object to deleted, if it exists. This method is not called when the transaction rolls back, objects created during the transaction with the create method are automatically rolled back by the persistent storage mechanism.
The writeLock method is called to obtain a write lock on an object for which only a read lock was previously obtained. The changeIdentity method is called to change the identity of the object after it has been stored with the old identity.
Enterprise JavaBeans CMP