Pattern for caching results of database queries?
MagnusEMay 14 2009 — edited May 14 2009I am looking at caching results of some slow and common queries in a coherence cache and would like some suggestions about good "patterns".
In this case we would be using a cache beside pattern for the data stored in the database/cache where all updates are written to both the cache and the database using an XA-transaction (we need to do it this way because there are some consistency requirements that are hard to verify (at least with decent performance) using only the cache). We intend to use a "near cache" with a size limited back tier.
We are thinking of invalidating the "saved query result cache" as part of each update transaction (we have some simple rules that allow us to know what query results that needs to be discarded for each change).
The part that is a bit tricky is to ensure that no query may be in progress while an update of data related to the query is also taking place where the invalidation could be performed BEFORE the query result is inserted in the result cache... I am not concerned with exact time order between queries and updates that are in progress at the same time but must be 100% sure that no "stale" information remain (or can be inserted!) in the cache after the update transaction has completed...
It is not enough to rely on time based expiry of the result-cache (I would have to set the time so short that the cache would not give much of an improvement)
I am thinking on something along the following lines:
When performing a query (that may already be cached):
1. Perform a dirty read against the result cache to see if a query result with the specified parameter already exists there - if so use it!
2. If not found lock a key with the specified parameter and when the lock is obtained check once more that no value still exists, if one now exists unlock and use it!
3. If still no value exists perform the database query, save the result in the cache, unlock and use the result
When performing a change using an XA-transaction:
Delete cached query results records corresponding to the changes performed - this can be done without first checking if there really are any results for the parameter or not.
I assume that the preparation stage of the transaction will lock the deleted records in the query result cache preventing concurrent updates and invalidations...
Can anybody see some windows of vulnerability here? Is it for instance possible that the cache lock held by the XA-transaction is released before the database update has completed (that way a query could be performed against old data and be stored in the cache after the invalidation were performed) or is this just the kind of problems that the XA-protocol is designed to prevent?
An alternative I have been considering for how to perform the database query / caching of the result would be to let a cache store class (for the query result cache) perform the queries when no entry exists in the cache but I am not sure how I would change the locking protocol to make it work - can I even lock a cache entry from this kind or component or would that cause a deadlock?
All suggestions, thought etc are warmly appreciated!
Best Regards
Magnus