I am running tests to see if I can use the Oracle Berkeley XML database as a backend to a web application but am running into query response performance limitations. As per the suggestions for performance related questions, I have pulled together answers to the series of questions that need to be addressed, and they are given below. The basic issue at stake, however, is am I being realistic about what I can expect to achieve with the database?
Regards
Geoff Shuetrim
Oracle Berkeley DB XML database performance.
Berkeley DB XML Performance Questionnaire
1. Describe the Performance area that you are measuring? What is the
current performance? What are your performance goals you hope to
achieve?
I am using the database as a back end to a web application that is expected
to field a large number of concurrent queries.
The database scale is described below.
Current performance involves responses to simple queries that involve 1-2
minute turn around (this improves after a few similar queries have been run,
presumably because of caching, but not to a point that is acceptable for
web applications).
Desired performance is for queries to execute in milliseconds rather than
minutes.
2. What Berkeley DB XML Version? Any optional configuration flags
specified? Are you running with any special patches? Please specify?
Berkeley DB XML Version: 2.4.16.1
Configuration flags: enable-java -b 64 prefix=/usr/local/BerkeleyDBXML-2.4.16
No special patches have been applied.
3. What Berkeley DB Version? Any optional configuration flags
specified? Are you running with any special patches? Please Specify.
Berkeley DB Version? 4.6.21
Configuration flags: None. The Berkeley DB was built and installed as part of the
Oracle Berkeley XML database build and installation process.
No special patches have been applied.
4. Processor name, speed and chipset?
Intel Core 2 CPU 6400 @ 2.13 GHz (1066 FSB) (4MB Cache)
5. Operating System and Version?
Ubuntu Linux 8.04 (Hardy) with the 2.6.24-23 generic kernel.
6. Disk Drive Type and speed?
300 GB 7200RPM hard drive.
7. File System Type? (such as EXT2, NTFS, Reiser)
EXT3
8. Physical Memory Available?
Memory: 3.8GB DDR2 SDRAM
9. Are you using Replication (HA) with Berkeley DB XML? If so, please
describe the network you are using, and the number of Replica’s.
No.
10. Are you using a Remote Filesystem (NFS) ? If so, for which
Berkeley DB XML/DB files?
No.
11. What type of mutexes do you have configured? Did you specify
–with-mutex=? Specify what you find inn your config.log, search
for db_cv_mutex?
I did not specify -with-mutex when building the database.
config.log indicates:
db_cv_mutex=POSIX/pthreads/library/x86_64/gcc-assembly
12. Which API are you using (C++, Java, Perl, PHP, Python, other) ?
Which compiler and version?
I am using the Java API.
I am using the gcc 4.2.4 compiler.
I am using the g++ 4.2.4 compiler.
13. If you are using an Application Server or Web Server, please
provide the name and version?
I am using the Tomcat 5.5 application server.
It is not using the Apache Portable Runtime library.
It is being run using a 64 bit version of the Sun Java 1.5 JRE.
14. Please provide your exact Environment Configuration Flags (include
anything specified in you DB_CONFIG file)
I do not have a DB_CONFIG file in the database home directory.
My environment configuration is as follows:
Threaded = true
AllowCreate = true
InitializeLocking = true
ErrorStream = System.err
InitializeCache = true
Cache Size = 1024 * 1024 * 500
InitializeLogging = true
Transactional = false
TrickleCacheWrite = 20
15. Please provide your Container Configuration Flags?
My container configuration is done using the Java API.
The container creation code is:
XmlContainerConfig containerConfig = new XmlContainerConfig();
containerConfig.setStatisticsEnabled(true);
XmlContainer container = xmlManager.createContainer("container",containerConfig);
I am guessing that this means that the only flag I have set is the one
that enables recording of statistics to use in query optimization.
I have no other container configuration information to provide.
16. How many XML Containers do you have?
I have one XML container.
The container has 2,729,465 documents.
The container is a node container rather than a wholedoc container.
Minimum document size is around 1Kb.
Maximum document size is around 50Kb.
Average document size is around 2Kb.
I am using document data as part of the XQueries being run. For
example, I condition query results upon the values of attributes
and elements in the stored documents.
The database has the following indexes:
xmlIndexSpecification = dataContainer.getIndexSpecification();
xmlIndexSpecification.replaceDefaultIndex("node-element-presence");
xmlIndexSpecification.addIndex(Constants.XBRLAPINamespace,"fragment","node-element-presence");
xmlIndexSpecification.addIndex(Constants.XBRLAPINamespace,"data","node-element-presence");
xmlIndexSpecification.addIndex(Constants.XBRLAPINamespace,"xptr","node-element-presence");
xmlIndexSpecification.addIndex("","stub","node-attribute-presence");
xmlIndexSpecification.addIndex("","index", "unique-node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XBRL21LinkNamespace,"label","node-element-substring-string");
xmlIndexSpecification.addIndex(Constants.GenericLabelNamespace,"label","node-element-substring-string");
xmlIndexSpecification.addIndex("","name","node-attribute-substring-string");
xmlIndexSpecification.addIndex("","parentIndex", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","uri", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","type", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","targetDocumentURI", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","targetPointerValue", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","absoluteHref", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","id","node-attribute-equality-string");
xmlIndexSpecification.addIndex("","value", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","arcroleURI", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","roleURI", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","name", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","targetNamespace", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","contextRef", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","unitRef", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","scheme", "node-attribute-equality-string");
xmlIndexSpecification.addIndex("","value", "node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XBRL21Namespace,"identifier", "node-element-equality-string");
xmlIndexSpecification.addIndex(Constants.XMLNamespace,"lang","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"label","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"from","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"to","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"type","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"arcrole","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"role","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"label","node-attribute-equality-string");
xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"language","node-element-presence");
xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"code","node-element-equality-string");
xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"value","node-element-equality-string");
xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"encoding","node-element-equality-string");
17. Please describe the shape of one of your typical documents? Please
do this by sending us a skeleton XML document.
The following provides the basic information about the shape of all documents
in the data store.
<ns:fragment xmlns:ns="..." attrs...(about 20 of them)>
<ns:data>
Single element that varies from document to document but that
is rarely more than a few small elements in size and (in some cases)
a lengthy section of string content for the single element.
</ns:data>
</ns:fragment>
18. What is the rate of document insertion/update required or
expected? Are you doing partial node updates (via XmlModify) or
replacing the document?
Document insertion rates are not a first order performance criteria.
I do no document modifications using XmlModify.
When doing updates I replace the original document.
19. What is the query rate required/expected?
Not sure how to provide metrics for this but a single web page is
being generated, this can involve hundreds of queries. each of which
should be trivial to execute given the indexing strategy in use.
20. XQuery -- supply some sample queries
1. Please provide the Query Plan
2. Are you using DBXML_INDEX_NODES?
I am using DBXML_INDEX_NODES by default because I
am using a node container rather than a whole document
container.
3. Display the indices you have defined for the specific query.
4. If this is a large query, please consider sending a smaller
query (and query plan) that demonstrates the problem.
Example queries.
1. collection('browser')/*[@parentIndex='none']
<XQuery>
<QueryPlanToAST>
<LevelFilterQP>
<StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
<ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="parentIndex" value="none"/>
</StepQP>
</LevelFilterQP>
</QueryPlanToAST>
</XQuery>
2. collection('browser')/*[@stub]
<XQuery>
<QueryPlanToAST>
<LevelFilterQP>
<StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
<PresenceQP container="browser" index="node-attribute-presence-none" operation="eq" child="stub"/>
</StepQP>
</LevelFilterQP>
</QueryPlanToAST>
</XQuery>
3. qplan "collection('browser')/*[@type='org.xbrlapi.impl.ConceptImpl' or @parentIndex='asdfv_3']"
<XQuery>
<QueryPlanToAST>
<LevelFilterQP>
<StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
<UnionQP>
<ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="type" value="org.xbrlapi.impl.ConceptImpl"/>
<ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="parentIndex" value="asdfv_3"/>
</UnionQP>
</StepQP>
</LevelFilterQP>
</QueryPlanToAST>
</XQuery>
setnamespace xlink http://www.w3.org/1999/xlink
qplan "collection('browser')/*[@uri='http://www.xbrlapi.org/my/uri' and */*[@xlink:type='resource' and @xlink:label='description']]"
<XQuery>
<QueryPlanToAST>
<LevelFilterQP>
<NodePredicateFilterQP uri="" name="#tmp8">
<StepQP axis="parent-of-child" uri="*" name="*" nodeType="element">
<StepQP axis="parent-of-child" uri="*" name="*" nodeType="element">
<NodePredicateFilterQP uri="" name="#tmp1">
<StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
<ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="label:http://www.w3.org/1999/xlink"
value="description"/>
</StepQP>
<AttributeJoinQP>
<VariableQP name="#tmp1"/>
<ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="type:http://www.w3.org/1999/xlink"
value="resource"/>
</AttributeJoinQP>
</NodePredicateFilterQP>
</StepQP>
</StepQP>
<AttributeJoinQP>
<VariableQP name="#tmp8"/>
<ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="uri" value="http://www.xbrlapi.org/my/uri"/>
</AttributeJoinQP>
</NodePredicateFilterQP>
</LevelFilterQP>
</QueryPlanToAST>
</XQuery>
21. Are you running with Transactions? If so please provide any
transactions flags you specify with any API calls.
I am not running with transactions.
22. If your application is transactional, are your log files stored on
the same disk as your containers/databases?
The log files are stored on the same disk as the container.
23. Do you use AUTO_COMMIT?
Yes. I think that it is a default feature of the DocumentConfig that
I am using.
24. Please list any non-transactional operations performed?
I do document insertions and I do XQueries.
25. How many threads of control are running? How many threads in read
only mode? How many threads are updating?
One thread is updating. Right now one thread is running queries. I am
not yet testing the web application with concurrent users given the
performance issues faced with a single user.
26. Please include a paragraph describing the performance measurements
you have made. Please specifically list any Berkeley DB operations
where the performance is currently insufficient.
I have loaded approximately 7 GB data into the container and then tried
to run the web application using that data. This involves running a broad
range of very simple queries, all of which are expected to be supported
by indexes to ensure that they do not require XML document traversal activity.
Querying performance is insufficient, with even the most basic queries
taking several minutes to complete.
27. What performance level do you hope to achieve?
I hope to be able to run a web application that simultaneously handles
page requests from hundreds of users, each of which involves a large
number of database queries.
28. Please send us the output of the following db_stat utility commands
after your application has been running under "normal" load for some
period of time:
% db_stat -h database environment -c
1038 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
9 Number of lock modes
1000 Maximum number of locks possible
1000 Maximum number of lockers possible
1000 Maximum number of lock objects possible
155 Number of current locks
157 Maximum number of locks at any one time
200 Number of current lockers
200 Maximum number of lockers at any one time
13 Number of current lock objects
17 Maximum number of lock objects at any one time
1566M Total number of locks requested (1566626558)
1566M Total number of locks released (1566626403)
0 Total number of locks upgraded
852 Total number of locks downgraded
3 Lock requests not available due to conflicts, for which we waited
0 Lock requests not available due to conflicts, for which we did not wait
0 Number of deadlocks
0 Lock timeout value
0 Number of locks that have timed out
0 Transaction timeout value
0 Number of transactions that have timed out
712KB The size of the lock region
21807 The number of region locks that required waiting (0%)
% db_stat -h database environment -l
0x40988 Log magic number
13 Log version number
31KB 256B Log record cache size
0 Log file mode
10Mb Current log file size
0 Records entered into the log
28B Log bytes written
28B Log bytes written since last checkpoint
1 Total log file I/O writes
0 Total log file I/O writes due to overflow
1 Total log file flushes
0 Total log file I/O reads
1 Current log file number
28 Current log file offset
1 On-disk log file number
28 On-disk log file offset
1 Maximum commits in a log flush
0 Minimum commits in a log flush
96KB Log region size
0 The number of region locks that required waiting (0%)
% db_stat -h database environment -m
500MB Total cache size
1 Number of caches
1 Maximum number of caches
500MB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
0 Requested pages mapped into the process' address space
1749M Requested pages found in the cache (99%)
722001 Requested pages not found in the cache
911092 Pages created in the cache
722000 Pages read into the cache
4175142 Pages written from the cache to the backing file
1550811 Clean pages forced from the cache
19568 Dirty pages forced from the cache
3 Dirty pages written by trickle-sync thread
62571 Current total page count
62571 Current clean page count
0 Current dirty page count
65537 Number of hash buckets used for page location
1751M Total number of times hash chains searched for a page (1751388600)
8 The longest hash chain searched for a page
3126M Total number of hash chain entries checked for page (3126038333)
4535 The number of hash bucket locks that required waiting (0%)
278 The maximum number of times any hash bucket lock was waited for (0%)
1 The number of region locks that required waiting (0%)
0 The number of buffers frozen
0 The number of buffers thawed
0 The number of frozen buffers freed
1633189 The number of page allocations
4301013 The number of hash buckets examined during allocations
259 The maximum number of hash buckets examined for an allocation
1570522 The number of pages examined during allocations
1 The max number of pages examined for an allocation
184 Threads waited on page I/O
Pool File: browser
8192 Page size
0 Requested pages mapped into the process' address space
1749M Requested pages found in the cache (99%)
722001 Requested pages not found in the cache
911092 Pages created in the cache
722000 Pages read into the cache
4175142 Pages written from the cache to the backing file
% db_stat -h database environment -r
Not applicable.
% db_stat -h database environment -t
Not applicable.
vmstat
r b swpd free buff cache si so bi bo in cs us sy id wa
1 4 40332 773112 27196 1448196 0 0 173 239 64 1365 19 4 72 5
iostat
Linux 2.6.24-23-generic (dell) 06/02/09
avg-cpu: %user %nice %system %iowait %steal %idle
18.37 0.01 3.75 5.67 0.00 72.20
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 72.77 794.79 1048.35 5376284 7091504
29. Are there any other significant applications running on this
system? Are you using Berkeley DB outside of Berkeley DB XML?
Please describe the application?
No other significant applications are running on the system.
I am not using Berkeley DB outside of Berkeley DB XML.
The application is a web application that organises the data in
the stored documents into hypercubes that users can slice/dice and analyse.
Edited by: Geoff Shuetrim on Feb 7, 2009 2:23 PM to correct the appearance of the query plans.