Some Thoughts On An OWB Performance/Testing Framework
7475Mar 4 2005 — edited Feb 2 2007Hi all,
I've been giving some thought recently to how we could build a performance tuning and testing framework around Oracle Warehouse Builder. Specifically, I'm looking at was in which we can use some of the performance tuning techniques described in Cary Millsap/Jeff Holt's book "Optimizing Oracle Performance" to profile and performance tune mappings and process flows, and to use some of the ideas put forward in Kent Graziano's Agile Methods in Data Warehousing paper http://www.rmoug.org/td2005pres/graziano.zip and Steven Feuernstein's utPLSQL project http://utplsql.sourceforge.net/ to provide an agile/test-driven way of developing mappings, process flows and modules. The aim of this is to ensure that the mappings we put together are as efficient as possible, work individually and together as expected, and are quick to develop and test.
At the moment, most people's experience of performance tuning OWB mappings is firstly to see if it runs set-based rather than row-based, then perhaps to extract the main SQL statement and run an explain plan on it, then check to make sure indexes etc are being used ok. This involves a lot of manual work, doesn't factor in the data available from the wait interface, doesn't store the execution plans anywhere, and doesn't really scale out to encompass entire batches of mapping (process flows).
For some background reading on Cary Millsap/Jeff Holt's approach to profiling and performance tuning, take a look at http://www.rittman.net/archives/000961.html and http://www.rittman.net/work_stuff/extended_sql_trace_and_tkprof.htm. Basically, this approach traces the SQL that is generated by a batch file (read: mapping) and generates a file that can be later used to replay the SQL commands used, the explain plans that relate to the SQL, details on what wait events occurred during execution, and provides at the end a profile listing that tells you where the majority of your time went during the batch. It's currently the "preferred" way of tuning applications as it focuses all the tuning effort on precisely the issues that are slowing your mappings down, rather than database-wide issues that might not be relevant to your mapping.
For some background information on agile methods, take a look at Kent Graziano's paper, this one on test-driven development http://c2.com/cgi/wiki?TestDrivenDevelopment , this one http://martinfowler.com/articles/evodb.html on agile database development, and the sourceforge project for utPLSQL http://utplsql.sourceforge.net/. What this is all about is having a development methodology that builds in quality but is flexible and responsive to changes in customer requirements. The benefit of using utPLSQL (or any unit testing framework) is that you can automatically check your altered mappings to see that they still return logically correct data, meaning that you can make changes to your data model and mappings whilst still being sure that it'll still compile and run.
Observations On The Current State of OWB Performance Tuning & Testing
At present, when you build OWB mappings, there is no way (within the OWB GUI) to determine how "efficient" the mapping is. Often, when building the mapping against development data, the mapping executes quickly and yet when run against the full dataset, problems then occur. The mapping is built "in isolation" from its effect on the database and there is no handy tool for determining how efficient the SQL is.
OWB doesn't come with any methodology or testing framework, and so apart from checking that the mapping has run, and that the number of rows inserted/updated/deleted looks correct, there is nothing really to tell you whether there are any "logical" errors. Also, there is no OWB methodology for integration testing, unit testing, or any other sort of testing, and we need to put one in place. Note - OWB does come with auditing, error reporting and so on, but there's no framework for guiding the user through a regime of unit testing, integration testing, system testing and so on, which I would imagine more complete developer GUIs come with. Certainly there's no built in ability to use testing frameworks such as utPLSQL, or a part of the application that let's you record whether a mapping has been tested, and changes the test status of mappings when you make changes to ones that they are dependent on.
OWB is effectively a code generator, and this code runs against the Oracle database just like any other SQL or PL/SQL code. There is a whole world of information and techniques out there for tuning SQL and PL/SQL, and one particular methodology that we quite like is the Cary Millsap/Jeff Holt "Extended SQL Trace" approach that uses Oracle diagnostic events to find out exactly what went on during the running of a batch of SQL commands. We've been pretty successful using this approach to tune customer applications and batch jobs, and we'd like to use this, together with the "Method R" performance profiling methodology detailed in the book "Optimising Oracle Performance", as a way of tuning our generated mapping code.
Whilst we want to build performance and quality into our code, we also don't want to overburden developers with an unwieldy development approach, because what we'll know will happen is that after a short amount of time, it won't get used. Given that we want this framework to be used for all mappings, it's got to be easy to use, cause minimal overhead, and have results that are easy to interpret. If at all possible, we'd like to use some of the ideas from agile methodologies such as eXtreme Programming, SCRUM and so on to build in quality but minimise paperwork.
We also recognise that there are quite a few settings that can be changed at a session and instance level, that can have an effect on the performance of a mapping. Some of these include initialisation parameters that can change the amount of memory assigned to the instance and the amount of memory subsequently assigned to caches, sort areas and the like, preferences that can be set so that indexes are preferred over table scans, and other such "tweaks" to the Oracle instance we're working with. For reference, the version of Oracle we're going to use to both run our code and store our data is Oracle 10g 10.1.0.3 Enterprise Edition, running on Sun Solaris 64-bit.
Some initial thoughts on how this could be accomplished
- Put in place some method for automatically / easily generating explain plans for OWB mappings (issue - this is only relevant for mappings that are set based, and what about pre- and post- mapping triggers)
- Put in place a method for starting and stopping an event 10046 extended SQL trace for a mapping
- Put in place a way of detecting whether the explain plan / cost / timing for a mapping changes significantly
- Put in place a way of tracing a collection of mappings, i.e. a process flow
- The way of enabling tracing should either be built in by default, or easily added by the OWB developer. Ideally it should be simple to switch it on or off (perhaps levels of event 10046 tracing?)
- Perhaps store trace results in a repository? reporting? exception reporting?
at an instance level, come up with some stock recommendations for instance settings
- identify the set of instance and session settings that are relevant for ETL jobs, and determine what effect changing them has on the ETL job
- put in place a regime that records key instance indicators (STATSPACK / ASH) and allows reports to be run / exceptions to be reported
- Incorporate any existing "performance best practices" for OWB development
- define a lightweight regime for unit testing (as per agile methodologies) and a way of automating it (utPLSQL?) and of recording the results so we can check the status of dependent mappings easily
other ideas around testing?
Suggested Approach
- For mapping tracing and generation of explain plans, a pre- and post-mapping trigger that turns extended SQL trace on and off, places the trace file in a predetermined spot, formats the trace file and dumps the output to repository tables.
- For process flows, something that does the same at the start and end of the process. Issue - how might this conflict with mapping level tracing controls?
- Within the mapping/process flow tracing repository, store the values of historic executions, have an exception report that tells you when a mapping execution time varies by a certain amount
- get the standard set of preferred initialisation parameters for a DW, use these as the start point for the stock recommendations. Identify which ones have an effect on an ETL job.
- identify the standard steps Oracle recommends for getting the best performance out of OWB (workstation RAM etc) - see OWB Performance Tips http://www.rittman.net/archives/001031.html and Optimizing Oracle Warehouse Builder Performance http://www.oracle.com/technology/products/warehouse/pdf/OWBPerformanceWP.pdf
- Investigate what additional tuning options and advisers are available with 10g
- Investigate the effect of system statistics & come up with recommendations.
Further reading / resources:
- Diagnosing Performance Problems Using Extended Trace" Cary Millsap
http://otn.oracle.com/oramag/oracle/04-jan/o14tech_perf.html
- "Performance Tuning With STATSPACK" Connie Dialeris and Graham Wood
http://www.oracle.com/oramag/oracle/00-sep/index.html?o50tun.html
- "Performance Tuning with Statspack, Part II" Connie Dialeris and Graham Wood
http://otn.oracle.com/deploy/performance/pdf/statspack_tuning_otn_new.pdf
- "Analyzing a Statspack Report: A Guide to the Detail Pages" Connie Dialeris and Graham Wood
http://www.oracle.com/oramag/oracle/00-nov/index.html?o60tun_ol.html
- "Why Isn't Oracle Using My Index?!" Jonathan Lewis
http://www.dbazine.com/jlewis12.shtml
- "Performance Tuning Enhancements in Oracle Database 10g" Oracle-Base.com
http://www.oracle-base.com/articles/10g/PerformanceTuningEnhancements10g.php
- Introduction to Method R and Hotsos Profiler (Cary Millsap, free reg. required)
http://www.hotsos.com/downloads/registered/00000029.pdf
- Exploring the Oracle Database 10g Wait Interface (Robin Schumacher)
http://otn.oracle.com/pub/articles/schumacher_10gwait.html
- Article referencing an OWB forum posting
http://www.rittman.net/archives/001031.html
- How do I inspect error logs in Warehouse Builder? - OWB Exchange tip
http://www.oracle.com/technology/products/warehouse/pdf/Cases/case10.pdf
- What is the fastest way to load data from files? - OWB exchange tip
http://www.oracle.com/technology/products/warehouse/pdf/Cases/case1.pdf
- Optimizing Oracle Warehouse Builder Performance - Oracle White Paper
http://www.oracle.com/technology/products/warehouse/pdf/OWBPerformanceWP.pdf
- OWB Advanced ETL topics - including sections on operating modes, partition exchange loading
http://www.oracle.com/technology/products/warehouse/selfserv_edu/advanced_ETL.html
- Niall Litchfield's Simple Profiler (a creative commons-licensed trace file profiler, based on Oracle Trace Analyzer, that displays the response time profile through HTMLDB. Perhaps could be used as the basis for the repository/reporting part of the project)
http://www.niall.litchfield.dial.pipex.com/SimpleProfiler/SimpleProfiler.html
- Welcome to the utPLSQL Project - a PL/SQL unit testing framework by Steven Feuernstein. Could be useful for automating the process of unit testing mappings.
http://utplsql.sourceforge.net/
Relevant postings from the OTN OWB Forum
- Bulk Insert - Configuration Settings in OWB
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=291269&tstart=30&trange=15
- Default Performance Parameters
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=213265&message=588419&q=706572666f726d616e6365#588419
- Performance Improvements
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=270350&message=820365&q=706572666f726d616e6365#820365
- Map Operator performance
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=238184&message=681817&q=706572666f726d616e6365#681817
- Performance of mapping with FILTER
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=273221&message=830732&q=706572666f726d616e6365#830732
- Poor mapping performance
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=275059&message=838812&q=706572666f726d616e6365#838812
- Optimizing Mapping Performance With OWB
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=269552&message=815295&q=706572666f726d616e6365#815295
- Performance of mapping with FILTER
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=273221&message=830732&q=706572666f726d616e6365#830732
- Performance of the OWB-Repository
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=66271&message=66271&q=706572666f726d616e6365#66271
- One large JOIN or many small ones?
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=202784&message=553503&q=706572666f726d616e6365#553503
- NATIVE PL SQL with OWB9i
http://forums.oracle.com/forums/thread.jsp?forum=57&thread=270273&message=818390&q=706572666f726d616e6365#818390
Next Steps
Although this is something that I'll be progressing with anyway, I'd appreciate any comment from existing OWB users as to how they currently perform performance tuning and testing. Whilst these are perhaps two distinct subject areas, they can be thought of as the core of an "OWB Best Practices" framework and I'd be prepared to write the results up as a freely downloadable whitepaper. With this in mind, does anyone have an existing best practices for tuning or testing, have they tried using SQL trace and TKPROF to profile mappings and process flows, or have you used a unit testing framework such as utPLSQL to automatically test the set of mappings that make up your project?
Any feedback, add it to this forum posting or send directly through to me at mark@rittman.net. I'll report back on a proposed approach in due course.