Strategies on delta in ETL-process
76541Oct 24 2005 — edited Feb 25 2006LS,
OWB 10g Release 1
Oracle 10g Release 1 or 2
Im looking for a good strategy on an effective and fast ETL-process. I intend to do so with a delta of (dayly) data as soon in the process as possible.
If my source databases (although also Oracle 10gR1) do NOT wish to implement and maintain the Change Data Capture to service my datawarehouse and present me my desired delta, what other strategies do we have on our sleeves?
I have to feed a 3N-like table in the datawarehouse. This would be very much like the implementation of a Slowly Changing Dimension Type 2. If there is a way to skip a MINUS or a 2-sided OUTER JOIN, this would be preferred.
Suppose I get the complete source-databases dayly in the form of copies of datafiles. Or maybe just de datafiles of the tablespaces concerned. I could promote these datafiles to a working database, beeing a copy of todays situation.
The datafiles I got yesterday I could use to promote these to a working database, beeing a copy of yesterdays situation. Then, I can compare the data in the tables of my interest, resulting in the desired delta. But how?
Using a MINUS over a DatabaseLink doesnt sound like fast.
Using a MINUS doesnt sound like fast at all.
Could I use the combination of implementing CDC in yesterdays database and some form of merge? Since these two databases are completely at my will, I could do anything to them I like. But what dó I like and why do like it so much?
Is there some form of data-pump I could use in order to insert/update/delete the situation of today over the situation of yesterday? A simple merge, matching on the pk-columns, would update far too many rows, also the rows that were not changed at all.
Come on guys, how do you keep up performance of your ETL-process? The prospect of comparing 50 to 60 milj rows on a dayly basis, just in order to find the 100 rows added today, is not a comforting one on the aspect of performance. Maybe theres a more basic method when source-applications (even in Oracle 10gR1) dont want to do anything concerning datawarehousing: you can use our dayly backup or something like that.
Regards,
André Klück