Guys, need your help.
What's the best way to handle the following (feeling guilty for the tremendous lack of brevity):
Activities:
1. The client loads 10 files into 10 separate tables daily. Loading starts at 4 AM and finishes by 9 AM.
2. Each of these 10 tables is partitioned on the file landing date.
3. The volume stored per partition in each table is approx. 5 GB (6 million records).
4. Data retention is set to keep the latest 2 dumps.
5. There are no indexes on any of these tables.
6. Next, we have a staging table X, partitioned on logging date (since all summaries and reports are based on logging date).
7. X retains the last 6 months' records.
8. X is locally indexed on file type (one file type corresponding to each of the 10 tables).
9. X is compressed using Oracle basic compression.
10. A procedure loads data from each of the 10 tables into X on a daily basis.
11. The procedure is scheduled to run at 9:30 AM daily.
12. In the very first run, the procedure picks up the 10 tables one after another and loads their data into X. Processing is serial, i.e., steps 12, 13 and 14 are completed for one file type, then the next file type is taken up, until all 10 are processed.
13. Analytic functions, DECODEs and ranking are all applied during each file's load (bulk load with a 5,000-row fetch limit; a sketch follows this list).
14. From subsequent runs, the procedure checks whether a new dump with a more recent landing date has been received; if yes, data for the previous dump is DELETEd from X by file type (iterated per partition, INDEX FULL SCAN, PARALLEL 8, with NOLOGGING applied on the table; see the sketch after this list).
15. Once all 10 files are loaded, the rows are compared with another table (approx. 200 GB in size, partitioning scheme similar to X) and a flag is updated in X. This update is fired partition-wise for performance and less redo generation (also sketched after this list).
16. Next, the rows in X are re-compared with a separate table (approx. 70 GB in size, same partition key as X but 45-day retention) and the flag is updated with a different value in X.
17. At the end of each day, X contains approx. 200 million records, about 200 GB in total (compressed). The average partition size in X is approx. 2 GB, but non-uniform: 20% are > 2 GB, 40% are between 1 GB and 2 GB, and the rest vary.
18. Logging is performed partition-wise to track counts and apparent performance for later review.
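
To make steps 12-13 concrete, the per-file-type load looks roughly like the sketch below. All table, column and key names here are invented placeholders; only the bulk-collect pattern and the 5,000-row fetch limit match the real procedure.

    DECLARE
      CURSOR c_src IS
        SELECT file_type, log_date, payload_col,
               DECODE(status_cd, 'A', 1, 0)                              AS status_flag,
               RANK() OVER (PARTITION BY biz_key ORDER BY log_date DESC) AS rnk
          FROM src_table_1                      -- one of the 10 landing tables (placeholder name)
         WHERE landing_date = TRUNC(SYSDATE);   -- today's dump
      TYPE t_rows IS TABLE OF c_src%ROWTYPE;
      l_rows t_rows;
    BEGIN
      OPEN c_src;
      LOOP
        FETCH c_src BULK COLLECT INTO l_rows LIMIT 5000;   -- the 5,000-row fetch limit
        EXIT WHEN l_rows.COUNT = 0;
        FORALL i IN 1 .. l_rows.COUNT
          INSERT INTO x VALUES l_rows(i);                  -- conventional-path insert into X
        COMMIT;
      END LOOP;
      CLOSE c_src;
    END;
    /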
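
The previous-dump delete (step 14) runs along these lines; the per-partition iteration, PARALLEL 8 and per-partition commits are real, but the names are placeholders:

    DECLARE
      l_file_type VARCHAR2(30) := 'FILE_01';   -- placeholder: file type of the superseded dump
    BEGIN
      EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML';
      FOR p IN (SELECT partition_name
                  FROM user_tab_partitions
                 WHERE table_name = 'X'
                 ORDER BY partition_position)
      LOOP
        EXECUTE IMMEDIATE
             'DELETE /*+ PARALLEL(x 8) */ FROM x PARTITION ("'
          || p.partition_name || '") WHERE file_type = :ft'
          USING l_file_type;
        COMMIT;                                 -- commit per partition, as in the real job
      END LOOP;
    END;
    /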
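
And the flag update (steps 15-16) is fired one partition at a time, roughly like this, with ref_big standing in for the 200 GB comparison table and match_flag/join_key invented names:

    UPDATE x PARTITION (p_20240101) t                          -- one X partition at a time
       SET t.match_flag = 'Y'
     WHERE EXISTS (SELECT 1
                     FROM ref_big PARTITION (p_20240101) r     -- matching partition of the 200 GB table
                    WHERE r.join_key = t.join_key);
    COMMIT;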
Challenges:
- No stats are collected after the deletes/inserts/updates; the time to analyze a compressed partition is a concern (a per-partition gather is sketched after this list).
- NOLOGGING seems to have no effect since the database runs in ARCHIVELOG mode; redo has become a trouble to maintain (see the logging check below).
- Step 13 gets stuck during database peak times. When stuck, it takes 2 hours to complete per file; when not stuck, about 5 minutes per file.
- Step 14 takes 2 hours per file type on average.
- Step 15 completes within 5 minutes per partition per file type, but partitions holding > 2 GB take about 1 hour.
- Step 16 performs more or less like Step 15.
- The entire process takes 20 hours on average; some days it stretches to 23-25 hours.
- The load impacts other concurrently running processes and pushes memory and CPU utilization above 80%, sometimes to 99%.
- Long-running processes hit ORA-01555 (SNAPSHOT TOO OLD); sizing UNDO correctly has become hard (see the V$UNDOSTAT query below).
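
On the stats point: a per-partition gather like the one below is what we have been weighing against the analyze time on compressed partitions (owner and partition name are placeholders):

    BEGIN
      DBMS_STATS.gather_table_stats(
        ownname     => USER,
        tabname     => 'X',
        partname    => 'P_20240101',     -- only the partition just modified
        granularity => 'PARTITION',
        degree      => 4,
        cascade     => TRUE);            -- include the local index partition
    END;
    /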
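
On the NOLOGGING point, the logging attributes can be confirmed from the standard dictionary views:

    SELECT log_mode, force_logging FROM v$database;             -- ARCHIVELOG? force logging?
    SELECT logging FROM user_tables WHERE table_name = 'X';     -- table-level attribute
    SELECT partition_name, logging
      FROM user_tab_partitions WHERE table_name = 'X';          -- per-partition attribute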
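
On the ORA-01555 point, we have been trying to pin down the "correct" UNDO size with the usual V$UNDOSTAT arithmetic:

    SELECT MAX(maxquerylen)                                  AS longest_query_sec,
           MAX(undoblks / ((end_time - begin_time) * 86400)) AS peak_undo_blocks_per_sec
      FROM v$undostat;
    -- required undo size ~= longest_query_sec * peak_undo_blocks_per_sec * db_block_size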
Environment: data tablespace 300 GB, index tablespace 200 GB, free tablespace 35 GB; no. of CPUs = 16, CPU cores = 8, RAM = 64 GB.
OS time counters: Idle Time 1417981896, Busy Time 2479534845, User Time 2342396842, SYS Time 114377768, I/O Wait Time 743065425.
The average number of processes running is 9.
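
For context, counters like these come from V$OSSTAT (the time values are cumulative centiseconds), pulled with a query along these lines:

    SELECT stat_name, value
      FROM v$osstat
     WHERE stat_name IN ('IDLE_TIME', 'BUSY_TIME', 'USER_TIME',
                         'SYS_TIME', 'IOWAIT_TIME', 'NUM_CPUS',
                         'NUM_CPU_CORES', 'PHYSICAL_MEMORY_BYTES');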