Extremely slow inserts/updates reported after HW upgrade...
tOPsEEK, Nov 1 2009 (edited Nov 7 2009)
Hi guys,
I'll try to be as descriptive as I can. In this project we have to move about six mission-critical (24x7), mostly OLTP databases from MS Windows 2003 (DB on local disks) to HP-UX IA (CA Metrocluster, HP XP 12000 disk array), all on ORA10gR2 10.2.0.4. Everything was perfect until we moved this XYZ database...
Almost immediately, users reported "considerable" performance degradation. According to the 3rd party application log, they now get almost 40 secs. instead of the previously recorded 10.
We, I mean Oracle and HP specialists, haven't noticed/recorded any significant peaks/bottlenecks (RAM, CPU, disk I/O).
Feel free to check 3 AWR reports and the init.ora at [http://www.mediafire.com/?sharekey=0269c9bc606747b47f7ec40ada4772a6e04e75f6e8ebb871]
1_awrrpt_standard.txt - standard workload during 8 hours (peak hours are from 8-12AM)
2_awrrpt_2hrs_ca.txt - standard workload during 2 peak hours (8-10)
3_awrrpt_2hrs_noca.txt - standard workload during 2 peak hours (10-12) with CA disk mirroring disabled
Of course, I've checked the ADDM reports. First, I'd like to ask why ADDM keeps on reporting the following (on all database instances on this cluster node):
FINDING 1: 100% impact (310 seconds)
------------------------------------
Significant virtual memory paging was detected on the host operating system.
RECOMMENDATION 1: Host Configuration, 100% benefit (310 seconds)
Is it just some kind of false alarm (like we used to get on MS Windows)? Both nodes are running with 32 GB of RAM, with roughly 10 GB or more constantly free.
Second, as ADDM reported:
FINDING 2: 44% impact (135 seconds)
-----------------------------------
Waits on event "log file sync" while performing COMMIT and ROLLBACK operations
were consuming significant database time.
we've tried splitting the CA disk mirroring, using RAID 10 for the redo log file disks, etc. Users reported no substantial performance gain (though I've noticed some in the AWR reports).
Despite the confusing feedback from the app users, I'm nearly sure that our bottleneck is the redo log file disks. Why? Previously (old HW) we had 1-3 ms avg wait on log file sync and log file parallel write, and now (new HW, RAID5/RAID10, we've tested both) it's 8 ms or even more. We were able to get 2 ms only with CA (HP Metrocluster disk array mirroring) switched off.
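To put rough numbers on this: assuming every commit has to wait for one log file sync, the waits simply add up. The sketch below is only back-of-the-envelope arithmetic in Python. The commit count is a made-up figure (not taken from the AWR reports), and 2 ms stands in for the 1-3 ms range we saw on the old HW.

```python
def total_sync_wait_seconds(commits: int, avg_wait_ms: float) -> float:
    """Total time spent in 'log file sync' if every commit waits avg_wait_ms."""
    return commits * avg_wait_ms / 1000.0

# Hypothetical number of commits in the measured window (NOT from the AWR reports).
HYPOTHETICAL_COMMITS = 100_000

scenarios = [
    ("old HW", 2.0),           # assumed midpoint of the 1-3 ms range
    ("new HW, CA on", 8.0),    # "8 ms or even more"
    ("new HW, CA off", 2.0),   # what we measured with mirroring switched off
]

for label, avg_ms in scenarios:
    waited = total_sync_wait_seconds(HYPOTHETICAL_COMMITS, avg_ms)
    print(f"{label:15s} {avg_ms:4.1f} ms/commit -> {waited:7.1f} s total")
```

The absolute totals depend entirely on the assumed commit count; only the ratio between the scenarios is meaningful here.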
And that brings up two new questions:
1. Does redo log group mirroring (2 members on 2 separate disks vs. 1 member on 1 disk) have any significant impact on the above-mentioned wait events? I mean, what performance gain could I expect if I drop all "secondary" redo log members?
2. Why do we get almost identical response times when we run bulk insert/update tests (say
1000000 rows) against old and new DB/HW?
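One thing I still want to rule out for question 2 is commit frequency. The toy model below is a sketch under two unverified assumptions: that the bulk test commits only once at the end while the application commits after every small change, and that the per-row work costs the same on old and new HW (the 50 microseconds is a made-up figure). It charges one log file sync wait per commit and nothing else.

```python
def elapsed_seconds(rows: int, rows_per_commit: int,
                    work_us_per_row: int, sync_wait_us: int) -> float:
    """Toy model: per-row work plus one log file sync wait per commit."""
    commits = -(-rows // rows_per_commit)  # ceiling division
    return (rows * work_us_per_row + commits * sync_wait_us) / 1_000_000

ROWS = 1_000_000       # same size as the bulk test
WORK_US_PER_ROW = 50   # hypothetical, assumed identical on old and new HW

for label, rows_per_commit in [("commit per row", 1), ("single commit", ROWS)]:
    old = elapsed_seconds(ROWS, rows_per_commit, WORK_US_PER_ROW, 2_000)  # 2 ms sync
    new = elapsed_seconds(ROWS, rows_per_commit, WORK_US_PER_ROW, 8_000)  # 8 ms sync
    print(f"{label:15s} old {old:9.2f} s  new {new:9.2f} s  ratio {new / old:.2f}")
```

In this model the single-commit run is almost insensitive to the sync latency, while the commit-per-row run slows down nearly in proportion to it. Whether our bulk test and the application really behave like that is exactly what I'd need to confirm.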
Thanks in advance,
tOPsEEK
Edited by: smutny on 1.11.2009 17:39
Edited by: smutny on 1.11.2009 17:46