Hello all,
Recently I have been having an issue with files being slow to list at the OS level. What I mean is: when I go into a directory that holds my files and type ls, it hangs for a while and only returns the listing after a couple of seconds. And yes, the number of files is large, typically around 11K-20K. While investigating this I used normal OS commands to check what is happening, and I will paste each command and its output below:
OS: CentOS release 6.7 (Final)
Oracle version: 11.2.0
disk: RAID10 with 4 disks
RAM: 32GB
processors: 40
command: top
RESULT:
top - 18:39:00 up 359 days, 2:29, 9 users, load average: 14.62, 15.82, 14.59
Tasks: 983 total, 10 running, 969 sleeping, 4 stopped, 0 zombie
Cpu(s): 17.3%us, 5.5%sy, 0.0%ni, 71.6%id, 5.5%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 32817448k total, 32617396k used, 200052k free, 147764k buffers
Swap: 30719996k total, 2676388k used, 28043608k free, 13506684k cached
As you can see in the output above, %wa is high, and the load average keeps jumping from around 6.00 up to sometimes 17.00.
command: iostat
RESULT:
01/20/2017 06:43:15 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 5.53 1682.47 26.16 263.95 1.23 7.60 62.36 0.02 0.06 0.05 1.55
01/20/2017 06:43:20 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 83.60 3553.00 225.40 285.60 19.94 14.81 139.25 57.14 105.62 1.94 99.38
01/20/2017 06:43:25 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 123.80 6209.80 268.40 302.40 20.17 25.63 164.34 22.43 44.86 1.75 99.74
As you can see, the disk sdb, which is actually the RAID10 array (supposed to be fast) with 4 disks, shows %util at almost 100%. I have been monitoring this for a while and it is consistent. To dig deeper I targeted the user generating the highest I/O on the disk, and below is the result for the oracle user (other users barely reach 2%). As you can see, LGWR is using almost 98% of the I/O all the time; I monitored this for days and it does not go down, it stays the same, jumping between 93% and 99%.
command: iotop -u oracle
RESULT:
Total DISK READ: 27.66 M/s | Total DISK WRITE: 3.86 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
22663 be/4 oracle 303.48 K/s 1959.18 K/s 0.00 % 97.61 % ora_lgwr_sidname
12519 be/4 oracle 24.97 M/s 7.68 M/s 0.00 % 28.87 % oraclesidname (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
22619 be/4 oracle 0.00 B/s 476.35 K/s 0.00 % 23.30 % ora_dbw1_sidname
22623 be/4 oracle 0.00 B/s 169.03 K/s 0.00 % 16.78 % ora_dbw2_sidname
22615 be/4 oracle 0.00 B/s 238.18 K/s 0.00 % 16.74 % ora_dbw0_sidname
22627 be/4 oracle 0.00 B/s 122.93 K/s 0.00 % 10.67 % ora_dbw3_sidname
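To cross-check what iotop shows from inside the database, I was also thinking of looking at the redo write waits directly; as far as I know 'log file parallel write' is the event for LGWR's writes, so something along these lines (please correct me if that is the wrong event to look at):

-- redo write / commit wait events since startup, with average wait per event in ms
SELECT event,
       total_waits,
       ROUND(time_waited_micro/1000000) AS seconds_waited,
       ROUND(time_waited_micro/NULLIF(total_waits,0)/1000, 2) AS avg_ms
FROM   v$system_event
WHERE  event IN ('log file parallel write', 'log file sync');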
So I checked the redo logs: they were 30MB each, so I created 4 additional redo logs of 100MB each to see if the I/O would go down, but unfortunately it is still the same, no change in I/O. Can you guys guide me on what is happening here, what should be done, and how to pinpoint where the bottleneck is? I remember a year ago it was not like this, but now it is. The Oracle database is not huge, but there are at least 100K inserts and updates on it every day. The application uses Oracle only for auditing purposes, so the total size of the database files is only 30GB after one year.
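To try to pinpoint it myself, my plan was to start from the top non-idle wait events since instance startup, with a query along these lines (not sure this is the best approach, so suggestions welcome):

-- top 10 non-idle waits since startup, ordered by total time waited
SELECT *
FROM  (SELECT event,
              total_waits,
              ROUND(time_waited_micro/1000000) AS seconds_waited
       FROM   v$system_event
       WHERE  wait_class <> 'Idle'
       ORDER  BY time_waited_micro DESC)
WHERE ROWNUM <= 10;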
I added the new redo logs based on a suggestion I found while searching and it had no effect, so I also queried the memory resize view; below you can see that the memory components shrink and grow frequently...
SELECT
  TO_CHAR(start_time,'YYYY-MM-DD') AS dd,
  oper_type,
  COUNT(*)
FROM v$memory_resize_ops
GROUP BY TO_CHAR(start_time,'YYYY-MM-DD'), oper_type
ORDER BY TO_CHAR(start_time,'YYYY-MM-DD'), oper_type
DD |OPER_TYPE |COUNT(*) |
-----------|----------|---------|
2017-01-01 |GROW |15 |
2017-01-01 |SHRINK |15 |
2017-01-02 |GROW |3 |
2017-01-02 |SHRINK |3 |
2017-01-03 |GROW |3 |
2017-01-03 |SHRINK |3 |
2017-01-04 |GROW |9 |
2017-01-04 |SHRINK |9 |
2017-01-05 |GROW |3 |
2017-01-05 |SHRINK |3 |
2017-01-06 |GROW |6 |
2017-01-06 |SHRINK |6 |
2017-01-07 |GROW |15 |
2017-01-07 |SHRINK |15 |
2017-01-08 |GROW |18 |
2017-01-08 |SHRINK |18 |
2017-01-09 |GROW |6 |
2017-01-09 |SHRINK |6 |
2017-01-10 |GROW |6 |
2017-01-10 |SHRINK |6 |
2017-01-11 |GROW |3 |
2017-01-11 |SHRINK |3 |
2017-01-12 |GROW |3 |
2017-01-12 |SHRINK |3 |
2017-01-13 |GROW |6 |
2017-01-13 |SHRINK |6 |
2017-01-14 |GROW |15 |
2017-01-14 |SHRINK |15 |
2017-01-15 |GROW |15 |
2017-01-15 |SHRINK |15 |
2017-01-16 |GROW |3 |
2017-01-16 |SHRINK |3 |
2017-01-17 |GROW |6 |
2017-01-17 |SHRINK |6 |
2017-01-18 |GROW |3 |
2017-01-18 |SHRINK |3 |
2017-01-19 |GROW |3 |
2017-01-19 |SHRINK |3 |
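I have not yet broken this down by component to see whether it is the buffer cache or the shared pool that keeps resizing; I was planning to run something along these lines next:

-- resize operations per memory component (same view, grouped by component instead of day)
SELECT component,
       oper_type,
       COUNT(*)
FROM   v$memory_resize_ops
GROUP  BY component, oper_type
ORDER  BY component, oper_type;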
Here you can see the 4 extra redo logs of 100MB each that I added; as I said, this did not have any effect...
SELECT GROUP#, bytes/1024/1024, status FROM v$log
GROUP# |BYTES/1024/1024 |STATUS |
-------|----------------|---------|
1 |30 |INACTIVE |
2 |30 |CURRENT |
3 |30 |INACTIVE |
4 |30 |INACTIVE |
5 |100 |INACTIVE |
6 |100 |INACTIVE |
7 |100 |INACTIVE |
8 |100 |INACTIVE |
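I also want to check how often the logs are actually switching per hour, to see whether the redo volume by itself explains the constant LGWR writes (assuming v$log_history is the right place to look for that):

-- log switches per hour
SELECT TO_CHAR(first_time,'YYYY-MM-DD HH24') AS hour,
       COUNT(*) AS switches
FROM   v$log_history
GROUP  BY TO_CHAR(first_time,'YYYY-MM-DD HH24')
ORDER  BY 1;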
Thanks in advance for your time and suggestions.
Regards