Hello all,
Recently I have been having an issue with files being slow to list at the OS level. What I mean is: when I go into a directory that holds my files and type ls, it hangs for a while and only returns the listing after a couple of seconds. And yes, the number of files is large, typically around 11K-20K. While investigating this I used normal OS commands to check what is happening, and I will paste each command and its output below:
OS: CentOS release 6.7 (Final)
Oracle version: 11.2.0
disk: RAID10 with 4 disks
RAM: 32GB
processors: 40
command: top
RESULT:
top - 18:39:00 up 359 days, 2:29, 9 users, load average: 14.62, 15.82, 14.59
Tasks: 983 total, 10 running, 969 sleeping, 4 stopped, 0 zombie
Cpu(s): 17.3%us, 5.5%sy, 0.0%ni, 71.6%id, 5.5%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 32817448k total, 32617396k used, 200052k free, 147764k buffers
Swap: 30719996k total, 2676388k used, 28043608k free, 13506684k cached
As you can see in the output above, %wa is high, and the load average keeps jumping from around 6.00 up to sometimes 17.00.
command: iostat
RESULT:
01/20/2017 06:43:15 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 5.53 1682.47 26.16 263.95 1.23 7.60 62.36 0.02 0.06 0.05 1.55
01/20/2017 06:43:20 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 83.60 3553.00 225.40 285.60 19.94 14.81 139.25 57.14 105.62 1.94 99.38
01/20/2017 06:43:25 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 123.80 6209.80 268.40 302.40 20.17 25.63 164.34 22.43 44.86 1.75 99.74
As you can see, the disk sdb, which is actually the RAID10 array (supposed to be fast) with 4 disks, shows %util at almost 100%. I have been monitoring this for a while and it is consistent. To dig deeper I targeted the user generating the highest I/O on the disk, and below is the result for the oracle user (other users barely reach 2%). As you can see, LGWR is using almost 98% of the I/O all the time; I monitored this for days and it does not go down, it stays the same, jumping between 93% and 99%.
command: iotop -u oracle
RESULT:
Total DISK READ: 27.66 M/s | Total DISK WRITE: 3.86 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
22663 be/4 oracle 303.48 K/s 1959.18 K/s 0.00 % 97.61 % ora_lgwr_sidname
12519 be/4 oracle 24.97 M/s 7.68 M/s 0.00 % 28.87 % oraclesidname (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
22619 be/4 oracle 0.00 B/s 476.35 K/s 0.00 % 23.30 % ora_dbw1_sidname
22623 be/4 oracle 0.00 B/s 169.03 K/s 0.00 % 16.78 % ora_dbw2_sidname
22615 be/4 oracle 0.00 B/s 238.18 K/s 0.00 % 16.74 % ora_dbw0_sidname
22627 be/4 oracle 0.00 B/s 122.93 K/s 0.00 % 10.67 % ora_dbw3_sidname
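To cross-check what iotop shows from inside the database, I was also thinking of looking at the redo write waits directly; as far as I know 'log file parallel write' is the event for LGWR's writes, so something along these lines (please correct me if that is the wrong event to look at):

-- redo write / commit wait events since startup, with average wait per event in ms
SELECT event,
       total_waits,
       ROUND(time_waited_micro/1000000) AS seconds_waited,
       ROUND(time_waited_micro/NULLIF(total_waits,0)/1000, 2) AS avg_ms
FROM   v$system_event
WHERE  event IN ('log file parallel write', 'log file sync');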
So I checked the redo logs: they were 30MB each, so I created 4 additional redo logs of 100MB each to see if the I/O would go down, but unfortunately it is still the same, no change in I/O. Can you guys guide me on what is happening here, what should be done, and how to pinpoint where the bottleneck is? I remember a year ago it was not like this, but now it is. The Oracle database is not huge, but there are at least 100K inserts and updates on it every day. The application uses Oracle only for auditing purposes, so the total size of the database files is only 30GB after one year.
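To try to pinpoint it myself, my plan was to start from the top non-idle wait events since instance startup, with a query along these lines (not sure this is the best approach, so suggestions welcome):

-- top 10 non-idle waits since startup, ordered by total time waited
SELECT *
FROM  (SELECT event,
              total_waits,
              ROUND(time_waited_micro/1000000) AS seconds_waited
       FROM   v$system_event
       WHERE  wait_class <> 'Idle'
       ORDER  BY time_waited_micro DESC)
WHERE ROWNUM <= 10;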
I added the new redo logs based on a suggestion I found while searching and it had no effect, so I also queried the memory resize view; below you can see that the memory components shrink and grow frequently...
SELECT
  TO_CHAR(start_time,'YYYY-MM-DD') AS dd,
  oper_type,
  COUNT(*)
FROM v$memory_resize_ops
GROUP BY TO_CHAR(start_time,'YYYY-MM-DD'), oper_type
ORDER BY TO_CHAR(start_time,'YYYY-MM-DD'), oper_type
DD |OPER_TYPE |COUNT(*) |
-----------|----------|---------|
2017-01-01 |GROW |15 |
2017-01-01 |SHRINK |15 |
2017-01-02 |GROW |3 |
2017-01-02 |SHRINK |3 |
2017-01-03 |GROW |3 |
2017-01-03 |SHRINK |3 |
2017-01-04 |GROW |9 |
2017-01-04 |SHRINK |9 |
2017-01-05 |GROW |3 |
2017-01-05 |SHRINK |3 |
2017-01-06 |GROW |6 |
2017-01-06 |SHRINK |6 |
2017-01-07 |GROW |15 |
2017-01-07 |SHRINK |15 |
2017-01-08 |GROW |18 |
2017-01-08 |SHRINK |18 |
2017-01-09 |GROW |6 |
2017-01-09 |SHRINK |6 |
2017-01-10 |GROW |6 |
2017-01-10 |SHRINK |6 |
2017-01-11 |GROW |3 |
2017-01-11 |SHRINK |3 |
2017-01-12 |GROW |3 |
2017-01-12 |SHRINK |3 |
2017-01-13 |GROW |6 |
2017-01-13 |SHRINK |6 |
2017-01-14 |GROW |15 |
2017-01-14 |SHRINK |15 |
2017-01-15 |GROW |15 |
2017-01-15 |SHRINK |15 |
2017-01-16 |GROW |3 |
2017-01-16 |SHRINK |3 |
2017-01-17 |GROW |6 |
2017-01-17 |SHRINK |6 |
2017-01-18 |GROW |3 |
2017-01-18 |SHRINK |3 |
2017-01-19 |GROW |3 |
2017-01-19 |SHRINK |3 |
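I have not yet broken this down by component to see whether it is the buffer cache or the shared pool that keeps resizing; I was planning to run something along these lines next:

-- resize operations per memory component (same view, grouped by component instead of day)
SELECT component,
       oper_type,
       COUNT(*)
FROM   v$memory_resize_ops
GROUP  BY component, oper_type
ORDER  BY component, oper_type;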
Here you can see the 4 extra redo logs of 100MB each that I added; as I said, this did not have any effect...
SELECT GROUP#, bytes/1024/1024, status FROM v$log
GROUP# |BYTES/1024/1024 |STATUS |
-------|----------------|---------|
1 |30 |INACTIVE |
2 |30 |CURRENT |
3 |30 |INACTIVE |
4 |30 |INACTIVE |
5 |100 |INACTIVE |
6 |100 |INACTIVE |
7 |100 |INACTIVE |
8 |100 |INACTIVE |
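I also want to check how often the logs are actually switching per hour, to see whether the redo volume by itself explains the constant LGWR writes (assuming v$log_history is the right place to look for that):

-- log switches per hour
SELECT TO_CHAR(first_time,'YYYY-MM-DD HH24') AS hour,
       COUNT(*) AS switches
FROM   v$log_history
GROUP  BY TO_CHAR(first_time,'YYYY-MM-DD HH24')
ORDER  BY 1;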
Thanks in advance for your time and suggestions.
Regards