Thread: log_buffer


Permlink Replies: 38 - Pages: 3 [ 1 2 3 | Next ] - Last Post: Jul 18, 2007 1:56 AM Last Post By: adeledda
user511621

Posts: 410
Registered: 05/24/06
log_buffer
Posted: Jul 3, 2007 7:39 AM
Click to report abuse...   Click to reply to this thread Reply
hi

in 9i we can modify shared pool size and db cache size

but can we alter log_buffer parameter... anyway???
The Human Fly

Posts: 4,750
Registered: 11/24/00
Re: log_buffer
Posted: Jul 3, 2007 7:53 AM   in response to: user511621 in response to: user511621
Click to report abuse...   Click to reply to this thread Reply
You can't modify the log_buffer parameter online unlike the other two you mentioned. Any change on this required immediate shutdown and startup.

If you are using spfile, do the following:

alter system set log_buffer=<size> scope=spfile;
shutdown immediate
startup

If you are using pfile:

Change the value of log_buffer in the pfile.
shutdown immediate
startup

Jaffar

burleson

Posts: 2,343
Registered: 05/06/98
Re: log_buffer
Posted: Jul 3, 2007 11:51 AM   in response to: user511621 in response to: user511621
Click to report abuse...   Click to reply to this thread Reply
Hi,

Also, note the rules for log_buffer sizing:

http://www.dba-oracle.com/t_log_buffer_optimal_size.htm
Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: log_buffer
Posted: Jul 7, 2007 10:09 AM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
Burleson,

I note that you've posted a URL to that article on your website twice in the last few days. May I suggest that you don't publish the URL in this forum again until you have read and corrected the article.

In paragraph one, you claim:

Oracle documentation recommends that the maximum size for the log_buffer parameter be either 500k, or (128K * cpu_count) whichever is greater.

You are simply restating the formula for the default value for this parameter as given in the 8.1 reference manual. From the 8.1.6. Database Reference, under parameter log_buffer:

Default value: Operating system specific. Maximum: 500K or 128K * CPU_COUNT, whichever is greater

The fact that Oracle has a default does not make that default the "maximum recommended".

In the same paragraph you also state

The number of CPUs is important to the value of log_buffer, because multiple log writer (LGWR) processes may be spawned by Oracle to asynchronously offload the redo information.

This is incorrect, there is only ever one LGWR process. Here's some output from a 9.2.0.8 instance, started on a machine with 64 CPUs, with log_parallelism set to 4, and no setting for log_buffer:
select  name, value
from v$parameter
where name in ('cpu_count','log_parallelism','log_buffer')
;
NAME            VALUE

---------------
cpu_count 64
log_buffer 8388608
log_parallelism 4
select name, description 
from V$bgprocess
where paddr !='00'
;
NAME            DESCRIPTION

------------------------
PMON process cleanup
DBW0 db writer process 0
DBW1 db writer process 1
DBW2 db writer process 2
DBW3 db writer process 3
DBW4 db writer process 4
DBW5 db writer process 5
DBW6 db writer process 6
DBW7 db writer process 7
ARC0 Archival Process 0
ARC1 Archival Process 1
LGWR Redo etc.
CKPT checkpoint
SMON System Monitor Process
RECO distributed recovery
Note - only one log writer, despite invoking explicit parallelism in logging, and despite the significant number of CPUs.

In paragraph 4 you state: "Even though Oracle does not recommend a log_buffer greater than one meg, ...". This is despite the fact that the previous two paragraphs reference and quote a Metalink Document that (a) points out that 10MB is a reasonable value for the log_buffer (... for Oracle Applications 11i) and (b) explains why this is a reasonable suggestion.

Paragraph 5 says: If you have waits associated to log_buffer size “db log sync wait”, try increasing log_buffer to a value over 1 megabyte.. But there is no such wait; the closest is 'log file sync', and waits for this event may indicate that your log buffer is too big. The wait that suggests you need to increase the log buffer is "log buffer space".

A few lines further on you state that it is a 'common mistake' that: "The log_buffer is not a multiple of the db_block_size. In Oracle9i with multiple block sizes, the log_buffer should be a multiple of 2048 bytes." This is incorrect - the log buffer has to be a multiple of the redo log block size - typically 512 bytes, though some O/S may be configured to 1KB or 2KB unit sizes: and the redo block size follows the unit I/O size of the O/S. Here's some output from a 9i database using multiple (data)block sizes. The value for the log_buffer size here is not a multiple of 2KB
SQL> show parameters log_buffer
NAME                                 TYPE        VALUE

-----------
log_buffer integer 524800

You have a section of an AWR report (from a 10.1 database, by the look of it) which you say shows a database with an undersized log buffer where the dba did not set the log_buffer parameter. The figures quoted are as follows:

                                                                   Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn

------------
----------
--------
log file sequential read 4,275 0 229 54 0.0
log buffer space 12 0 3 235 0.0
 
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time

------------
--------
CPU time 163,182 88.23
db file sequential read 1,541,854 8,551 4.62
log file sync 1,824,469 8,402 4.54
log file parallel write 1,810,628 2,413 1.30
SQL*Net more data to client 15,421,202 687 .37

The time lost on log buffer space waits is 3 seconds in a total of 12 waits. Compare this with 1.8Million log file parallel write waits, and log file sync waits. Note - A single transaction (prior to 10.2) has to create a minimum of two redo records, which means a minimum of two log buffer space allocations - so we're seeing 12 waits out of 3.6M allocations.

The most remarkable thing about this database is that the CPU usage is such a high fraction of the total database time - perhaps that's the result of very poor SQL being protected by throwing excess memory at the problem. If you want to make this database go faster don't worry about the log buffer, find out where the CPU is going.

While talking about sample AWR reports - let's take a look at the stats in the other 'log buffer size' article that you reference in this article. The second article states:

Today, most large database use a log buffer between 5 meg to 10 meg. Here is an example of an Oracle 10g database ed: 10.1 again with an undersized log buffer, in this example 512k ed: which means 1 to 4 CPUs:
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) DB Time Wait Class

------------
---------
log file parallel write 9,670 291 55.67 System I/O
log file sync 9,293 278 53.12 Commit
CPU time 225 43.12
db file parallel write 4,922 201 38.53 System I/O
control file parallel write 1,282 65 12.42 System I/O
But there are no log buffer space waits in this list at all. There are log file parallel write waits, and log file sync waits - but they aren't a symptom of an undersized log buffer.

If you want to know the problem with this database, work out the average wait times - the log writes are 30 milliseconds, the db file writes are 40 milliseconds. the control file writes are 50 milliseconds.

If you have a performance problem with this system it's probably in the slow I/O subsystem - not in the log buffer. (It's odd that we see no time lost on reads - but perhaps this is another system where a lot of memory has been thrown at the db cache to hide the real problem).


Coming back to the original article, you have a section on "log buffer related parameters". You claim that one of these is the transactions_per_rollback_segment. Apart from the fact that this is irrelevant from Oracle 9i onwards if you are using automatic undo, it's irrelevant anyway because it has nothing to do with redo ... "rollback" is undo.

In passing, if you re-read the article you've quoted you will find that that the comments it makes about the maximum value for this parameter were in relation to an 8.1.6 database using 2KB blocks. That's not particularly helpful for some one asking about a 9i database in the year 2007.

The final, and valuable, bit of your article is the bit you have copied from Steve Adams' website, along with the associated URL.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk

The Human Fly

Posts: 4,750
Registered: 11/24/00
Re: log_buffer
Posted: Jul 7, 2007 3:40 PM   in response to: Jonathan Lewis in response to: Jonathan Lewis
Click to report abuse...   Click to reply to this thread Reply
The number of CPUs is important to the value of log_buffer, because multiple log writer (LGWR) processes may be spawned by Oracle to asynchronously offload the redo information.

“db log sync wait”, try increasing log_buffer to a value over 1 megabyte..

There are really blunders.

I am sure people (who ask questions here) seeks the quality of answers to prevent production issues or overcome from their production database agony, and being an experts, one should really double check the contents before replies publicly.

Jaffar
burleson

Posts: 2,343
Registered: 05/06/98
Re: log_buffer
Posted: Jul 8, 2007 7:15 AM   in response to: The Human Fly in response to: The Human Fly
Click to report abuse...   Click to reply to this thread Reply
Hi Jaffar,

There are really blunders.

Yes, I mis-typed “db log sync wait”, please forgive me!

As to the multiple log writers, that's from Metalink, not me:

Metalink note 109582.1 says that log I/O slave processes started way-back in Oracle8, and that multiple LGWR processes will only appear as DML activity increases:

“Starting with Oracle8, I/O slaves are provided.  These slaves can perform asynchronous I/O even if the underlying OS does not support Asynchronous I/O. These slaves can be deployed by DBWR, LGWR, ARCH and the process doing Backup. . . 

In Oracle8i, the DBWR_IO_SLAVES parameter determines the number of IO slaves for LGWR and ARCH. . .

As there may not be substantial log writing taking place, only one LGWR IO slave has been started initially.  This may change when the activity increases.”
burleson

Posts: 2,343
Registered: 05/06/98
Re: log_buffer
Posted: Jul 8, 2007 7:20 AM   in response to: Jonathan Lewis in response to: Jonathan Lewis
Click to report abuse...   Click to reply to this thread Reply
Lewis,

May I suggest that you don't publish the URL in this forum again until you have read and corrected the article.

May I suggest that you stop publishing “incomplete” proofs?

In this case, you offer-up a questionable proof, never mentioning the all-important _lgwr_io_slaves parameter, nor the Metalink statement that multiple LGWR processes only appear as activity increases.

Jonathan, your “incomplete” test “proofs” can be deceptive, especially to beginners, and I’ve documented just a few of them here:

http://www.dba-oracle.com/t_biased_test_cases.htm

The final, and valuable, bit of your article is the bit you have copied from Steve Adams' website, along with the associated URL

No Jonathan, it’s the Metalink references that are authoritative, not Steve (although he has great info, it’s getting a tad out-of-date).

Oh Jonathan, I like your snarky word-trick, saying the “bit that you have copied” instead of “citing a reference”, like everyone else does.

Lewis, you have “copied” loads of content onto your domain from my web sites, but, unlike me, you don’t bother documenting them with hyperlinks.

Why is that?

**************************************

PLEASE NOTE THAT IN THIS RESPONSE I CITE ONLY JONATHAN LEWIS AND METALINK.

If anybody wants to hurl gratuitous insults, direct them at the cited sources, not me!

OK, let’s start with your most glaring inconsistency.

MYTH? Experts can suggest the root cause of a performance problem with only a short description and the top-5 timed events
************************************************

Lewis claims to be able to diagnose a problem from one of my clients, using only the data from the top-5 timed events: He stated:

“If you have a performance problem with this system it's probably in the slow I/O subsystem - not in the log buffer.”

Unless Lewis is the charter member of the “Oracle Psychic Friends Network, it’s irresponsible to make such a bold assertion with such little data. (in the instance example, his “guess” is completely incorrect, but that’s not surprising, given the tiny amount of data available to him).

But what’s confusion is the fact that Lewis admits that it’s “not sensible” to draw conclusions using only the top-5 timed events (unless, of course, it’s to try to insult me!). But don’t take my word for it, see Lewis’ own words here:

http://www.jlcomp.demon.co.uk/statspack_01.html

“Even though the top-5 timed events numbers look unusual, the first thing you should notice is that I haven’t supplied enough information for you to make a sensible decision. .

As this example shows, not only is it possible for the top (or even top 2) of the Top 5 report to be totally misleading; sometimes you may even have to think about that’s implied by the summary, but not in the summary. Be careful when you use Statspack – it’s very easy to look at the headline numbers and be totally fooled.”

OK, let’s continue with your contradictions to Metalink about multiple LGWR processes.

MYTH? Multiple LGWR processes
*************************************************

Lewis claims: “Note - only one log writer, despite invoking explicit parallelism in logging, and despite the significant number of CPUs.”

Here we go again, another incomplete proof that contradicts both the Oracle documentation and Metalink. You failed to mention the hidden parameter _lgwr_io_slaves and the Metalink note that clearly states that multiple LGWR processes will only appear under high activity. The Oracle docs are very clear on this:

“Prior to Oracle8i you could configure multiple log writers using the LGWR_IO_SLAVES parameter.”

In Oracle10g it becomes a hidden parameter (_lgwr_io_slaves). Metalink note 109582.1 says:

https://metalink.oracle.com/metalink/plsql/f?p=130:14:1634364682833660660::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,109582.1,1,1,1,helvetica

“Starting with Oracle8, I/O slaves are provided. These slaves can perform asynchronous I/O even if the underlying OS does not support Asynchronous I/O. These slaves can be deployed by DBWR, LGWR, ARCH and the process doing Backup. . .

In Oracle8i, the DBWR_IO_SLAVES parameter determines the number of IO slaves for LGWR and ARCH. . .

As there may not be substantial log writing taking place, only one LGWR IO slave has been started initially. This may change when the activity increases.”

Lewis, is Metalink wrong? If so, please clear-up this confusion!

MYTH?: There is no relationship between log_buffer size and CPU count.
*********************************************************************************

Lewis wrote: “You are simply restating the formula for the default value for this parameter as given in the 8.1 reference manual.”

The obsolete Oracle8 docs note that the value for the parameter log_simultaneous_copies is dependent on the number of CPU’s on the server:

“On multiple-CPU computers, multiple redo copy latches allow multiple processes to copy entries to the redo log buffer concurrently. The default value of LOG_SIMULTANEOUS_COPIES is the number of CPUs available to your Oracle instance”

Starting in Oracle8i, it’s a hidden parameter (_log_simultaneous_copies). From Metalink note 147471.1, we see that the default is set to cpu_count * 2.

Also, Metalink note 147471.1 “Tuning the Redo log Buffer Cache and Resolving Redo Latch Contention”, notes that multiple redo allocation latches become possible by setting the parm _log_parallelism, and that the log buffer is split in multiple LOG_PARALLELISM areas that each have a size of init.ora LOG_BUFFER. Further, it shows the relationship to the number of CPU’s:

https://metalink.oracle.com/metalink/plsql/f?p=130:14:1634364682833660660::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,147471.1,1,1,1,helvetica

“The number of redo allocation latches is determined by init.ora LOG_PARALLELISM. The redo allocation latch allocates space in the log buffer cache for each transaction entry. If transactions are small, or if there is only one CPU on the server, then the redo allocation latch also copies the transaction data into the log buffer cache.”

We also see that log file parallel writes are related to the number of CPU’s. Metalink note 34583.1 “WAITEVENT: "log file parallel write" Reference Note”, shows that the log_buffer size is related to parallel writes (i.e. the number of CPU’s), and discusses how LGWR must wait until all parallel writes are complete. It notes that solutions to high “log file parallel write” waits are directly related to I/O speed, recommending that redo log members be on high-speed disk, and that redo logs be segregated onto

“on disks with little/no IO activity from other sources.
(including low activity from other sources against the same disk controller)”.

This is a strong argument for using super-fast solid-state disk.

MYTH? – The log_buffer should remain small.
***********************************************************

This is perpetuated with several Metalink notes that have become somewhat obsolete:

“In a busy system, a value 65536 or higher is reasonable for log_buffer.

“It has been noted previously that values larger than 5M may not make a difference.”

Metalink notes that in 10gr2, we see a case where a customer cannot reduce the log_buffer size from 16 meg:

https://metalink.oracle.com/metalink/plsql/f?p=130:15:1634364682833660660::::p15_database_id,p15_docid,p15_show_header,p15_show_help,p15_black_frame,p15_font:BUG,4930608,1,1,1,helvetica

“In 10G R2, Oracle combines fixed SGA area and redo buffer log buffer together. If there is a free space after Oracle puts the combined buffers into a granule, that space is added to the redo buffer. Thus you see redo buffer has more space as expected. This is an expected behavior.. .

In 10.2 the log buffer is rounded up to use the rest of the granule. The granule size can be found from the hidden parameter "_ksmg_granule_size" and in your case is probably 16Mb. The calculation for the granule size is a little convoluted but it depends on the number of datafiles”

MYTH? – Disk I/O speed is not the primary concern for tuning the log_buffer
**************************************************************************************
Below, Jonathan Lewis confirms that disk I/O speed is critical to redo throughput:

http://www.jlcomp.demon.co.uk/not_proof.html

http://www.jlcomp.demon.co.uk/ops_01.html

“Your main worry is how fast the I/O subsystem might be, and what happens with the log buffer and log files.”

“Each instance has only one redo log buffer, and the ability to generate redo can ultimately be what defines the maximum speed of your database.”

TO THOSE WHO LIKE TO HURL INSULTS, PLEASE NOTE THAT I HAVE CONTRIBUTED NOTHING PERSONAL, CITING ONLY JONATHAN LEWIS, THE ORACLE DOCUMENTATION AND METALINK!

The Human Fly

Posts: 4,750
Registered: 11/24/00
Re: log_buffer
Posted: Jul 8, 2007 7:51 AM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
Yes, I mis-typed “db log sync wait”, please forgive me!
As to the multiple log writers, that's from Metalink, not me:
Burleson, I didn't mean that.

Many of DBA, including myself, here expect great and quality of information from the experts.

However, thanks for your clarification.
burleson

Posts: 2,343
Registered: 05/06/98
Re: log_buffer
Posted: Jul 8, 2007 8:23 AM   in response to: The Human Fly in response to: The Human Fly
Click to report abuse...   Click to reply to this thread Reply
Hi Jaffar,

Many of DBA, including myself, here expect great and quality of information from the experts.

That's great, but the core issue is that Oracle technology changes so fast that what was true yesterday will likely be un-true tomorrow!

For example, correct information from ancient OTN content comes-up on Google, which was perfectly fine when it was published, but it's now incorrect.

You cannot nuke it, because there are still Oracle shops using Oracle7, where it's good advice!

How can anyone offer-up advise that's of perfect quality, when there are so many "running" releases . . .

It's a formidable challenge for anyone, expert or not . . .
Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: log_buffer
Posted: Jul 8, 2007 9:41 AM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
Burleson,.

Was "db log sync wait" a mis-type for "log buffer space wait" or "log file sync wait" ? There's a significant difference in name, cause, and treatment.

Your quote from Metalink 109582.1 in response to The Human Fly says: "multiple LGWR processes will only appear as DML activity increases".

The text you seem to be paraphrasing appears in the Metalink note as: "As there may not be substantial log writing taking place, only one LGWR IO slave has been started initially. This may change when the activity increases."

Note that that's "IO Slaves" - not "processes", there's a very big difference.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk

Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: log_buffer
Posted: Jul 8, 2007 10:52 AM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
Burleson - extracts from your posting are in italic script:

"Never mentioning the all-important parameter _lgwr_io_slaves nor the Meatlink note ..."

I don't mention that parameter because (a) it's hidden, and has been hidden since 8i (long before 10g as you seem to suggest later in your note) and (b) even when you set it to a non-zero value you still don't get multiple LGWR processes.

The Metalink note doesn't say that you get multiple LGWR processes, it says: "Starting with Oracle8, I/O slaves are provided. These slaves can perform asynchronous I/O even if the underlying OS does not support Asynchronous I/O. These slaves can be deployed by DBWR, LGWR, ARCH and the process doing Backup"

So the I/O slaves are emulating async I/O, they are not assuming the functions of LGWR (or DBWR etc.). You still get just one process responsible for destaging the log buffer and reporting log write completion back to the user sessions; but now LGWR is handing off to I/O slaves and waiting for them to complete rather than handing off to underlying disk system - there's a very important distinction there. Moreover, it is probably fairly common knowledge that the order of preference for configuring the options for database writer mechanisms is:
* async I/O at machine level
* multiple db writers
* IO slaves
which probably means there aren't many systems using IO slaves. And, just for the record, you can't mix multiple db writers with IO slaves.

By the way, your article says: "The number of CPUs is important to the value of log_buffer, because multiple log writer (LGWR) processes may be spawned by Oracle to asynchronously offload the redo information."

How does that use of "because" work? Paraphrasing the Metalink note: if you set dbwr_io_slaves to a non-zero value, then _lgwr_io_slave defaults to 4, which allows up to 4 extra I/O slaves to be started for LGWR to hand off its i/o load. That statement has no logical connection to the fact that the number of CPUs affects the value of log_buffer.

"I like your snarky word trick"
You mean the trick of making a true statement - but expressing it in a way that might make people think it means more than it does ? I learned that one from someone on the Internet - I can't think who it was right now, but I'm sure it will come back to me.

"The Metalink references"
So what do you do when you find two Metalink references that contradict each other ?

you have "copied" loads of content onto your domain from my web sites
What's the difference between "copied" with and without quotes ? I've certainly copied a few bits from your website from time to time - but only to point out the mistakes. I didn't think the volume had got up to "loads" yet. Since you've clearly been tracking the volume better than I have, give me a list and I'll add the appropriate URL - and include an apology for failing to include it in the first place.

"it’s irresponsible to make such a bold assertion with such little data. (in the instance example, his “guess” is completely incorrect, but that’s not surprising, given the tiny amount of data available to him)."

But you presented the data with the lead in: "Here is an an example of an Oracle 10g database with an undersized log buffer, in this example 512k:" in one case and "Here is a AWR report showing a database with an undersized log_buffer, in this case where the DBA did not set the log_buffer parameter in their init.ora file:" in the other.
Apparently you thought the data you published was sufficient to demonstrate to the rest of the world the symptoms of a log buffer that was too small. Are you now saying that the data wasn't intended to show people what the symptoms of a small log buffer would show up as in their 'Top 5 timed events' ? If so, why print the data at all ? If they're not truely symptomatic of anything why not introduce them with a comment like: "Here is a completely irrelevant set of figures from a database which (I promise you, despite apparent indications to the contrary) has a log buffer which is too small ?"

But what’s confusing is the fact that Lewis admits that it’s “not sensible” to draw conclusions using only the top-5 timed events (unless, of course, it’s to try to insult me!). But don’t take my word for it, see Lewis’ own words here:

http://www.jlcomp.demon.co.uk/statspack_01.html

“Even though the top-5 timed events numbers look unusual, the first thing you should notice is that I haven’t supplied enough information for you to make a sensible decision. .


The error in the argument here is known as "arguing from the particular to the general". For example, if I say: "It's a lovely sunny day here and the sky is blue", you would note be wise to report this as "Lewis says it never rains in the UK".

By the way, in one case you are ignoring my use of the word "probably": “If you have a performance problem with this system it's probably in the slow I/O subsystem - not in the log buffer.”. It's almost as if you're trying to put together a few true statements in a way that is intended to lead to the worst possible interpretation. I can argue the case that slow writes by LGWR can lead to log buffer space waits - not to mention log file syncs - and you have log writes of 30 milliseconds. It's a pity that you only included the log buffer space waits in one of the data sets and didn't publish some "before and after" figures for the two cases, ; I'd be interested to see what improvements in I/O times and reduction in use of CPU you got from increasing the log buffer size

"Here we go again, another incomplete proof that contradicts..."
You're repeating yourself - you made this claim further up the page. It's the same
counter-argument to the same demonstration. Repeating something doesn't make it true.

"MYTH?: There is no relationship between log_buffer size and CPU count."
It took me a moment to understand why you wanted to include this as a heading when I had quoted the Oracle manual for the default value Operating system specific. Maximum: 500K or 128K * CPU_COUNT, whichever is greater

But of course you're trying to avoid the discussion of multiple LGWR processes by employing a logical fallacy. Let me demonstrate this by analogy. Your published statement starts. "The number of CPU is important to the log_buffer size, because ..." compare this with the following statement: "The pope is a catholic because it always rains in the UK."
* The pope is a catholic - true
* It always rains in the UK - not true
* The pope is a catholic because it always rains in the UK - not true.
At this point in your posting, you are trying to prove that you must be in the right by arguing about the pope and ignoring the weather.

"The obsolete Oracle8 docs note that the value for the parameter log_simultaneous_copies is dependent on the number of CPU’s on the server"

An attempt to direct attention away from the basic criticism

Why does the value of log_simultaneous_copies have any relevance to the argument, the discussion is about multiple LGWR processes, that parameter is about the number of redo allocation latches used to allow processes to acquire space in the log buffer.

"Metalink note 34583.1 “WAITEVENT: "log file parallel write" Reference Note”, shows that the log_buffer size is related to parallel writes (i.e. the number of CPU’s)"

I don't see anything in that note about the log buffer size, or the number of CPUs. Moreover, the behaviour of "log file parallel writes" does not depend on the number of CPUs, it depends on the capacity for some form of asynchronous I/O. And what has "super-fast solid-state disk" got to do with how many LGWR processes exist ? Again this is just obfuscation to direct attention from the basic questions.

"MYTH? – The log_buffer should remain small."
Another attempt to direct attention away from the basic criticism. The default size for the ... no, wait, I've already said that - it's still true - I don't need to say it again. But note that the default size for the log buffer isn't "small".

"This is perpetuated with several Metalink notes that have become somewhat obsolete"
But you said that the Metalink references are authoritative - so why are you
allowed to decided which ones are really authoritative and which ones are obsolete ?

"Quoting from Metalink In 10.2 the log buffer is rounded up to use the rest of the granule. The granule size can be found from the hidden parameter "_ksmg_granule_size" and in your case is probably 16Mb"
I know that Metalink is "authoritative" when it suits you, but the author of this one doesn't seem to know (or, perhaps, doesn't want to say) much about In Memory Undo, private redo threads, and the possibility of multiple shared redo threads. There's a lot more to it than just the granule size. But, as I said in my original note: That's not particularly helpful for some one asking about a 9i database. And what's it got to do with the basic criticisms anyway ?

"MYTH? – Disk I/O speed is not the primary concern for tuning the log_buffer"
Yet another attempt to direct attention away from the basic criticism. But what a strange creation. (Does it make you think, by the way, that my comment about the slow I/Os from one of the AWR reports is on the mark ?)

And finally:
Don't you have anything to say about my comments on the bit you copied (I said "quoted" last time - so I thought I'd take the opportunity to say "copied" this time) from Osamu Kobayashi about transactions_per_rollback_segment. And what about my demonstration that the log buffer does not have to be a multiple of 2048 if you are using multiple block sizes in 9i ?

So let's summarize:
* Oracle doesn't recommend a maximum of 500KB or 128KB * CPU - that's just the default
* Oracle has recommended fairly large log buffers in the past - but even the 10g reference manual makes an inane comment about "In a busy system, a value 65536 or higher is reasonable."
* You don't get multiple LGWR processes
* You can get IO slave processes to assist LGWR, but only if you have configured your system to run with dbwr_io_slaves, or fiddle with a hidden parameter.
* transaction_per_rollback_segment is irrelevant as far as redo is concerned.
* the log buffer prior to 10.2 had to be a multiple of the redo block size, which was based on the O/S unit I/O size.
* Metalink is not "authoritative" - and if it contradicts itself it's a good idea to test its claims; and even when it hasn't contradicted itself it may be incomplete or wrong.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk

Footnote: In the past, other members of the Oracle community have taken my technical criticisms of material written by Don Burleson as an excuse to pass rude, personal, and non-technical remarks about him. Accurate technical criticism is something to be encouraged. Abuse should not be tolerated.

Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: log_buffer
Posted: Jul 8, 2007 1:20 PM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply

How can anyone offer-up advise that's of perfect
quality, when there are so many "running" releases .

Burleson,

Datestamp everything you write.
Try to mention the version you are talking about
Don't tell people to read it until you have glanced through it to see that it is appropriate and likely to be relevaent - especially if the datestamp is a couple of years old or the version is incompatible.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk
burleson

Posts: 2,343
Registered: 05/06/98
Re: log_buffer
Posted: Jul 8, 2007 1:47 PM   in response to: Jonathan Lewis in response to: Jonathan Lewis
Click to report abuse...   Click to reply to this thread Reply
Hi Jonathan,

Accurate technical criticism is something to be encouraged. Abuse should not be tolerated.

Thanks, Dude, I appreciate that . . . .

Datestamp everything you write.

Good idea, I need to find a way to automate that! But it still does not completely address the issue of people running older releases . . .

Try to mention the version you are talking about

Yes, that a good idea. But what about explaining concepts to raw beginners? Nothing confuses a beginner like exception-after-exception, and it often clouds the general concepts!

Don't tell people to read it until you have glanced through it to see that it is appropriate and likely to be relevaent - especially if the datestamp is a couple of years old or the version is incompatible.

Ah, therein lies the rub! I don't know about the UK, but I still see people using Oracle7 in the USA (they say "If it ain't broke, don't fix it", but I suspect that the real reason is cheap shops trying to save on maintenance costs).

It's only now that many shops here are FINALLY upgrading from Oracle8i. . . .

Metalink is not authoritative

Well, yes, but Oracle Corporation may not agree, and I'd have a whale of a time trying to convince an IT manager who spends $40k/year on metal-level support.

Is there a way to petition to improve and clarify their content?

Also, I want to commend you on the super-job you did on your STATSPACK case study (STATSPACK distractions), showing that you often have to look-past the top-5 timed events. It would be great to see more real-world case studies from you, great stuff:

http://jonathanlewis.wordpress.com/statspack-distractions/

As to your big response, please give me 24 hours to digest it.

Message was edited by:
burleson
burleson

Posts: 2,343
Registered: 05/06/98
Re: log_buffer
Posted: Jul 9, 2007 4:08 AM   in response to: Jonathan Lewis in response to: Jonathan Lewis
Click to report abuse...   Click to reply to this thread Reply

OK, let’s start
with your Metalink assertion:


The Metalink note doesn't say that you get multiple LGWR processes

OK, educate
me.  If they are not multiple LGW(n) processes, then how do the

“I/O slaves for LGWR” appear in a "ps -ef" command?



Metalink note 109582.1 (dated 2002), says: “These slave processes are
allocated whenever the foreground process(DBWR, LGWR) request them. The slaves
for ARCH and Backup exit automatically after an idle time of 60 seconds. I/O
slaves for DBWR and LGWR do not exit on their own.”

Also, the Oracle
documentation says: ““Prior to Oracle8i you could configure multiple log writers
using the LGWR_IO_SLAVES parameter.”


if you set dbwr_io_slaves to a non-zero value, then _lgwr_io_slave defaults
to 4, which allows up to 4 extra I/O slaves to be started for LGWR to hand off
its i/o load. That statement has no logical connection to the fact that the
number of CPUs affects the value of log_buffer.

Isn’t it the same
reason that OPQ performance is related to cpu_count?  Any multiple processes
will benefit from SMP servers . . .

>> And, just for
the record, you can't mix multiple db writers with IO slaves

Can you elaborate
on this, please?

>> I learned
that one from someone on the Internet - I can't think who it was right now, but
I'm sure it will come back to me.


Touché!  . . .

>> and include
an apology for failing to include it in the first place.


Thanks, but will you use the nofollow tag?  Cmon, we both know why you did not
hyperlink to my quotes.  An apology is for when you make a mistake, and this was
no mistake!  You chose not to link to my quotes because you don’t want the
search engines to increase my search rank.


Apparently you thought the data you published was sufficient to demonstrate
to the rest of the world the symptoms of a log buffer that was too small.

Stop putting words
in my mouth.  I said no such thing.  It’s “apparent” only that I shared my
real-world observations from databases where increasing the log_buffer improved
performance, nothing more.


Are you now saying that the data wasn't intended to show people what the
symptoms of a small log buffer would show up as in their 'Top 5 timed events' ?

Not at all.  I’m an
empiricist.  When I see a clear-cut case of a too-small log_buffer, I try to
publish the symptoms.  Are they conclusive?  No.  I never said that they were. 
But that does not mean that empirical observations are totally irrelevant,
either!

>> So what do
you do when you find two Metalink references that contradict each other?



For a “real” database problem, I log an SR and get a definitive answer from one
of the Oracle software engineers who wrote the code.  Why guess?  Why doodle
with test scripts when you can talk to the folks who wrote the source code?


The error in the argument here is known as "arguing from the particular to
the general".

Ah, but isn’t that
want you attempted in your test case?  It only proved that multiple LGWR
processes didn’t exist for that particular release, for that particular
configuration, under that specific user load.  It does not generalize to the
universe of possible scenarios, right?

 


By the way, in one case you are ignoring my use of the word "probably":
“If you have a performance problem with this system it's probably in the slow
I/O subsystem - not in the log buffer.”
.

No, I only point it
out because it’s the same type of “speculation” that you have sharply criticized
others for making.  As I’ve watched your knowledge of tuning mature, you are
starting to rely on your “intuition”, which is a good thing.  Like you, I’ve
tuned over a thousand databases, and while no human can remember the exact
nature of every problem, we get “hunches”, unverifiable feelings, which are
usually right-on.  Human intuition manifests itself as “I’ve just got a feeling
that this is what’s going on”, but it’s not really just a felling.  In
actuality, it’s a long-forgotten system buried in your subconscious that you
cannot fully remember.  Speculating is a sign or a true expert, and it will only
get better as you get more experience!


But of course you're trying to avoid the discussion of multiple LGWR
processes by employing a logical fallacy.

No, let’s discuss
it!  Prove to me that Metalink is wrong.  The problem is that test cases are
wonderful for proving that something is true, but not so hot at proving that
something does not exist!


Moreover, the behaviour sic of "log file parallel writes" does not depend
on the number of CPUs, it depends on the capacity for some form of
asynchronous I/O.

Which, in turn, is
facilitated by multiple processors.  You could make the exact same argument
about OPQ . . .


no, wait, I've already said that - it's still true - I don't need to say it
again.

There is nothing
wrong with repeating yourself for emphasis, sometimes it helps . . .


I know that Metalink is "authoritative" when it suits you, but the author of
this one doesn't seem to know (or, perhaps, doesn't want to say) much about In
Memory Undo.

OK, let’s address
the Metalink credibility issue.  Traditionally, Metalink is the last-word on
Oracle, often written by the software engineers who wrote the source code. 
Metalink says what patches we must apply, what configurations are unsupported,
and what we may, or may not do.

Does Metalink have
uniform quality?  No.  But it’s a whole lot easier to learn how Oracle works
from the people who wrote it, than speculating with test cases, IMHO.


And what about my demonstration that the log buffer does not have to be a
multiple of 2048 if you are using multiple block sizes in 9i?

Do you feel
comfortable generalizing that your “demonstration” applies to all Oracle
databases?

SUMMARY

************

>> * Oracle
doesn't
recommend a maximum of 500KB or 128KB * CPU - that's just the
default
.

Yes, but the
default values are often set according to rules of optimal performance.


>> * Oracle has recommended fairly large log buffers in the past - but
even the 10g reference manual makes an inane comment about "In a busy system, a
value 65536 or higher is reasonable."

Yes, and that
contradicts the Oracle Apps recommendation of 10 meg.


>> * You don't get multiple LGWR processes

This strikes me as
a tad nitpicky.  You admit that LGWR has factotums, and whether no not these
slave processes appear as multiple LGWR processes was never my point.  Educate
me, Jonathan.  The people who wrote Oracle say “As there may not be substantial
log writing taking place, only one LGWR IO slave has been started initially. 
This may change when the activity increases.”.  In a non-thread OS model (Linux,
UNIX), where would I see these processes in a “ps –ef|grep ORA|grep –v grep”? 
If they are not called LGW(n), then what are the LGWR slaved called?


>> * You can get IO slave processes to assist LGWR, but only if you have
configured your system to run with dbwr_io_slaves, or fiddle with a hidden
parameter.

Agreed.


>> * transaction_per_rollback_segment is irrelevant as far as redo is
concerned.

You mean
transactions_per_rollback_segment, right? Agreed.


>> * the log buffer prior to 10.2 had to be a multiple of the redo block
size, which was based on the O/S unit I/O size.

Agreed.  Whatever,
that’s a nit.


>> * Metalink is not "authoritative" - and if it contradicts itself it's a
good idea to test its claims; and even when it hasn't contradicted itself it may
be incomplete or wrong

I must strongly
disagree.  Failure to follow the instructions on Metalink can leave a whole shop
unsupported, and despite their errors and contradictions, Metalink is the final
word, directly from the software engineers who wrote the source code.  That’s
way better than trying to figure-out Oracle’s behavior with a zillion test
cases, IMHO.

For example, to
“prove” that multiple LGWR processes don’t exist you would have to run thousands
of test cases, on all platforms and configurations, and, since Oracle says that
multiple LGWR processes only appear with load, you would have a formidable
challenge conclusively proving this assertion.

Let me add one more
assertion:

* After the Oracle
software and disk I/O sub-system are optimized, there are very limited options
for improving throughout:

- Use faster disks
(
SSD)

- Turn-off
ARCHIVELOG mode (using triple mirrored disks and hardware-level incremental
backups)

 

burleson

Posts: 2,343
Registered: 05/06/98
Re: log_buffer
Posted: Jul 9, 2007 4:20 AM   in response to: Jonathan Lewis in response to: Jonathan Lewis
Click to report abuse...   Click to reply to this thread Reply

Regarding the multiple log writer processes, I got this, from Steve Karam
(OCM):

According to Metalink Note 109582.1, you will see multiple log writer slaves as
ixnn processes. The note is for Oracle 8, but it’s a good description,
confirming that multiple log writer processes exist.

The representation of I/O Slaves is as follows: ora_ixnn_SID where

i= slave,
x= adaptor number and
nn is the slave number within the adaptor.

For example:

ora_i105_mul is a background I/O slave process.

Here:

i stands for a slave
1 stands for the adaptor number for this slave
05 is the slave number within the adaptor.

The valid ranges are:

x (adaptor number) -> 1 to 9 and A through F
nn (Slave number) -> 01..0z, 10..zz (Once slave number 0z is spawned, next
slaves are 10,11 and so on.)

An adaptor is a pool of memory allocated to a thread and the pool is allocated

a handle to be identified. The adaptor number is the handle returned by the
internal Oracle code.

There are different classes of adaptors and the pools by these classes are
totally exclusive to each other. For example, the IO pool for data buffers and

log buffers are different and hence these have separate adaptor numbers.

(Note: It is NOT necessary for the same process to use the same adaptor number
always. What this implies is that the adaptor numbers are not 'reserved' for
processes.)

Some of the classes for which adaptors exist are:

DBWR
LGWR
ARCH
Backup_tape_io
backup_disk_Io

...


For the other adaptors in 8i:
-----------------------------

These start on demand.

Thus a typical Unix ps -ef listing after instance startup may look like :

oracle 27582 1 0.0 11:16:40 ?? 0:44.03
ora_pmon_mul
oracle 27584 1 0.0 11:16:40 ?? 0:28.44 ora_ckpt_mul
oracle 27586 1 0.0 11:16:40 ?? 0:58.65 ora_dbw0_mul
oracle 27588 1 0.0 11:16:40 ?? 0:05.12 ora_lgwr_mul
oracle 27593 1 0.0 11:16:41 ?? 0:32.35 ora_smon_mul
oracle 27594 1 0.0 11:16:41 ?? 0:11.59 ora_arc0_mul
oracle 27596 1 0.0 11:16:41 ?? 0:00.35 ora_reco_mul
oracle 27598 1 0.0 11:16:48 ?? 0:17.42 ora_i201_mul
oracle 28408 1 0.0 11:35:25 ?? 0:06.47 ora_i102_mul
oracle 28411 1 0.0 11:35:25 ?? 0:06.59 ora_i101_mul
oracle 28412 1 0.0 11:35:25 ?? 0:06.10 ora_i103_mul
oracle 28414 1 0.0 11:35:25 ?? 0:06.03 ora_i104_mul
oracle 28416 1 0.0 11:35:25 ?? 0:06.14 ora_i105_mul
oracle 28420 1 0.0 11:35:25 ?? 0:06.01 ora_i106_mul


Here DBWR_IO_SLAVES=6.

The processes ora_i101 to ora_i106 are DBWR IO slaves and ora_i201
is the I/O slave for LGWR.

Note:

1. As there may not be substantial log writing taking place, only one LGWR IO
slave has been started initially. This may change when the activity increases.

2. The DBWR IO slaves may also not be immediately started with the instance, as
is clear from the timestamps of the processes.
 

Legend
Guru Guru : 2500 - 1000000 pts
Expert Expert : 1000 - 2499 pts
Pro Pro : 500 - 999 pts
Journeyman Journeyman : 200 - 499 pts
Newbie Newbie : 0 - 199 pts
Oracle ACE Director
Oracle ACE Member
Oracle Employee ACE
Helpful Answer (5 pts)
Correct Answer (10 pts)

Point your RSS reader here for a feed of the latest messages in all forums