Thread: system statistics not being used


Permlink Replies: 28 - Pages: 2 [ 1 2 | Next ] - Last Post: May 6, 2008 12:43 AM Last Post By: Jonathan Lewis
sivakumar3

Posts: 56
Registered: 09/17/99
system statistics not being used
Posted: May 2, 2008 1:52 PM
Click to report abuse...   Click to reply to this thread Reply
I collected fresh system statistics for a new database load. I see its not being used by the database even though aux_stats shows stats.
sivakumar3

Posts: 56
Registered: 09/17/99
Re: system statistics not being used
Posted: May 2, 2008 1:53 PM   in response to: sivakumar3 in response to: sivakumar3
Click to report abuse...   Click to reply to this thread Reply
cpu in mhz : 1043
single block readtime in ms : 7.457
multiblock readtime in ms : 19.566
average multiblock readcount: 45
burleson

Posts: 2,343
Registered: 05/06/98
Re: system statistics not being used
Posted: May 3, 2008 3:53 AM   in response to: sivakumar3 in response to: sivakumar3
Click to report abuse...   Click to reply to this thread Reply
Hi Sasiva,

I see its not being used by the database

How can you tell? What are you looking at?

Do you have a big difference in I/O times for scattered and sequential reads?

http://www.dba-oracle.com/t_dbms_stats_gather_system_stats.htm

Prior to Oracle 10g, adjusting powerful optimizer parameters (i.e. optimizer_index_cost_adj) was the only way to compensate for sample size issues with dbms_stats and system stats issues. But as of Oracle 10g, improvements in system statistics collection using dbms_stats.gather_system_stats (to measure sequential vs. scattered disk I/O speed) plus improved sampling within dbms_stats had made adjustments to the optimizer parameters a "worst practice" exercise in most cases. Ceteris Parabus, always adjust CBO statistics before adjusting optimizer parms.

Sometimes, system stats are not enough, and in rare cases, it is still necessary to adjust optimizer_index_cost_adj, even in 10g:

http://www.dba-oracle.com/oracle11g/oracle_11g_dbms_stats_enhancements.htm



Hope this helps. . .

Donald K. Burleson
Oracle Press author
Author of "Oracle Tuning: The Definitive Reference":
http://www.dba-oracle.com/bp/s_oracle_tuning_book.htm
sivakumar3

Posts: 56
Registered: 09/17/99
Re: system statistics not being used
Posted: May 3, 2008 6:18 AM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
Hi Donald,

Thank you for your response. I am not seeing scattered or sequential reads but heavy load on the Database server with runq at 100%. I gathered system stats two times on May 01 and May 02. On may 01, load on the database came down drastically after I gathered system stats but on the may 02, it not the same case.

Wait event is Latch: Cache buffer chains.

Hope this answers your question.

Regards,
Siva Devulapalli
burleson

Posts: 2,343
Registered: 05/06/98
Re: system statistics not being used
Posted: May 3, 2008 6:29 AM   in response to: sivakumar3 in response to: sivakumar3
Click to report abuse...   Click to reply to this thread Reply
Hi Siva,

Database server with runq at 100%

The runqueue is an integer, not a percent! Are you using vmstat?

http://www.dba-oracle.com/t_tuning_cpu_usage_vmstat.htm

BTW, it's normal to see CPU at 100%, it's designed that way:

http://www.dba-oracle.com/t_high_cpu.htm

Is CPU a bottleneck? If so, have you looked at _optimizer_cost_model='cpu'?

http://www.dba-oracle.com/oracle_tips_optimizer_cost_model.htm

**********************************************************

On may 01, load on the database came down drastically after I gathered system stats but on the may 02, it not the same case.

Ah good clue! Are you running STATSPACK or AWR? If so, you can analyze the historical data and see exactly what changed:

http://www.dba-oracle.com/art_Statspack_Trend.htm

*****************************************************************

Wait event is Latch: Cache buffer chains.

http://www.dba-oracle.com/t_high_cache_buffer_chain_waits_contention.htm

The "cache buffer chain" latch wait is normal, but high values are associated with high simultaneous buffer access, similar to a freelist shortage on an index or table segment header.

MetaLink also suggests using the _db_block_hash_buckets and
_db_block_hash_latches undocumented parameters have been suggested as a remedy, but always check with MetaLink before using any undocumented parameters. To see the "cache buffer chain" waits:

select
count(*) child_count,
sum(gets) sum_gets,
sum(misses) sum_misses,
sum(sleeps) sum_sleeps
from
v$latch_children
where
name = 'cache buffers chains';

*************************************************************

Can you get a STATSPACK report and paste it into Statspackanalyzer at http://www.statspackanalyzer.com ?

That should help greatly . . . .



Hope this helps. . .

Donald K. Burleson
Oracle Press author
Author of "Oracle Tuning: The Definitive Reference":
http://www.dba-oracle.com/bp/s_oracle_tuning_book.htm
sivakumar3

Posts: 56
Registered: 09/17/99
Re: system statistics not being used
Posted: May 3, 2008 6:45 AM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
1. No. I did not use vmstat.
2. optimizer cost model is not touched
3. AWR. Only change is appliation started using the database
4. I opened a SR for Oracle advice on
db_block_hash_buckets and
_db_block_hash_latches
5. Values from latch children
65536 1.4401E+11 3127583862 569219878
6. From AWR
1. Host CPU was a bottleneck and the instance was consuming 86% of the host CPU.
All wait times will be inflated by wait for CPU.
2.Contention on buffer cache latches was consuming significant database time.

Hope this helps.
Charles Hooper

Posts: 754
Registered: 01/27/08
Re: system statistics not being used
Posted: May 3, 2008 7:28 AM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
Hi Siva,

Database server with runq at 100%

BTW, it's normal to see CPU at 100%, it's designed
that way:

http://www.dba-oracle.com/t_high_cpu.htm


Hello Mr. Burleson,

I have seen a couple articles and books that suggest CPU utilization at 100% is not normal. Some of the articles suggest that when CPU utilization exceeds 90%, odd things such as excessive latching, excessive time spend in the LOG FILE SYNC wait event, Oracle processes volunteering to yield their CPU time slice and then having to wait a significant number of CPU cycles to be run on the CPU again, and various other problems which may arise. If I remember correctly, the book "Forecasting Oracle Performance" suggests that the tipping point may be closer to 75% CPU utilization.

If saisiva is seeing excessive CPU utilization, it might be a good idea to examine the number of logical IOs and determine whether or not a large number of logical IOs is a sign that SQL statements need to be tuned. Saisiva may also want to determine if excessive parsing (even excessive soft parsing) is a problem and whether or not session_cached_cursors is set to an appropriate value. Examination of recursive CPU time, and the actual recursive calls may also be helpful.

Saisiva, you might find that even though a couple rows are retrieved by the query to aux_stats$, that does not mean that system statistics have been collected. System statistics should be collected when the system is under a normal to heavy (typical) load to avoid the collection of inappropriate statistics that then adversely affect cost based plan calculations.
You might want to take a look at this article, or check the Oracle documentation:
http://www.oracle.com/technology/pub/articles/lewis_cbo.html

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
sivakumar3

Posts: 56
Registered: 09/17/99
Re: system statistics not being used
Posted: May 3, 2008 8:00 AM   in response to: Charles Hooper in response to: Charles Hooper
Click to report abuse...   Click to reply to this thread Reply
System stats were collected when the database is at its peak load
user599375

Posts: 468
Registered: 10/09/07
Re: system statistics not being used
Posted: May 3, 2008 12:01 PM   in response to: sivakumar3 in response to: sivakumar3
Click to report abuse...   Click to reply to this thread Reply
I gathered system
stats two times on May 01 and May 02. On may 01, load
on the database came down drastically after I
gathered system stats but on the may 02, it not the
same case.

Did you store the stats for the two runs?

If so, did you compare them and try reverting back to the 'good' May 01?

If not, this is a good reason to start storing them.
Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: system statistics not being used
Posted: May 3, 2008 1:49 PM   in response to: sivakumar3 in response to: sivakumar3
Click to report abuse...   Click to reply to this thread Reply
Sivakumar,

You haven't answered Burleson's question - how do you know that you are not using the system statistics ? The fact that you don't like the performance isn't proof, the fact that you are running at 100% CPU isn't proof.

All you have to do is something like:

explain plan for
select * from some_table;

select * from table(dbms_xplan.display);

You can check two things: if your execution plan includes two columns about (%CPU) and Time, then you are using system statistics, e.g:


Cost (%CPU) Time


2 (0) 00:00:01
2 (0) 00:00:01



If you are not using system statistics, then you won't get these two columns, and the execution plan will be followed by the comment:
Note

- cpu costing is off (consider enabling it)

Ignore comments to the contrary - on a multi-user system, running constantly at 100% CPU is a bad thing. If your bottleneck is CPU then any wait problems (such as latch waits) get worse because a waiting session probably has to spend more time waiting to get to the top of the run queue - which is the point made by the ADDM report you've quoted.

This means your next task is to identify some CPU intensive tasks and try to make them more CPU efficient. Perhaps the best starting point is to run the AWR report for the interval that gave you that ADDM report, and look at the top two or three SQL stataments in the section titled "SQL ordered by CPU", and see if there execution plan is reasonable. Any SQL reported in that section of the report will also have it's execution plan(s) captured, and you can list them by running the awrsqrpt.sql script in your $ORACLE_HOME/rdbms/admin directory.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk

Post Script: I suppose I could have mentioned that there's an introductory article about CPU costing / system statistics that I wrote a few years ago here: http://www.oracle.com/technology/pub/articles/lewis_cbo.html - it might have some relevance.

Message was edited by: Jonathan Lewis, 6th May 2008
Missing "not" added, "comments" changed to "columns"

Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: system statistics not being used
Posted: May 3, 2008 2:08 PM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply
Hi Siva,

BTW, it's normal to see CPU at 100%, it's designed that way:

http://www.dba-oracle.com/t_high_cpu.htm


This is the third time you've made this claim in the last few weeks. But it's not "normal" to drive CPUs to 100%. Except for extremely exotic circumstances (and that excludes database processing) it means you've overloading the system and wasting resources.

Your comments in that article about the _optimizer_cost_model paramter are also incorrect. The parameter is not there for databases that are CPU bound. See http://forums.oracle.com/forums/message.jspa?messageID=2152457#2152457

Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk

Message was edited by: Jonathan Lewis
to insert missing word

Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: system statistics not being used
Posted: May 3, 2008 2:14 PM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply

MetaLink also suggests using the
_db_block_hash_buckets and
_db_block_hash_latches undocumented parameters have
been suggested as a remedy, but always check with
MetaLink before using any undocumented parameters.

Have you got a metalink document ID for that note. Apart from benchmark specials it's a suggestion that should have been forgotten when 8i was released

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk
burleson

Posts: 2,343
Registered: 05/06/98
Re: system statistics not being used
Posted: May 3, 2008 2:29 PM   in response to: Jonathan Lewis in response to: Jonathan Lewis
Click to report abuse...   Click to reply to this thread Reply
Oh brother . . . .

it means you've overloading the system and wasting resources.

No, it means that you are using the system.

All SMP architectures are designed to throttle-up the CPU quickly, and a 100% utilization DOES NOT mean an overload. It's straight from Algorithms 101 . . . .

************************************************************

Your comments in that article about the _optimizer_cost_model paramter are also incorrect.

No, you are wrong again:

http://www.dbforums.com/archive/index.php/t-1306175.html

"We experimented with the following other parameters so far:

db_writer_processes (set to 4 in combination with cpu_count = 4)
fast_start_parallel_rollback = high
_optimizer_cost_model = cpu

Although the last parameter is undocumented it has brought the
greatest improvement so far: dropping >2 min execution time down to 40
seconds (therefore still much higher than 10 sec with cpu_count = 1)

Sven Bombach"

Jonathan Lewis

Posts: 1,781
Registered: 01/23/07
Re: system statistics not being used
Posted: May 3, 2008 2:57 PM   in response to: burleson in response to: burleson
Click to report abuse...   Click to reply to this thread Reply

All SMP architectures are designed to throttle-up the
CPU quickly, and a 100% utilization DOES NOT mean an
overload. It's straight from Algorithms 101 . . . .

You may wish to believe that, but please don't tell anyone else.

Consider the simple case of 8 queries running on 8 CPUs. They will be competing for the same cache buffers chains latches - which means that seven processes could be spinning on the same latch while the eighth is holding it. None of the processes ever need wait, but most of them could be wasting CPU most of the time.


Your comments in that article about the _optimizer_cost_model paramter are also incorrect.

No, you are wrong again:
http://www.dbforums.com/archive/index.php/t-1306175.html


That thread also says:
"unfortunately our trial and error approach to find the proper settings (we searched Oracle Tech Net (OTN) and google groups without success) did not help so far."
So they didn't know what they were doing, or why it might help.


"We experimented with the following other parameters so far:
db_writer_processes (set to 4 in combination with cpu_count = 4)

Read Kevin Closson's blog to find out why that's not such a great idea.

fast_start_parallel_rollback = high
And that was to solve their problem with a pl/sql procedure not running fast enough, apparently.

_optimizer_cost_model = cpu

Although the last parameter is undocumented it has brought the
greatest improvement so far: dropping >2 min execution time down to 40
seconds (therefore still much higher than 10 sec with > cpu_count = 1)

They were using 9i so the default value for this parameter would have been 'choose' - which means CPU costing would have been used if they had collected system statistics.

Since changing this parameter to 'cpu' apparently made a difference we can infer that they didn't have any system stats set - which means no CPU information gathered, which means the optimizer couldn't have been doing anything to address any issue of the database being "CPU-bound".

But, apart from anything else - where is there any comment in that thread about how much CPU they were using before and after the change - all they mention is elapsed time.

Jonathan Lewis
http://jonathanlewis.wordpress.com
http://www.jlcomp.demon.co.uk
burleson

Posts: 2,343
Registered: 05/06/98
Re: system statistics not being used
Posted: May 4, 2008 5:51 AM   in response to: Jonathan Lewis in response to: Jonathan Lewis
Click to report abuse...   Click to reply to this thread Reply
They will be competing for the same cache buffers chains latches - which means that seven processes could be spinning on the same latch while the eighth is holding it.

Whoa! I'm talking about server-side CPU consumption, outside Oracle.

Just to make sure that you are not operating under "assumptions" here, I'm talking about server-side CPU consumption, on an SMP server running lots of concurrent tasks. The references to 100% CPU are as they display in standard OS monitors like lparstat, watch, sar and vmstat.

Also, don't assume that all OS tasks have the same dispatching priority. In a server-side 100% CPU situation, some tasks may have enqueues, while other do not. That's what "nice" is for.

You may wish to believe that, but please don't tell anyone else.

We ARE NOT talking about an opinion here . . . . .

- CPU metrics are expressed as percentages. Hence, all of the CPU values (us+sy+id+wa) will always sum to 100.

- In the UNIX/LINUX environment, the runqueue is used to display the number of active tasks that are currently waiting for CPU resources.

- That's the measure of a CPU bottleneck: where runqueue > cpu_count.

**********************************************************************

This from IBM, who built their server and the OS, the final word:

http://www.ibm.com/developerworks/aix/library/au-aixoptimization/index.html

"In a system that is CPU-bound, all the processors are 100 percent busy and some jobs might be waiting for CPU time in the run queue."

"Fixing one bottleneck might actually cause a CPU bottleneck, because your system is now allowing the CPU to perform to its optimum capacity and it might not have the capacity to handle the increased amount of resources given to it. "

*****************************************************

Here is a reference from Harvard, a good school, I'm told:

http://maltman.hmdc.harvard.edu/unix/systune.pdf

"CPU User+Sys activity near 100%, no CPU idle, no CPU wait on I/O,(sar -u) low context switch (pswch in sar -p) activity .

Check the run-queue (sar -q) to see how many processes are waiting to run, a high run-queue can indicate a CPU bottleneck, but you can have a CPU without a high run-queue (if only one large process is running.) If CPU Sysactivity is high, suspect inefficient use of system calls, or borderline memory or i/o bottlenecks"

all they mention is elapsed time.

That's all that counts to most tuning experts . . . . . . RESPONSE TIME!
Legend
Guru Guru : 2500 - 1000000 pts
Expert Expert : 1000 - 2499 pts
Pro Pro : 500 - 999 pts
Journeyman Journeyman : 200 - 499 pts
Newbie Newbie : 0 - 199 pts
Oracle ACE Director
Oracle ACE Member
Oracle Employee ACE
Helpful Answer (5 pts)
Correct Answer (10 pts)

Point your RSS reader here for a feed of the latest messages in all forums