Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Histogram cardinality and density

unknown-879931May 5 2014 — edited May 6 2014

Hello Experts,

Histograms, cardinality and density these terms are involved with eachother and playing significant role on Oracle performance. After I read an article written by Tom Kyte (Ask Tom: On External Table Queries, Data Consistency, and Nothing), I became interested in histograms and cardinality. Because, according to his article, changing null values to abandoned extreme values (for columns which have high balance histograms) misguide the optimizer. After, I read interesting paper HISTOGRAMS - MYTHS AND FACTS (http://www.centrexcc.com/Histograms%20-%20Myths%20and%20Facts.pdf) by Wolfgang Breitling, I encountered much more interesting things related to histograms. In his paper, under the part of HISTOGRAMS AND “DENSITY”, it is written the following formula for height-balanced histograms.

With a height-balanced histogram density = Σ cnt2 / ( num_rows * Σ cnt )

(num_rows means number of rows with not null values of the histogram column)

Unfortunately, even I read that chapter several times, I cannot understand the formula clearly. Then, I read a paper called JOIN CARDINALITY ESTIMATION WITH HISTOGRAMS EXPLAINED by Alberto Dell'Era (http://www.adellera.it/investigations/join_over_histograms/JoinCardinalityEstimationWithHistogramsExplained.pdf). He also mentioned the same topic.

for the curious, letting no (v) equal to the number of occurrences of value v in the not-popular subtable, density for HBs is:

sum ( no2(v) ) / [ sum ( no(v) ) * num_rows ], over all v

Note that it gives more "weight" to the values that occur more frequently because of the squaring.

Note: in 11.1.0.6, density is ignored for filter predicates and overridden by a new "NewDensity" computed internally,

that for HBs is (num_rows_np / num_rows) * (1 / num_distinct_np), which is definitely aligned with intuition

(num_rows*NewDensity = num_rows_np / num_distinct_np), and a strong hint that the square-based formula above is

now considered for replacement. The "NewDensity" is used for the second and third contributor as well.

I also don't understand what does no2(v) means.

After all, I read series of articles about histogram by Jonathan Lewis (Histograms Part 2 – All Things Oracle). Somehow, I cound't encounter information about density calculation of high-balanced histogram. But, I found the following.

As far as cardinality calculations go, Oracle “counts buckets” for the popular values – in our tiny example we picked every 4th row, our “bucket” size is 4. We know that 12 appears twice in the full histogram, so Oracle calculates the cardinality of the predicate “n1 = 12” as 8 (i.e. 2 buckets * 4 rows per bucket). The method the optimizer uses to calculate the cardinality for the values which have not been recognized as popular has varied over time – in 11.2.0.3 one of the calculations is simply:  (number of non-popular rows)/(number of non-popular values) which works as follows in our example

Popular values 1 (12)
Non-popular values 7 (5,6,9,11,13,16,17)
Popular rows 8 (2 buckets at 4 rows per bucket)
Non-popular rows 12 (20 – 8)

Non-popular cardinality = 12/7 = 1.714, which rounds up to 2.

Jonathan Lewis uses different calculation method for un-popular values in high-balanced histograms. However, Alberto Dell'Era uses the same formula for both popular and un-popular values.

num_rows * EP difference / max (EP)

The writers of articles are very important and special people for me. Their comments and thoughts are so important for me.

I got confuse after I read all of them. I am just wondering do I mix everything? Can someone throw some light on this subject? How to calculate High-balanced histogram density? and cardinality for popular and un-popular values? What do you recommend?

Regards

Charlie

Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jun 3 2014
Added on May 5 2014
3 comments
887 views