Skip to Main Content

Oracle Database Discussions

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Setting histograms manually

William RobertsonApr 17 2008 — edited Apr 21 2008

Has anyone got any examples of manually setting histogram values? The documentation is a bit light, and all I have been able to find is one example here:

http://www.freelists.org/archives/oracle-l/12-2006/msg00532.html

The demo seems to work, in that you get 3 rows for his test table 'DUMMY' in all_tab_histograms (you can't read the actual endpoint values though, can't see how you do that), but the explain plan says 1 row not 100 (the demo creates 3 buckets and my predicate should fall in the first one if I've understood this right, the whole point being to tell the optimizer that there are 100 rows in that date range).

My attempt (10.2.0.3):

drop table wr_test purge;
 
create table wr_test(posting_date date);
 
begin
    dbms_stats.gather_table_stats(user,'WR_TEST');
end;
/
 
 
DECLARE
    r_stats         dbms_stats.statrec;

    -- Set first three known values explicitly - generate the rest in a loop below:
    v_datelist      dbms_stats.datearray :=
                    dbms_stats.datearray(DATE '2008-04-04', DATE '2008-04-05', DATE '2008-04-07');
    v_cardinalities dbms_stats.numarray  := 
                    dbms_stats.numarray (808108, 5653479, 1354048);

    -- Hack to make loop easier - define a starting value to add months to in the loop:
    k_refdate       CONSTANT DATE := 
                    ADD_MONTHS(TRUNC(v_datelist(v_datelist.LAST),'MM'),-v_datelist.COUNT);
BEGIN
    r_stats.eavs   := 0;    -- Don't know what this is, taken from WB example
    r_stats.chvals := NULL; -- Don't know what this is, taken from WB example
    r_stats.epc    := 50;   -- Number of histograms to add

    -- Generate a bunch of identical entries
-- (could get clever here and increase by 1% per month etc):
    FOR i IN v_datelist.COUNT +1 .. r_stats.epc LOOP
        v_datelist.EXTEND;
        v_cardinalities.EXTEND;
        v_datelist(i) := ADD_MONTHS(k_refdate,i);
        v_cardinalities(i) := 5000000;
    END LOOP;
 
    -- Assign array of cardinalities to the stats record's "bucket values" array attribute:
    r_stats.bkvals := v_cardinalities;
 
    DBMS_STATS.PREPARE_COLUMN_VALUES(r_stats,v_datelist);
 
    DBMS_STATS.SET_COLUMN_STATS
    ( ownname => USER
    , tabname => 'WR_TEST'
    , colname => 'POSTING_DATE'
    , distcnt => 50
    , density => 1/50  -- Might have this wrong, seems to come out as 0
    , nullcnt => 0
    , srec    => r_stats );
END;
/

SELECT h.column_name, h.endpoint_value, h.endpoint_actual_value
     , c.num_distinct
     , c.avg_col_len
     , ROUND(c.density,1) AS density
     , c.num_nulls, c.nullable, c.last_analyzed
     , h.endpoint_number
FROM   user_tab_histograms h
     , user_tab_columns c
WHERE  h.table_name = 'WR_TEST'
AND    c.table_name = h.table_name
AND    c.column_name = h.column_name
ORDER BY c.column_id, h.endpoint_number;
 
-- (Shows the 50 new histogram entries)
 
SQL> select * from wr_test where posting_date = date '2008-06-01'
  2
SQL> @xplan

Plan hash value: 2500978562
 
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     9 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| WR_TEST |     1 |     9 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
    1 - filter("POSTING_DATE"=TO_DATE('2008-06-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

Why 1 row? I thought I just told it there were 50000000?

The background to this is we have a large batch system and a problem getting the stats to keep up with the bulk loads, resulting in cases where the optimizer decides that today's accounting date (for example - there are probably hundreds of different cases) is extremely unlikely to appear in the data, resulting in a disastrous plan. This is due to become a bigger problem as we are due to upgrade from 10.1 to 10.2.0.3, which pays more attention to histogram endpoints than 10.1 did (this was a 10.1 bug fixed in 10.2 from what I can gather).

Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on May 19 2008
Added on Apr 17 2008
16 comments
1,946 views