Has anyone got any examples of manually setting histogram values? The documentation is a bit light, and all I have been able to find is one example here:
http://www.freelists.org/archives/oracle-l/12-2006/msg00532.html
The demo seems to work, in that you get 3 rows for his test table 'DUMMY' in all_tab_histograms (you can't read the actual endpoint values though, can't see how you do that), but the explain plan says 1 row not 100 (the demo creates 3 buckets and my predicate should fall in the first one if I've understood this right, the whole point being to tell the optimizer that there are 100 rows in that date range).
My attempt (10.2.0.3):
drop table wr_test purge;
create table wr_test(posting_date date);
begin
dbms_stats.gather_table_stats(user,'WR_TEST');
end;
/
DECLARE
r_stats dbms_stats.statrec;
-- Set first three known values explicitly - generate the rest in a loop below:
v_datelist dbms_stats.datearray :=
dbms_stats.datearray(DATE '2008-04-04', DATE '2008-04-05', DATE '2008-04-07');
v_cardinalities dbms_stats.numarray :=
dbms_stats.numarray (808108, 5653479, 1354048);
-- Hack to make loop easier - define a starting value to add months to in the loop:
k_refdate CONSTANT DATE :=
ADD_MONTHS(TRUNC(v_datelist(v_datelist.LAST),'MM'),-v_datelist.COUNT);
BEGIN
r_stats.eavs := 0; -- Don't know what this is, taken from WB example
r_stats.chvals := NULL; -- Don't know what this is, taken from WB example
r_stats.epc := 50; -- Number of histograms to add
-- Generate a bunch of identical entries
-- (could get clever here and increase by 1% per month etc):
FOR i IN v_datelist.COUNT +1 .. r_stats.epc LOOP
v_datelist.EXTEND;
v_cardinalities.EXTEND;
v_datelist(i) := ADD_MONTHS(k_refdate,i);
v_cardinalities(i) := 5000000;
END LOOP;
-- Assign array of cardinalities to the stats record's "bucket values" array attribute:
r_stats.bkvals := v_cardinalities;
DBMS_STATS.PREPARE_COLUMN_VALUES(r_stats,v_datelist);
DBMS_STATS.SET_COLUMN_STATS
( ownname => USER
, tabname => 'WR_TEST'
, colname => 'POSTING_DATE'
, distcnt => 50
, density => 1/50 -- Might have this wrong, seems to come out as 0
, nullcnt => 0
, srec => r_stats );
END;
/
SELECT h.column_name, h.endpoint_value, h.endpoint_actual_value
, c.num_distinct
, c.avg_col_len
, ROUND(c.density,1) AS density
, c.num_nulls, c.nullable, c.last_analyzed
, h.endpoint_number
FROM user_tab_histograms h
, user_tab_columns c
WHERE h.table_name = 'WR_TEST'
AND c.table_name = h.table_name
AND c.column_name = h.column_name
ORDER BY c.column_id, h.endpoint_number;
-- (Shows the 50 new histogram entries)
SQL> select * from wr_test where posting_date = date '2008-06-01'
2
SQL> @xplan
Plan hash value: 2500978562
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| WR_TEST | 1 | 9 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("POSTING_DATE"=TO_DATE('2008-06-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
Why 1 row? I thought I just told it there were 50000000?
The background to this is we have a large batch system and a problem getting the stats to keep up with the bulk loads, resulting in cases where the optimizer decides that today's accounting date (for example - there are probably hundreds of different cases) is extremely unlikely to appear in the data, resulting in a disastrous plan. This is due to become a bigger problem as we are due to upgrade from 10.1 to 10.2.0.3, which pays more attention to histogram endpoints than 10.1 did (this was a 10.1 bug fixed in 10.2 from what I can gather).