Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Partition by combination of several (correlated) date columns.

user3897193Dec 10 2017 — edited Dec 10 2017

Many systems have several date columns which could be used as partitioning base.
E.g. in ordering system there is ORDER_DATE, LOADING_DATE, DESIRED_DELIVERY_DATE, DELIVERY_DATE
or in banking system GENERAL_LEDGER_DATE, TRANSACTION_DATE, INTEREST_DATE.
Typically these dates are correlated and they do not differ a lot. However there are exceptionally cases and
one cannot make assumptions that they do not differ more than a week etc.

Also there are quite a lot applications where rows have BEGIN_DATE, END_DATE combination.
These also reguire some feasible partitioning method.

Typically queries are referencing to recent data. E.g.
where ORDER_DATE >= trunc(sysdate, 'MM').

Current (tested on Oracle 12.1.0.2) Oracle version does support some interesting partitioning methods.
E.g.
DROP TABLE LH_D;
CREATE TABLE LH_D
(
KEY_COLUMN             NUMBER(9)              NOT NULL,
BEGIN_DATE             DATE                   NOT NULL,
END_DATE               DATE,
INSERTED               date           NOT NULL,
UPDATED                date,
VALUE_COLUMN           CHAR(100 CHAR),
MAX_PARTITIONING_DATE date INVISIBLE GENERATED ALWAYS AS (GREATEST("INSERTED",CAST("BEGIN_DATE" AS TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS TIMESTAMP),COALESCE("UPDATED","INSERTED"))) NOT NULL
)
NOCOMPRESS
TABLESPACE USERS
PCTUSED    0
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
PARTITION BY RANGE (MAX_PARTITIONING_DATE)
INTERVAL( NUMTOYMINTERVAL(1,'MONTH'))
(
PARTITION DEFAULT_PARTITION VALUES LESS THAN (TIMESTAMP' 2010-01-01 00:00:00')
);

declare
TYPE ARRAY IS TABLE OF lh_d%ROWTYPE;
l_data ARRAY;
w_array_size pls_integer := 20000;
cursor c is
select
     level,
     sysdate + level,
     sysdate + level + 50,
     sysdate,
     null,
     'something'
from
                             dual connect by level < 100;
begin
    open c;
    loop
        fetch c bulk collect into l_data limit w_array_size;
        forall i in 1..l_data.count
            insert into lh_d values l_data(i);
        exit when c%notfound;
    end loop;
    close c;
end;
/

ALTER TABLE LH_D ADD (
CONSTRAINT BEGIN_MAX_D
CHECK (begin_date <= max_partitioning_date)
ENABLE VALIDATE,
CONSTRAINT END_MAX_D
CHECK (end_date <= max_partitioning_date)
ENABLE VALIDATE,
-- CONSTRAINT inserted_updated_MAX_D
-- CHECK (inserted <= updated) -- beware of summertime
-- ENABLE VALIDATE,
CONSTRAINT INSERTED_MAX_D
CHECK (inserted <= max_partitioning_date)
ENABLE VALIDATE,
CONSTRAINT UPDATED_MAX_D
CHECK (updated <= max_partitioning_date)
ENABLE VALIDATE);

Select * from LH_D where inserted >= trunc(sysdate) + 50;
select * from table(dbms_xplan.display_cursor);
SQL_ID 0jdnuvdtwuyr7, child number 0
-------------------------------------
Select * from LH_D where inserted >= trunc(sysdate) + 50

Plan hash value: 1161958654

-------------------------------------------------------------------------------------------------
| Id | Operation                | Name | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |       |   211 (100)|          |       |       |
|   1 | PARTITION RANGE ITERATOR|      |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
|* 2 |   TABLE ACCESS FULL      | LH_D |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("INSERTED">=TRUNC(SYSDATE@!)+50 AND
              GREATEST("INSERTED",CAST(INTERNAL_FUNCTION("BEGIN_DATE") AS
              TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS
              TIMESTAMP),COALESCE("UPDATED","INSERTED"))>=TRUNC(SYSDATE@!)+50))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

select * from LH_D where end_date >= trunc(sysdate) + 50;
select * from table(dbms_xplan.display_cursor);
SQL_ID bh60cr1tdm2yp, child number 0
-------------------------------------
select * from LH_D where end_date >= trunc(sysdate) + 50

Plan hash value: 1161958654

-------------------------------------------------------------------------------------------------
| Id | Operation                | Name | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |       |   211 (100)|          |       |       |
|   1 | PARTITION RANGE ITERATOR|      |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
|* 2 |   TABLE ACCESS FULL      | LH_D |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("END_DATE">=TRUNC(SYSDATE@!)+50 AND
              GREATEST("INSERTED",CAST(INTERNAL_FUNCTION("BEGIN_DATE") AS
              TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS
              TIMESTAMP),COALESCE("UPDATED","INSERTED"))>=TRUNC(SYSDATE@!)+50))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

So Oracle does partition pruning with two separate columns without any changes to sql-statements.

However there is bug and missing functionalities which do prevent practical utilization of this technique.

Enhancements request is about correcting these bugs and making enhancements to allow partitioning by several (correlated) date and timestamp columns.

Here are listed some currently known problems.

'array insert does not work with constraints and interval partitioning'

drop table lh_d;
CREATE TABLE LH_D
(
KEY_COLUMN             NUMBER(9)              NOT NULL,
BEGIN_DATE             DATE                   NOT NULL,
END_DATE               DATE,
INSERTED               date           NOT NULL,
UPDATED                date,
VALUE_COLUMN           CHAR(100 CHAR),
MAX_PARTITIONING_DATE date INVISIBLE GENERATED ALWAYS AS (GREATEST("INSERTED",CAST("BEGIN_DATE" AS TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS TIMESTAMP),COALESCE("UPDATED","INSERTED"))) NOT NULL
)
NOCOMPRESS
TABLESPACE USERS
PCTUSED    0
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
PARTITION BY RANGE (MAX_PARTITIONING_DATE)
INTERVAL( NUMTOYMINTERVAL(1,'MONTH'))
(
PARTITION DEFAULT_PARTITION VALUES LESS THAN (TIMESTAMP' 2010-01-01 00:00:00')
);

ALTER TABLE LH_D ADD (
CONSTRAINT BEGIN_MAX_D
CHECK (begin_date <= max_partitioning_date)
ENABLE VALIDATE,
CONSTRAINT END_MAX_D
CHECK (end_date <= max_partitioning_date)
ENABLE VALIDATE,
-- CONSTRAINT inserted_updated_MAX_D
-- CHECK (inserted <= updated) -- beware of summertime...
-- ENABLE VALIDATE,
CONSTRAINT INSERTED_MAX_D
CHECK (inserted <= max_partitioning_date)
ENABLE VALIDATE,
CONSTRAINT UPDATED_MAX_D
CHECK (updated <= max_partitioning_date)
ENABLE VALIDATE);

ORA-02290: check constraint (END_MAX_D) violated
ORA-06512: at line 19

This partitioning technique would be usefull in datawarehouse systems where interval partitioning and array inserts would be used.

One could however choose not to use constraints and add criterias from constraints into where clauses but then
changing existing systems whould require sql-statement changes.

'Partition pruning is not occuring where parameters are used.'
declare
cursor c1(p_date date) is
select /*+ as_parameter*/ * from LH_D where inserted >= p_date;

w_date date := trunc(sysdate) + 50;
w_row c1%rowtype;
begin
     open c1(w_date);
     loop
         fetch c1 into w_row;
         exit when c1%notfound;
     end loop;
end;
/

select sql_id, sql_text from v$sql where sql_text like '%/*+ as_parameter*/%';
select * from table(dbms_xplan.display_cursor('152vpkutv2fss'));
SQL_ID 152vpkutv2fss, child number 0
-------------------------------------
SELECT /*+ as_parameter*/ * FROM LH_D WHERE INSERTED >= :B1

Plan hash value: 2756421308

--------------------------------------------------------------------------------------------
| Id | Operation           | Name | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    43 (100)|          |       |       |
|   1 | PARTITION RANGE ALL|      |     1 |   451 |    43   (0)| 00:00:01 |     1 |1048575|
|* 2 |   TABLE ACCESS FULL | LH_D |     1 |   451 |    43   (0)| 00:00:01 |     1 |1048575|
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INSERTED">=:B1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

'partition pruning is not working when mixed date and timestamp columns are involved'
drop table lh_ts;
CREATE TABLE LH_TS
(
KEY_COLUMN             NUMBER(9)              NOT NULL,
BEGIN_DATE             DATE                   NOT NULL,
END_DATE               DATE,
INSERTED               TIMESTAMP(6)           NOT NULL,
UPDATED                TIMESTAMP(6),
VALUE_COLUMN           CHAR(100 CHAR),
MAX_PARTITIONING_DATE TIMESTAMP(6) INVISIBLE GENERATED ALWAYS AS (GREATEST("INSERTED",CAST("BEGIN_DATE" AS TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS TIMESTAMP),COALESCE("UPDATED","INSERTED"))) NOT NULL
)
NOCOMPRESS
TABLESPACE USERS
PCTUSED    0
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
PARTITION BY RANGE (MAX_PARTITIONING_DATE)
INTERVAL( NUMTOYMINTERVAL(1,'MONTH'))
(
PARTITION DEFAULT_PARTITION VALUES LESS THAN (TIMESTAMP' 2010-01-01 00:00:00')
)
enable row movement;

declare
TYPE ARRAY IS TABLE OF lh_ts%ROWTYPE;
l_data ARRAY;
w_array_size pls_integer := 20000;
cursor c is
select
     level,
     sysdate + level,
     sysdate + level + 50,
     sysdate,
     null,
     'something'
from
                             dual connect by level < 100;
begin
    open c;
    loop
        fetch c bulk collect into l_data limit w_array_size;
        forall i in 1..l_data.count
            insert into lh_ts values l_data(i);
        exit when c%notfound;
    end loop;
    close c;
end;
/

ALTER TABLE LH_ts ADD (
CONSTRAINT BEGIN_MAX_ts
CHECK (begin_date <= max_partitioning_date)
ENABLE VALIDATE,
CONSTRAINT END_MAX_ts
CHECK (end_date <= max_partitioning_date)
ENABLE VALIDATE,
-- CONSTRAINT inserted_updated_MAX_ts
-- CHECK (inserted <= updated) -- beware of summertime...
-- ENABLE VALIDATE,
CONSTRAINT INSERTED_MAX_ts
CHECK (inserted <= max_partitioning_date)
ENABLE VALIDATE,
CONSTRAINT UPDATED_MAX_ts
CHECK (updated <= max_partitioning_date)
ENABLE VALIDATE);

select * from LH_ts where end_date >= trunc(sysdate) + 50;
select * from table(dbms_xplan.display_cursor);
-------------------------------------
SQL_ID gccbddhubqapn, child number 0
-------------------------------------
select * from LH_ts where end_date >= trunc(sysdate) + 50

Plan hash value: 4152623163

---------------------------------------------------------------------------------------------
| Id | Operation           | Name | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |   211 (100)|          |       |       |
|   1 | PARTITION RANGE ALL|       |     1 |   459 |   211   (1)| 00:00:01 |     1 |1048575|
|* 2 |   TABLE ACCESS FULL | LH_TS |     1 |   459 |   211   (1)| 00:00:01 |     1 |1048575|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("END_DATE">=TRUNC(SYSDATE@!)+50)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

One could arque that use of this method is preventing effective way to fetch data from some specific period from history.
This is so, if period can be defined with partitioning column. However
- vast majority of queries do concentrate on recent data
- this enables use of several different data columns
- if oracle in sometimes would enable subpartitioning by interval ranges, then creating subpartitions by MIN_PARTITIONING_DATE, would effectively help to this.

Added on Dec 10 2017

#active, #database-ideas, #database-ideas-ideas

3 comments

652 views