Skip to Main Content

Oracle Database Discussions

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Let's find the HWM!

user9540031Mar 17 2024 — edited Mar 17 2024

In a recent thread from the SQL & PL/SQL forum category, came into the discussion the following method of finding the position of the HWM (*) in a segment:

Run the DBMS_SPACE.UNUSED_SPACE procedure on the segment, then compute the HWM according to the following formula (using the values returned from that procedure):

HWM = total_blocks - unused_blocks - 1

(*) For readers new to the Oracle database, who would not have heard of the high-water mark yet: please see this page in the Concepts manual. Also note that, in segments in an ASSM tablespace, an additional subtlety is that, actually there exist both a HWM, and a (related) low-HWM.

In this post I will attempt to demonstrate that, in what is arguably the most common configuration—namely: 8-kb blocks, ASSM tablespace, locally-managed, with AUTOALLOCATE extent management—the above formula is wrong.

(Remark: with apologies for writing yet another post about this already hugely discussed topic.)

0/ Test platform

Oracle 19.9 Enterprise Edition (Oracle Linux 7.9)

The test tablespace is a small-file, locally-managed tablespace; segment management is ASSM; extent management is AUTOALLOCATE; block size is 8 kbytes.

1/ Test setup

alter session set current_schema = "SCOTT";

drop table invoices_partitioned purge;

create table invoices_partitioned (
    invoice_no      number         not null,
    invoice_date    date           not null,
    comments        varchar2(500)
)
segment creation immediate
storage (initial 64m next 64m minextents 5)
partition by range (invoice_date) (
    partition invoices_past values less than (date '2023-01-01'),
    partition invoices_23q1 values less than (date '2023-04-01'),
    partition invoices_23q2 values less than (date '2023-07-01'),
    partition invoices_23q3 values less than (date '2023-10-01'),
    partition invoices_23q4 values less than (date '2024-01-01'),
    partition invoices_24q1 values less than (date '2024-04-01'),
    partition invoices_24q2 values less than (date '2024-07-01')
);

Notes:

  • I have purposely used the segment creation immediate clause, so that partition segments are created upfront.
  • And the storage clause was purposely crafted so that initially we'll get 5 extents, each of 64 Mb (= 8192 blocks) in every partition.

Further on, we'll use only the “INVOICES_23Q3” partition—I realize that I could have made this test simpler, using a plain, non-partitioned table; but I built from the example that came in the original discussion, so I'll keep using that anyway.

2/ Queries

2.a/ Extents in the test partition

column owner          format a8
column segment_name   format a22
column partition_name format a15

select 
    * 
from 
    sys.dbms_parallel_execute_extents
where 
    owner = 'SCOTT' 
    and segment_name = 'INVOICES_PARTITIONED'
    and partition_name = 'INVOICES_23Q3';

Readout, just after creating the table (the partition is empty at that stage):

OWNER    SEGMENT_NAME           PARTITION_NAME  SEGMENT_TYPE       DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID     BLOCKS
-------- ---------------------- --------------- ------------------ -------------- ------------ ---------- ----------
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3252480       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3260672       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3268864       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3277056       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3285248       8192

2.b/ Statistics of the test partition

We gather statistics on the test partition as follows:

begin
    dbms_stats.gather_table_stats(
        ownname  => 'SCOTT',
        tabname  => 'INVOICES_PARTITIONED',
        partname => 'INVOICES_23Q3'
    );
end;
/

And we query statistics as follows:

column table_name     format a22
column partition_name format a15

select
    table_name, 
    partition_name,
    num_rows,
    blocks
from
    dba_tab_statistics
where
    owner = 'SCOTT'
    and table_name = 'INVOICES_PARTITIONED'
    and partition_name = 'INVOICES_23Q3';

Readout (just after creating the table; the test partition is empty):

TABLE_NAME             PARTITION_NAME    NUM_ROWS     BLOCKS
---------------------- --------------- ---------- ----------
INVOICES_PARTITIONED   INVOICES_23Q3            0          0

2.c/ Segment header information

(+ path of the data file—we'll use that if/when we dump blocks from the segment.)

column file_name format a40

select 
    seg.segment_subtype,
    seg.header_file,
    seg.header_block,
    dbf.file_name
from 
    dba_segments seg,
    dba_data_files dbf
where 
    seg.owner = 'SCOTT' 
    and seg.segment_name = 'INVOICES_PARTITIONED'
    and seg.partition_name = 'INVOICES_23Q3'
    and dbf.file_id = seg.header_file;

Readout:

SEGMENT_SU HEADER_FILE HEADER_BLOCK FILE_NAME                               
---------- ----------- ------------ ----------------------------------------
ASSM                11      3252513 /u02/oradata/CDB2/pdb_rva/users01.dbf   

Remark: according to DBA_SEGMENTS, the segment header block is located on file 11, block #3,252,513. Yet according to DBA_EXTENTS (above), the extent containing the segment header block starts on file 11, block #3,252,480—that's exactly 33 blocks lower.

2.d/ HWM (questionable) computation

The following PL/SQL block runs the DBMS_SPACE.UNUSED_SPACE procedure, then computes the (possible) position of the HWM according to the ( total_blocks - unused_blocks - 1 ) formula:

set serveroutput on
declare
    v_total_blocks              number;
    v_total_bytes               number;
    v_unused_blocks             number;
    v_unused_bytes              number;
    v_last_used_extent_file_id  number;
    v_last_used_extent_block_id number;
    v_last_used_block           number;
begin
    dbms_space.unused_space( 
        segment_owner             =>  'SCOTT',
        segment_name              =>  'INVOICES_PARTITIONED',
        segment_type              =>  'TABLE PARTITION',
        total_blocks              =>  v_total_blocks,
        total_bytes               =>  v_total_bytes,
        unused_blocks             =>  v_unused_blocks,
        unused_bytes              =>  v_unused_bytes,
        last_used_extent_file_id  =>  v_last_used_extent_file_id,
        last_used_extent_block_id =>  v_last_used_extent_block_id,
        last_used_block           =>  v_last_used_block,
        partition_name            =>  'INVOICES_23Q3'
    );
    dbms_output.put_line('v_total_blocks    = ' || v_total_blocks);
    dbms_output.put_line('v_unused_blocks   = ' || v_unused_blocks);
    dbms_output.put_line('v_last_used_block = ' || v_last_used_block);
    dbms_output.put_line('HWM is (?) ' || (v_total_blocks - v_unused_blocks - 1));
end;
/

Readout, just after creating the table (the partition is still empty):

v_total_blocks    = 40960
v_unused_blocks   = 40926
v_last_used_block = 34
HWM is (?) 33

(Remark: the purposely-put question mark after "is" stands as a reminder that I consider this formula as questionable.)

At that stage, as the partition is initially empty, common sense would suggest that the HWM should be zero, rather than 33… Oh, didn't we precisely see that figure of 33 above? (This is not a coincidence.)

But let's proceed.

3/ Test procedure

3.a/ Step 1: we'll insert exactly 540,000 rows

insert into invoices_partitioned
with gen(n) as (
    select level - 1 from dual 
    connect by level <= 10000
)
select  
    rownum,
    date '2023-09-01',
    lpad('x',500,'x')
from  
    gen a, 
    gen b
where
    rownum <= 540000;

/* 540,000 rows inserted. */

commit;

Once we have done that, we gather statistics, get information about extents, run the (questionable) “HWM computation”, as shown above in section 2.

The readouts are as follows:

  • Extents:
OWNER    SEGMENT_NAME           PARTITION_NAME  SEGMENT_TYPE       DATA_OBJECT_ID RELATIVE_FNO   BLOCK_ID     BLOCKS
-------- ---------------------- --------------- ------------------ -------------- ------------ ---------- ----------
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3252480       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3260672       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3268864       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3277056       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3285248       8192
SCOTT    INVOICES_PARTITIONED   INVOICES_23Q3   TABLE PARTITION             30617           11    3416320       8192

6 rows selected. 

This shows that the INSERTs at step 1 have filled the initial 5 extents; consequently a 6th extent has been allocated.

  • Partition statistics:
TABLE_NAME             PARTITION_NAME    NUM_ROWS     BLOCKS
---------------------- --------------- ---------- ----------
INVOICES_PARTITIONED   INVOICES_23Q3       540000      48958
  • HWM (questionable) computation:
v_total_blocks    = 49152
v_unused_blocks   = 0
v_last_used_block = 8192
HWM is (?) 49151

The ( total_blocks - unused_blocks - 1 ) formula gives a result close to the count of blocks in partition statistics, but not exactly the same.

3.b/ Step 2: run a FTS on the test partition, and find out how many blocks were physically read.

Before doing this, we'll flush the buffer cache, so that initially it doesn't contain any block from the test partition; further, flush the shared pool so as to ensure that the test query is hard-parsed.

I want to capture deltas in session statistics while this query is parsed and executed. However, I don't want to test this on a fully wiped-clean cursor cache (too many recursive queries would interfere with the results), so I will “warm-up” the cursor cache a little bit before actually running the test query.

alter system flush buffer_cache;
alter system flush shared_pool;

The “warm-up” queries are as follows:

select count(*) from dba_objects;
select count(*) from tab100m; -- a large table, with 100M rows

Additionally, I'll run the test query with STATISTICS_LEVEL set to ALL in the test session, because I want to get the actual plan statistics from DBMS_XPLAN.DISPLAY_CURSOR.

I also turn SERVEROUTPUT to OFF, in order to ensure that my SQL Developer client will not attempt to fetch from the DBMS_OUTPUT buffer, which would interfere with my attempt to capture deltas in session statistics. This also makes it a lot easier to use DBMS_XPLAN.DISPLAY_CURSOR with sql_id => null, cursor_child_id => null.

set serveroutput off
alter session set statistics_level = all;

I capture the deltas in session statistics using a separate session, thanks to a modified version of Adrian Billington's MyStats script; the original (link) does not support capturing statistics from another session, but adding it is not difficult. Otherwise it works in exactly the same way:

SQL> @mystats start s=s,t 100

-- Run the test activity (in session with SID=100)

SQL> @mystats stop t=1

Finally, here comes the test query itself:

select count(*) from INVOICES_PARTITIONED partition (INVOICES_23Q3);
  • Actual plan from DBMS_XPLAN.DISPLAY_CURSOR:
SQL_ID  2tv76adg3c9px, child number 0
-------------------------------------
select count(*) from INVOICES_PARTITIONED partition (INVOICES_23Q3)
 
Plan hash value: 3922651990
 
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                 | Starts | E-Rows | Cost (%CPU)| E-Time   | Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                      |      1 |        |  2298 (100)|          |       |       |      1 |00:00:00.13 |   41603 |  41601 |
|   1 |  SORT AGGREGATE         |                      |      1 |      1 |            |          |       |       |      1 |00:00:00.13 |   41603 |  41601 |
|   2 |   PARTITION RANGE SINGLE|                      |      1 |    540K|  2298   (2)| 00:00:01 |     4 |     4 |    540K|00:00:00.13 |   41603 |  41601 |
|   3 |    TABLE ACCESS FULL    | INVOICES_PARTITIONED |      1 |    540K|  2298   (2)| 00:00:01 |     4 |     4 |    540K|00:00:00.13 |   41603 |  41601 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
  • Selected deltas in session statistics:
Type    Statistic Name                                                               Value
------  ----------------------------------------------------------------  ----------------
STAT    CCursor + sql area evicted                                                       4
STAT    CPU used by this session                                                         6
STAT    DB time                                                                         16
[...]
STAT    Number of read IOs issued                                                      326
[...]
STAT    Requests to/from client                                                          1
STAT    SQL*Net roundtrips to/from client                                                1
[...]
STAT    consistent gets                                                             42,150
STAT    consistent gets direct                                                      41,566
[...]
STAT    consistent gets from cache                                                     584
[...]
STAT    execute count                                                                  158
[...]
STAT    parse count (hard)                                                               8
STAT    parse count (total)                                                              9
[...]
STAT    physical read IO requests                                                      366
STAT    physical read bytes                                                    341,090,304
STAT    physical read total IO requests                                                366
[...]
STAT    physical read total multi block requests                                       327
STAT    physical reads                                                              41,637
[...]
STAT    physical reads direct                                                       41,566
[...]
STAT    session logical reads                                                       42,150
STAT    session pga memory                                                       2,621,440
[...]
STAT    table scan blocks gotten                                                    41,566
STAT    table scan disk non-IMC rows gotten                                        540,000
STAT    table scan rows gotten                                                     540,000
STAT    table scans (direct read)                                                        1
STAT    table scans (long tables)                                                        1
[...]
STAT    user calls                                                                       2
[...]

This is plenty of session statistics, though I have removed those which appear to be of lesser relevance. Meanwhile, they tend to match the plan statistics very closely.

And what should become very clear here is as follows: the count of blocks which were physically read while doing the FTS on the test partition was (according to session statistics) not higher than 41,637. According the plan statistics, it was 41,601, which is reasonably close.

Yet according to the (questionable) HWM computation formula, we should have read 49,151 blocks. That's an enormous difference of 7,500+ blocks. This alone should be enough to prove that this formula is inappropriate in the tested situation, that is in the case of locally managed tablespaces using ASSM.

5/ But then, where is the HWM?

The answer to that question requires dumping blocks, not only the segment header block itself, but also the bitmap blocks which store metadata information about usage of each block in the extents.

alter system checkpoint;
alter system dump datafile '/u02/oradata/CDB2/pdb_rva/users01.dbf' block min 3252480 block max 3252520;
alter system dump datafile '/u02/oradata/CDB2/pdb_rva/users01.dbf' block min 3416320 block max 3416360;

Remark: here I choose to dump the first 41 blocks of both the first and the last extents. (EDIT: the choice of 41 was made arbitrarily, but purposely higher than 34.)

The following grep command against the trace file enables to see what we've got:

$ grep --color=ALWAYS -E -e '^(Start dump|buffer tsn:|frmt:)' -e 'HWM Set' CDB2_ora_8211.trc

Readout:

Start dump data block from file /u02/oradata/CDB2/pdb_rva/users01.dbf minblk 3252480 maxblk 3252520
buffer tsn: 4 rdba: 0x02f1a100 (11/3252480)
frmt: 0x02 chkval: 0x5074 type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a101 (11/3252481)
frmt: 0x02 chkval: 0x53aa type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a102 (11/3252482)
frmt: 0x02 chkval: 0x52d7 type: 0x20=FIRST LEVEL BITMAP BLOCK
[...]
[... skipping 511 - 482 - 1 = 28 first level bitmap blocks; there are 32 of them in each 64-Mb extent.]
[...]
buffer tsn: 4 rdba: 0x02f1a11f (11/3252511)
frmt: 0x02 chkval: 0x362d type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a120 (11/3252512)
frmt: 0x02 chkval: 0xf1d7 type: 0x21=SECOND LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a121 (11/3252513)
frmt: 0x02 chkval: 0xa1d2 type: 0x23=PAGETABLE SEGMENT HEADER
buffer tsn: 4 rdba: 0x02f1a122 (11/3252514)
frmt: 0x02 chkval: 0xf787 type: 0x06=trans data
buffer tsn: 4 rdba: 0x02f1a123 (11/3252515)
frmt: 0x02 chkval: 0xeaca type: 0x06=trans data
buffer tsn: 4 rdba: 0x02f1a124 (11/3252516)
[...]

Start dump data block from file /u02/oradata/CDB2/pdb_rva/users01.dbf minblk 3416320 maxblk 3416360
buffer tsn: 4 rdba: 0x02f42100 (11/3416320)
frmt: 0x02 chkval: 0x5cda type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f42101 (11/3416321)
frmt: 0x02 chkval: 0x5e3a type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f42102 (11/3416322)
frmt: 0x02 chkval: 0x5f3a type: 0x20=FIRST LEVEL BITMAP BLOCK
[...]
[... skipping 351 - 322 - 1 = 28 first level bitmap blocks; there are 32 of them in each 64-Mb extent.]
[...]
buffer tsn: 4 rdba: 0x02f4211f (11/3416351)
frmt: 0x02 chkval: 0xc0e4 type: 0x20=FIRST LEVEL BITMAP BLOCK
  HWM Flag: HWM Set
buffer tsn: 4 rdba: 0x02f42120 (11/3416352)
frmt: 0x02 chkval: 0x0d88 type: 0x06=trans data
buffer tsn: 4 rdba: 0x02f42121 (11/3416353)
frmt: 0x02 chkval: 0x0f81 type: 0x06=trans data
[...]

There comes the HWM!

In the last first-level bitmap block of the last extent: the "HWM Flag: HWM Set" mention tells that the location of the high-water mark is to be found in this first-level bitmap block.

This doesn't yet explain the huge gap between the formula and the count of blocks which were physically read: we have to take a closer look at the extent bitmap blocks to explain that. I'll show the details in a subsequent comment, but in a nutshell the explanation is as follows: with this extent size (of 64 Mb), each first-level bitmap block contains information about exactly 256 blocks (32 x 256 = 8192 blocks, the size of a 64-Mb extent). And it turns out that here, only 3 in all the 32 first-level bitmap blocks of the last extent reference used data blocks; add to this the 1st first-level bitmap block itself in the last extent, which references the other bitmap blocks in that extent. With these, the whole FTS of the test partition should never need to scan more than 5 x 8192 (in the 5 initial extents) + 128 + 3 x 256 (in the 6th extent) = 41856 blocks, which turns to be exactly 327 (physical read total multi block requests) x 128 Mb (multi-block read I/O size)…

Regards,

Comments
Post Details
Added on Mar 17 2024
5 comments
686 views