In a recent thread from the SQL & PL/SQL forum category, came into the discussion the following method of finding the position of the HWM (*) in a segment:
Run the DBMS_SPACE.UNUSED_SPACE procedure on the segment, then compute the HWM according to the following formula (using the values returned from that procedure):
HWM = total_blocks - unused_blocks - 1
(*) For readers new to the Oracle database, who would not have heard of the high-water mark yet: please see this page in the Concepts manual. Also note that, in segments in an ASSM tablespace, an additional subtlety is that, actually there exist both a HWM, and a (related) low-HWM.
In this post I will attempt to demonstrate that, in what is arguably the most common configuration—namely: 8-kb blocks, ASSM tablespace, locally-managed, with AUTOALLOCATE extent management—the above formula is wrong.
(Remark: with apologies for writing yet another post about this already hugely discussed topic.)
0/ Test platform
Oracle 19.9 Enterprise Edition (Oracle Linux 7.9)
The test tablespace is a small-file, locally-managed tablespace; segment management is ASSM; extent management is AUTOALLOCATE; block size is 8 kbytes.
1/ Test setup
alter session set current_schema = "SCOTT";
drop table invoices_partitioned purge;
create table invoices_partitioned (
invoice_no number not null,
invoice_date date not null,
comments varchar2(500)
)
segment creation immediate
storage (initial 64m next 64m minextents 5)
partition by range (invoice_date) (
partition invoices_past values less than (date '2023-01-01'),
partition invoices_23q1 values less than (date '2023-04-01'),
partition invoices_23q2 values less than (date '2023-07-01'),
partition invoices_23q3 values less than (date '2023-10-01'),
partition invoices_23q4 values less than (date '2024-01-01'),
partition invoices_24q1 values less than (date '2024-04-01'),
partition invoices_24q2 values less than (date '2024-07-01')
);
Notes:
- I have purposely used the
segment creation immediate clause, so that partition segments are created upfront.
- And the
storage clause was purposely crafted so that initially we'll get 5 extents, each of 64 Mb (= 8192 blocks) in every partition.
Further on, we'll use only the “INVOICES_23Q3” partition—I realize that I could have made this test simpler, using a plain, non-partitioned table; but I built from the example that came in the original discussion, so I'll keep using that anyway.
2/ Queries
2.a/ Extents in the test partition
column owner format a8
column segment_name format a22
column partition_name format a15
select
*
from
sys.dbms_parallel_execute_extents
where
owner = 'SCOTT'
and segment_name = 'INVOICES_PARTITIONED'
and partition_name = 'INVOICES_23Q3';
Readout, just after creating the table (the partition is empty at that stage):
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE DATA_OBJECT_ID RELATIVE_FNO BLOCK_ID BLOCKS
-------- ---------------------- --------------- ------------------ -------------- ------------ ---------- ----------
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3252480 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3260672 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3268864 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3277056 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3285248 8192
2.b/ Statistics of the test partition
We gather statistics on the test partition as follows:
begin
dbms_stats.gather_table_stats(
ownname => 'SCOTT',
tabname => 'INVOICES_PARTITIONED',
partname => 'INVOICES_23Q3'
);
end;
/
And we query statistics as follows:
column table_name format a22
column partition_name format a15
select
table_name,
partition_name,
num_rows,
blocks
from
dba_tab_statistics
where
owner = 'SCOTT'
and table_name = 'INVOICES_PARTITIONED'
and partition_name = 'INVOICES_23Q3';
Readout (just after creating the table; the test partition is empty):
TABLE_NAME PARTITION_NAME NUM_ROWS BLOCKS
---------------------- --------------- ---------- ----------
INVOICES_PARTITIONED INVOICES_23Q3 0 0
2.c/ Segment header information
(+ path of the data file—we'll use that if/when we dump blocks from the segment.)
column file_name format a40
select
seg.segment_subtype,
seg.header_file,
seg.header_block,
dbf.file_name
from
dba_segments seg,
dba_data_files dbf
where
seg.owner = 'SCOTT'
and seg.segment_name = 'INVOICES_PARTITIONED'
and seg.partition_name = 'INVOICES_23Q3'
and dbf.file_id = seg.header_file;
Readout:
SEGMENT_SU HEADER_FILE HEADER_BLOCK FILE_NAME
---------- ----------- ------------ ----------------------------------------
ASSM 11 3252513 /u02/oradata/CDB2/pdb_rva/users01.dbf
Remark: according to DBA_SEGMENTS, the segment header block is located on file 11, block #3,252,513. Yet according to DBA_EXTENTS (above), the extent containing the segment header block starts on file 11, block #3,252,480—that's exactly 33 blocks lower.
2.d/ HWM (questionable) computation
The following PL/SQL block runs the DBMS_SPACE.UNUSED_SPACE procedure, then computes the (possible) position of the HWM according to the ( total_blocks - unused_blocks - 1 ) formula:
set serveroutput on
declare
v_total_blocks number;
v_total_bytes number;
v_unused_blocks number;
v_unused_bytes number;
v_last_used_extent_file_id number;
v_last_used_extent_block_id number;
v_last_used_block number;
begin
dbms_space.unused_space(
segment_owner => 'SCOTT',
segment_name => 'INVOICES_PARTITIONED',
segment_type => 'TABLE PARTITION',
total_blocks => v_total_blocks,
total_bytes => v_total_bytes,
unused_blocks => v_unused_blocks,
unused_bytes => v_unused_bytes,
last_used_extent_file_id => v_last_used_extent_file_id,
last_used_extent_block_id => v_last_used_extent_block_id,
last_used_block => v_last_used_block,
partition_name => 'INVOICES_23Q3'
);
dbms_output.put_line('v_total_blocks = ' || v_total_blocks);
dbms_output.put_line('v_unused_blocks = ' || v_unused_blocks);
dbms_output.put_line('v_last_used_block = ' || v_last_used_block);
dbms_output.put_line('HWM is (?) ' || (v_total_blocks - v_unused_blocks - 1));
end;
/
Readout, just after creating the table (the partition is still empty):
v_total_blocks = 40960
v_unused_blocks = 40926
v_last_used_block = 34
HWM is (?) 33
(Remark: the purposely-put question mark after "is" stands as a reminder that I consider this formula as questionable.)
At that stage, as the partition is initially empty, common sense would suggest that the HWM should be zero, rather than 33… Oh, didn't we precisely see that figure of 33 above? (This is not a coincidence.)
But let's proceed.
3/ Test procedure
3.a/ Step 1: we'll insert exactly 540,000 rows
insert into invoices_partitioned
with gen(n) as (
select level - 1 from dual
connect by level <= 10000
)
select
rownum,
date '2023-09-01',
lpad('x',500,'x')
from
gen a,
gen b
where
rownum <= 540000;
/* 540,000 rows inserted. */
commit;
Once we have done that, we gather statistics, get information about extents, run the (questionable) “HWM computation”, as shown above in section 2.
The readouts are as follows:
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE DATA_OBJECT_ID RELATIVE_FNO BLOCK_ID BLOCKS
-------- ---------------------- --------------- ------------------ -------------- ------------ ---------- ----------
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3252480 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3260672 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3268864 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3277056 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3285248 8192
SCOTT INVOICES_PARTITIONED INVOICES_23Q3 TABLE PARTITION 30617 11 3416320 8192
6 rows selected.
This shows that the INSERTs at step 1 have filled the initial 5 extents; consequently a 6th extent has been allocated.
TABLE_NAME PARTITION_NAME NUM_ROWS BLOCKS
---------------------- --------------- ---------- ----------
INVOICES_PARTITIONED INVOICES_23Q3 540000 48958
- HWM (questionable) computation:
v_total_blocks = 49152
v_unused_blocks = 0
v_last_used_block = 8192
HWM is (?) 49151
The ( total_blocks - unused_blocks - 1 ) formula gives a result close to the count of blocks in partition statistics, but not exactly the same.
3.b/ Step 2: run a FTS on the test partition, and find out how many blocks were physically read.
Before doing this, we'll flush the buffer cache, so that initially it doesn't contain any block from the test partition; further, flush the shared pool so as to ensure that the test query is hard-parsed.
I want to capture deltas in session statistics while this query is parsed and executed. However, I don't want to test this on a fully wiped-clean cursor cache (too many recursive queries would interfere with the results), so I will “warm-up” the cursor cache a little bit before actually running the test query.
alter system flush buffer_cache;
alter system flush shared_pool;
The “warm-up” queries are as follows:
select count(*) from dba_objects;
select count(*) from tab100m; -- a large table, with 100M rows
Additionally, I'll run the test query with STATISTICS_LEVEL set to ALL in the test session, because I want to get the actual plan statistics from DBMS_XPLAN.DISPLAY_CURSOR.
I also turn SERVEROUTPUT to OFF, in order to ensure that my SQL Developer client will not attempt to fetch from the DBMS_OUTPUT buffer, which would interfere with my attempt to capture deltas in session statistics. This also makes it a lot easier to use DBMS_XPLAN.DISPLAY_CURSOR with sql_id => null, cursor_child_id => null.
set serveroutput off
alter session set statistics_level = all;
I capture the deltas in session statistics using a separate session, thanks to a modified version of Adrian Billington's MyStats script; the original (link) does not support capturing statistics from another session, but adding it is not difficult. Otherwise it works in exactly the same way:
SQL> @mystats start s=s,t 100
-- Run the test activity (in session with SID=100)
SQL> @mystats stop t=1
Finally, here comes the test query itself:
select count(*) from INVOICES_PARTITIONED partition (INVOICES_23Q3);
- Actual plan from DBMS_XPLAN.DISPLAY_CURSOR:
SQL_ID 2tv76adg3c9px, child number 0
-------------------------------------
select count(*) from INVOICES_PARTITIONED partition (INVOICES_23Q3)
Plan hash value: 3922651990
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| E-Time | Pstart| Pstop | A-Rows | A-Time | Buffers | Reads |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2298 (100)| | | | 1 |00:00:00.13 | 41603 | 41601 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | | | 1 |00:00:00.13 | 41603 | 41601 |
| 2 | PARTITION RANGE SINGLE| | 1 | 540K| 2298 (2)| 00:00:01 | 4 | 4 | 540K|00:00:00.13 | 41603 | 41601 |
| 3 | TABLE ACCESS FULL | INVOICES_PARTITIONED | 1 | 540K| 2298 (2)| 00:00:01 | 4 | 4 | 540K|00:00:00.13 | 41603 | 41601 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
- Selected deltas in session statistics:
Type Statistic Name Value
------ ---------------------------------------------------------------- ----------------
STAT CCursor + sql area evicted 4
STAT CPU used by this session 6
STAT DB time 16
[...]
STAT Number of read IOs issued 326
[...]
STAT Requests to/from client 1
STAT SQL*Net roundtrips to/from client 1
[...]
STAT consistent gets 42,150
STAT consistent gets direct 41,566
[...]
STAT consistent gets from cache 584
[...]
STAT execute count 158
[...]
STAT parse count (hard) 8
STAT parse count (total) 9
[...]
STAT physical read IO requests 366
STAT physical read bytes 341,090,304
STAT physical read total IO requests 366
[...]
STAT physical read total multi block requests 327
STAT physical reads 41,637
[...]
STAT physical reads direct 41,566
[...]
STAT session logical reads 42,150
STAT session pga memory 2,621,440
[...]
STAT table scan blocks gotten 41,566
STAT table scan disk non-IMC rows gotten 540,000
STAT table scan rows gotten 540,000
STAT table scans (direct read) 1
STAT table scans (long tables) 1
[...]
STAT user calls 2
[...]
This is plenty of session statistics, though I have removed those which appear to be of lesser relevance. Meanwhile, they tend to match the plan statistics very closely.
And what should become very clear here is as follows: the count of blocks which were physically read while doing the FTS on the test partition was (according to session statistics) not higher than 41,637. According the plan statistics, it was 41,601, which is reasonably close.
Yet according to the (questionable) HWM computation formula, we should have read 49,151 blocks. That's an enormous difference of 7,500+ blocks. This alone should be enough to prove that this formula is inappropriate in the tested situation, that is in the case of locally managed tablespaces using ASSM.
5/ But then, where is the HWM?
The answer to that question requires dumping blocks, not only the segment header block itself, but also the bitmap blocks which store metadata information about usage of each block in the extents.
alter system checkpoint;
alter system dump datafile '/u02/oradata/CDB2/pdb_rva/users01.dbf' block min 3252480 block max 3252520;
alter system dump datafile '/u02/oradata/CDB2/pdb_rva/users01.dbf' block min 3416320 block max 3416360;
Remark: here I choose to dump the first 41 blocks of both the first and the last extents. (EDIT: the choice of 41 was made arbitrarily, but purposely higher than 34.)
The following grep command against the trace file enables to see what we've got:
$ grep --color=ALWAYS -E -e '^(Start dump|buffer tsn:|frmt:)' -e 'HWM Set' CDB2_ora_8211.trc
Readout:
Start dump data block from file /u02/oradata/CDB2/pdb_rva/users01.dbf minblk 3252480 maxblk 3252520
buffer tsn: 4 rdba: 0x02f1a100 (11/3252480)
frmt: 0x02 chkval: 0x5074 type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a101 (11/3252481)
frmt: 0x02 chkval: 0x53aa type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a102 (11/3252482)
frmt: 0x02 chkval: 0x52d7 type: 0x20=FIRST LEVEL BITMAP BLOCK
[...]
[... skipping 511 - 482 - 1 = 28 first level bitmap blocks; there are 32 of them in each 64-Mb extent.]
[...]
buffer tsn: 4 rdba: 0x02f1a11f (11/3252511)
frmt: 0x02 chkval: 0x362d type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a120 (11/3252512)
frmt: 0x02 chkval: 0xf1d7 type: 0x21=SECOND LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f1a121 (11/3252513)
frmt: 0x02 chkval: 0xa1d2 type: 0x23=PAGETABLE SEGMENT HEADER
buffer tsn: 4 rdba: 0x02f1a122 (11/3252514)
frmt: 0x02 chkval: 0xf787 type: 0x06=trans data
buffer tsn: 4 rdba: 0x02f1a123 (11/3252515)
frmt: 0x02 chkval: 0xeaca type: 0x06=trans data
buffer tsn: 4 rdba: 0x02f1a124 (11/3252516)
[...]
Start dump data block from file /u02/oradata/CDB2/pdb_rva/users01.dbf minblk 3416320 maxblk 3416360
buffer tsn: 4 rdba: 0x02f42100 (11/3416320)
frmt: 0x02 chkval: 0x5cda type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f42101 (11/3416321)
frmt: 0x02 chkval: 0x5e3a type: 0x20=FIRST LEVEL BITMAP BLOCK
buffer tsn: 4 rdba: 0x02f42102 (11/3416322)
frmt: 0x02 chkval: 0x5f3a type: 0x20=FIRST LEVEL BITMAP BLOCK
[...]
[... skipping 351 - 322 - 1 = 28 first level bitmap blocks; there are 32 of them in each 64-Mb extent.]
[...]
buffer tsn: 4 rdba: 0x02f4211f (11/3416351)
frmt: 0x02 chkval: 0xc0e4 type: 0x20=FIRST LEVEL BITMAP BLOCK
HWM Flag: HWM Set
buffer tsn: 4 rdba: 0x02f42120 (11/3416352)
frmt: 0x02 chkval: 0x0d88 type: 0x06=trans data
buffer tsn: 4 rdba: 0x02f42121 (11/3416353)
frmt: 0x02 chkval: 0x0f81 type: 0x06=trans data
[...]
There comes the HWM!
In the last first-level bitmap block of the last extent: the "HWM Flag: HWM Set" mention tells that the location of the high-water mark is to be found in this first-level bitmap block.
This doesn't yet explain the huge gap between the formula and the count of blocks which were physically read: we have to take a closer look at the extent bitmap blocks to explain that. I'll show the details in a subsequent comment, but in a nutshell the explanation is as follows: with this extent size (of 64 Mb), each first-level bitmap block contains information about exactly 256 blocks (32 x 256 = 8192 blocks, the size of a 64-Mb extent). And it turns out that here, only 3 in all the 32 first-level bitmap blocks of the last extent reference used data blocks; add to this the 1st first-level bitmap block itself in the last extent, which references the other bitmap blocks in that extent. With these, the whole FTS of the test partition should never need to scan more than 5 x 8192 (in the 5 initial extents) + 128 + 3 x 256 (in the 6th extent) = 41856 blocks, which turns to be exactly 327 (physical read total multi block requests) x 128 Mb (multi-block read I/O size)…
Regards,