DB Version: "Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0"
We are trying to implement a check that verifies that queued messages on an oracle AQ are dequeued within an acceptable time (through a connection with IBM MQ, using the oracle MGW, although that is irrelevant for this question). Because we have numerous queues to monitor, rather than checking the content of each queue table dynamically, we wrote a query that uses the statistics in V$AQ.
However, we're running into an issue with these statistics, and I'm wondering if anyone has seen this behaviour before or can explain it. At first, it all goes well, all 3 statistics columns (AVERAGE_MSG_AGE, TOTAL_WAIT, and AVERAGE_WAIT) show the number of seconds that the messages have been queued. But after dequeueing and adding a new message, both TOTAL_WAIT and AVERAGE_WAIT get a huge number that does not change anymore, while AVERAGE_MSG_AGE is quite large as well and counts seconds down (instead of up).
This does not seem to be queue or database specific - we see it happening on all queues, and in 4 different environments. And the only way to correct the statistics again (temporarily) is dropping and recreating the queue (which for obvious reasons is not something that is acceptable in live environments).
An example:
Created a queue:
begin
dbms_aqadm.create_queue_table( queue_table => 'TEST_VAQ'
, queue_payload_type => 'SYS.MGW_BASIC_MSG_T' );
dbms_aqadm.create_queue( queue_name => 'TEST_VAQ'
, queue_table => 'TEST_VAQ' );
dbms_aqadm.start_queue( queue_name => 'TEST_VAQ' );
end;
Enqueued a message:
declare
l_optns dbms_aq.enqueue_options_t;
l_messg sys.mgw_basic_msg_t;
l_props dbms_aq.message_properties_t;
l_msgid raw(16);
begin
l_messg := sys.mgw_basic_msg_t( null
, sys.mgw_text_value_t( 'test'
, null )
, null );
l_optns.visibility := dbms_aq.immediate;
dbms_aq.enqueue( queue_name => 'TEST_VAQ'
, enqueue_options => l_optns
, message_properties => l_props
, payload => l_messg
, msgid => l_msgid );
end;
Checked the statistics twice, as expected the statistics increase with the number of seconds that sysdate increases:
select name
, waiting
, ready
, expired
, average_msg_age
, total_wait
, average_wait
, sysdate
from v$aq
join user_queues
using (qid)
where name = 'TEST_VAQ';
NAME WAITING READY EXPIRED AVERAGE_MSG_AGE TOTAL_WAIT AVERAGE_WAIT SYSDATE
==== ======= ===== ======= =============== ========== ============ =======
TEST_VAQ 0 1 0 3 3 3 18.08.2023 10:37:03
...
TEST_VAQ 0 1 0 9 9 9 18.08.2023 10:37:09
Dequeued the message (this simulates the dequeue normally performed by the MGW agent):
declare
l_dqopt dbms_aq.dequeue_options_t;
l_mprop dbms_aq.message_properties_array_t;
l_messg pck_mfw_dequeuer.arr_AAMGWBM;
l_msgid dbms_aq.msgid_array_t;
l_msgat integer;
x_dequeue_time_out exception;
pragma exception_init (x_dequeue_time_out, -25228);
begin
l_dqopt.wait := dbms_aq.no_wait;
l_dqopt.visibility := dbms_aq.immediate;
loop
begin
l_msgat := dbms_aq.dequeue_array( queue_name => 'TEST_VAQ'
, dequeue_options => l_dqopt
, array_size => 500
, message_properties_array => l_mprop
, payload_array => l_messg
, msgid_array => l_msgid );
pck_ts_log.nrm('AXI_DEQUEUER', 'SCRIPT', ' l_msgat: ' || to_char(l_msgat));
exit when l_msgat = 0;
exception
when x_dequeue_time_out then
exit;
end;
end loop;
end;
Checked statistics again, is 0 as expected:
NAME WAITING READY EXPIRED AVERAGE_MSG_AGE TOTAL_WAIT AVERAGE_WAIT SYSDATE
==== ======= ===== ======= =============== ========== ============ =======
TEST_VAQ 0 0 0 0 0 0 18.08.2023 10:37:20
Enqueued a new message and checked the statistics. TOTAL_WAIT and AVERAGE_WAIT are fixed numbers now, and AVERAGE_MSG_AGE decreases with the number of seconds that sysdate increases:
NAME WAITING READY EXPIRED AVERAGE_MSG_AGE TOTAL_WAIT AVERAGE_WAIT SYSDATE
==== ======= ===== ======= =============== ========== ============ =======
TEST_VAQ 0 1 0 1145176619 18446744072564400000 18446744072564400000 18.08.2023 10:37:26
...
TEST_VAQ 0 1 0 1145176614 18446744072564400000 18446744072564400000 18.08.2023 10:37:31
No matter how many messages I still en- or dequeue after this, it does not correct itself anymore. Only after dropping the queue and table and recreating them does it reset.
Any light shed on this would be greately appreciated.