Hi All,
I came across this good article on AQ troubleshooting. Thought of sharing it with all, hence, this post. Out of things mentioned in article, I have outlined a few steps, that helped me in our environment while troubleshooting AQ issues.
How to troubleshoot AQ issues?
https://blogs.oracle.com/db/entry/oracle_support_master_note_for_troubleshooting_advanced_queuing_and_oracle_streams_propagation_issue
a step-by-step methodology for troubleshooting and resolving problems with Advanced Queuing Propagation in both Streams and basic Advanced Queuing environments
1. Check if queues are buffered (in-memory) or persistent (on disk in a queue_table).
SQL> select * from v$buffered_queues;
no rows selected
2. Check if queue_to_queue or queue_to_dblink is used.
col destination for a15
col session_id for a15
set line 200
select qname,destination,session_id,process_name,schedule_disabled,instance, current_start_time
from dba_queue_schedules order by current_start_time desc,schedule_disabled desc ;
3. Check if queues are propagating data at all or are slow?
select TOTAL_NUMBER from DBA_QUEUE_SCHEDULES where QNAME='&queue_name';
4. Check job_queue_processes parameter. Should be more than 4.
5. Identify job queue processes...
For 10.2 and lower
select p.SPID, p.PROGRAM
from V$PROCESS p, DBA_JOBS_RUNNING jr, V$SESSION s, DBA_JOBS j
where s.SID=jr.SID
and s.PADDR=p.ADDR
and jr.JOB=j.JOB
and j.WHAT like '%sys.dbms_aqadm.aq$_propaq(job)%';
For 11.1 and higher
col PROGRAM for a30
select p.SPID, p.PROGRAM, j.JOB_name
from v$PROCESS p, DBA_SCHEDULER_RUNNING_JOBS jr, V$SESSION s, DBA_SCHEDULER_JOBS j
where s.SID=jr.SESSION_ID
and s.PADDR=p.ADDR
and jr.JOB_name=j.JOB_NAME
--and j.JOB_NAME like '%AQ_JOB$_%';
6. Check Alert.log and tracefiles for more information.
7. Check if DBlink is working fine through owner of DBlink.
8. Check queue errors and also find out associated Queue table
set linesize 140;
column destination format a25;
column last_error_msg format a35;
column schema format a15
select schema,
qname,
destination,
failures,
last_error_date,
last_error_time,
last_error_msg
from dba_queue_schedules
where failures != 0;
select QUEUE_TABLE from DBA_QUEUES where NAME ='&queue_name';
Check what queue is supposed to do
column qname format a40
column user_comment format a40
column last_error_msg format a40
column destination format a25
select distinct a.schema || '.' || a.qname qname
,a.destination
,a.schedule_disabled
,b.user_comment
from dba_queue_schedules a, dba_queues b
where a.qname=b.name;
9. Check if Queues are disabled.
select schema || '.' || qname,
destination,
schedule_disabled,
last_error_msg
from dba_queue_schedules
where schedule_disabled='Y';
10. If queue is DISABLED...enable it using following
select 'exec dbms_aqadm.enable_propagation_schedule(''' || schema || '.' || qname || ''', ''' || destination || ''');'
from dba_queue_schedules
where schedule_disabled='Y';
10.1 Check if propagation has been set correctly
Check that the propagation schedule has been created and that a job queue process has been assigned. Look for the entry in DBA_QUEUE_SCHEDULES and SYS.AQ$_SCHEDULES for your schedule. For 10g and below, check that it has a JOBNO entry in SYS.AQ$_SCHEDULES, and that there is an entry in DBA_JOBS with that JOBNO. For 11g and above, check that the schedule has a JOB_NAME entry in SYS.AQ$_SCHEDULES, and that there is an entry in DBA_SCHEDULER_JOBS with that JOB_NAME. Check the destination is as intended and spelled correctly.
10.2 Check if a Process_Name has been assigned to a queue, if no process_name is assigned...schedule is not currently executing. You may need to execute this statement no. of times to verify if a process is being allocated.
10.3 if a process_name is assigned and schedule executing but failing...Refer to step 8 for errors.
10.4 Check if queue tables exists in sys
SQL> select NAME, ENQUEUE_ENABLED, DEQUEUE_ENABLED
from DBA_QUEUES where owner='SYS'
and QUEUE_TABLE like '%PROP_TABLE%';
If the %PROP_NOTIFY queue is not enabled for enqueue or dequeue, it should be so enabled using DBMS_AQADM.START_QUEUE. However, the exception queue AQ$_AQ$_PROP_TABLE_E should not be enabled for enqueue or dequeue.
10.5 Check that the remote queue the propagation is transferring messages to exists and is enabled for enqueue
If the AQ_PROP_NOTIFY queue is not enabled for enqueue or dequeue, it should be so enabled using DBMS_AQADM.START_QUEUE. However, the exception queue AQ$_AQ$_PROP_TABLE_E should not be enabled for enqueue or dequeue.
11. Check performance of each queue.
col last_run_date for a40
col qname for a25
col NEXT_RUN_DATE for a40
col seconds for 9999
set line 200
select qname,
last_run_date,
NEXT_RUN_DATE,
total_number MESSAGES,
total_bytes/1024 KBYTES,
total_time SECONDS,
round(total_bytes/(total_time+0.0000001)) BYTES_PER_SEC, process_name
from dba_queue_schedules
order by BYTES_PER_SEC;
12. Check if there are locking issues...High value for Ctime>1800 indicates suspicious lock
select * from gv$transaction_enqueue order by ctime;
12.1 Find out objects accessed by session
select * from gv$access
where sid = 176 and object like 'T_%'
and owner = 'owner_name';
INST_ID SID OWNER OBJECT TYPE
---------- ---------- ------------------------------ ------------------------------ ------
3 176 owner_name Some_queue_table_name TABLE
2 176 owner_name Some_queue_table_name TABLE
1 176 owner_name Some_queue_table_name TABLE
12.2
select * from gv$lock
where sid = 176 and inst_id = 3;
INST_ID ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------- -------- -------- ---------- -- ---------- ---------- ---------- --------- ---------- ----------
3 A404050C A4040520 176 TM 35580 0 3 0 82823 2
3 A40407A0 A40407B4 176 TM 35578 0 3 0 82823 2
3 A4040058 A404006C 176 TM 35591 0 3 0
12.3
select object_name
from gv$lock l join dba_objects on id1 = object_id
where sid = 176 and inst_id = 3
and type = 'TM';
OBJECT_NAME
------------------------------
AQ$_queue_table_name_T
AQ$_queue_table_name_H
AQ$_queue_table_name_I
Some_queue_table_name
12.4. It could be that session is stuck...mostly, it ll be job in dbms_job trying to propagate message for 10.2 and below version
select /*+rule*/ *
from dba_jobs_running
where sid = 176;
SID JOB FAILURES LAST_DATE LAST_SEC THIS_DATE THIS_SEC INSTANCE
---------- ---------- ---------- --------- ------------------------ ---------
----------
176 2867992 13-MAY-07 16:27:06 3
select job,what,this_date,next_date,broken
from dba_jobs
where job = 2867992;
JOB WHAT THIS_DATE NEXT_DATE B
---------- -------------------------------------------------- ----------------- ----------------- -
2867992 next_date := sys.dbms_aqadm.aq$_propaq(job); 13-MAY-2007 16:27 13-MAY-2007 16:27 N
Check the job has an associated propagation schedule. If it doesn’t then that means the locks being seen are problems because the job is not doing anything.
select sid,jobno
from sys.aq$_schedules
where jobno = 2867992;
no rows selected
Check the job still has a thread running within the Oracle executable:
select sid,spid,p.program
from gv$session s join gv$process p on paddr = addr
where s.sid= 176 and s.inst_id = 3;
SID SPID PROGRAM
---------- ------------ ----------------------------------------------------------------
176 4608 ORACLE.EXE (J044)
v$session_wait shows it is still waiting for input even though the link has gone, confirming the issue:
select event from gv$session_wait where sid = 176 and inst_id = 3;
EVENT
----------------------------------------------------------------
SQL*Net message from dblink
13. Tracing queues
10.2 and below
connect / as sysdba
select p.SPID, p.PROGRAM
from v$PROCESS p, DBA_JOBS_RUNNING jr, V$SESSION s, DBA_JOBS j
where s.SID=jr.SID
and s.PADDR=p.ADDR
and jr.JOB=j.JOB
and j.WHAT like '%sys.dbms_aqadm.aq$_propaq(job)%';
-- For the process id (SPID) attach to it via oradebug and generate the following trace
oradebug setospid <SPID>
oradebug unlimit
oradebug Event 10046 trace name context forever, level 12
oradebug Event 24040 trace name context forever, level 10
-- Trace the process for 5 minutes
oradebug Event 10046 trace name context off
oradebug Event 24040 trace name context off
-- The following command returns the pathname/filename to the file being written to
oradebug tracefile_name
11g
connect / as sysdba
col PROGRAM for a30
select p.SPID, p.PROGRAM, j.JOB_NAME
from v$PROCESS p, DBA_SCHEDULER_RUNNING_JOBS jr, V$SESSION s, DBA_SCHEDULER_JOBS j
where s.SID=jr.SESSION_ID
and s.PADDR=p.ADDR
and jr.JOB_NAME=j.JOB_NAME
and j.JOB_NAME like '%AQ_JOB$_%';
-- For the process id (SPID) attach to it via oradebug and generate the following trace
oradebug setospid <SPID>
oradebug unlimit
oradebug Event 10046 trace name context forever, level 12
oradebug Event 24040 trace name context forever, level 10
-- Trace the process for 5 minutes
oradebug Event 10046 trace name context off
oradebug Event 24040 trace name context off
-- The following command returns the pathname/filename to the file being written to
oradebug tracefile_name
------------------------
How to Enable/Diasble queue
col desitnation for a25
select QNAME,DESTINATION,SCHEDULE_DISABLED from dba_queue_Schedules where destination='DB_link';
exec dbms_aqadm.DISABLE_PROPAGATION_SCHEDULE(QUEUE_NAME=>'&Enter_SchemaName_QueueName',DESTINATION=>'&Enter_Destination');
exec dbms_aqadm.ENABLE_PROPAGATION_SCHEDULE(QUEUE_NAME=>'&Enter_SchemaName_QueueName',DESTINATION=>'&Enter_Destination');
exec dbms_aqadm.unschedule_propagation(QUEUE_NAME=>'&Enter_SchemaName_QueueName',DESTINATION=>'&Enter_Destination');
exec dbms_aqadm.schedule_propagation(QUEUE_NAME=>'&Enter_SchemaName_QueueName',DESTINATION=>'&Enter_Destination');