Hello,
I am using Oracle 11.2.0.2
I am running the below query, which takes a long time (almost 2 hours).
I am pretty sure that the reason for the long duration is the rather costly XML-operation (XPath query) that is being (unnecessarily!) executed on a lot of records.
However, the problem is, that the inner SELECT subquery (the one named X) does not return any records, and thus the costly XML operations should not even be applied (which is the reason for me writing "unnecessarily" just now).
INSERT /*+ APPEND */ INTO T_C_RMP_MONITORING_XML_OR_IL
(XML_DOC_ID, MESSAGE_ID, XML_CONTENT, DWH_LM_TS_UTC)
SELECT
X.XML_DOC_ID,
X.MESSAGE_ID,
CASE WHEN (X.XML_CONTENT.EXISTSNODE('/monitoring/action/param/*')=1) THEN
X.XML_CONTENT.TRANSFORM((SELECT XS.XSLT FROM T_INT_XSLT XS WHERE ID=2))
ELSE
X.XML_CONTENT
END XML_CONTENT,
TO_DATE('2014-01-17 19:57:22', 'YYYY-MM-DD HH24:MI:SS')
FROM (
SELECT
S.ID XML_DOC_ID,
M.ID MESSAGE_ID,
XMLPARSE(DOCUMENT S.XML_MESSAGE_TEXT WELLFORMED) XML_CONTENT
FROM T_S_RMP_CSC_MESSAGE_RECEIVED M,
T_S_RMP_EVALUATION_CSC_MESSAGE S
WHERE (
-- REGULAR BUSINESS LOGIC CONDITION
S.ID=M.EVALUATION_CSCMESSAGE_ID
AND M.MESSAGE_TYPE = 'MONITORING_SESSIONS'
) AND (
( -- ... EITHER CHECK LMTS...
-- CONDITION TO REDUCE DATASET TO RECORDS WITH POTENTIAL CHANGES SINCE LAST RUN
M.DWH_LM_TS_UTC>TO_DATE('2014-01-17 08:16:00', 'YYYY-MM-DD HH24:MI:SS') OR S.DWH_LM_TS_UTC>TO_DATE('2014-01-17 08:16:00', 'YYYY-MM-DD HH24:MI:SS')
) OR (
-- ... OR DON'T CHECK LMTS IF WE HAVE A FRESH INSERT
'N'='Y'
)
)
) X
WHERE X.XML_CONTENT.existsNode('/monitoring/actions')=0
LOG ERRORS INTO T_PE_PK_XML_DOC_ID_ERROR ('_I') REJECT LIMIT UNLIMITED;
The query plan is
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2249934899
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 4 | 556 | 907 (1)| 00:00:11 |
| 1 | LOAD TABLE CONVENTIONAL | T_C_RMP_MONITORING_XML_OR_IL | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID | T_INT_XSLT | 1 | 3017 | 2 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_ZZZ_MY_XSLTS | 1 | | 1 (0)| 00:00:01 |
| 4 | CONCATENATION | | | | | |
| 5 | NESTED LOOPS | | 3 | 417 | 345 (2)| 00:00:05 |
|* 6 | TABLE ACCESS BY INDEX ROWID | T_S_RMP_EVALUATION_CSC_MESSAGE | 15 | 1515 | 300 (2)| 00:00:04 |
|* 7 | INDEX RANGE SCAN | I_S_EVALUATION_CSC_MSG_LMTS | 1510 | | 7 (0)| 00:00:01 |
| 8 | BITMAP CONVERSION TO ROWIDS | | 1 | 38 | 345 (2)| 00:00:05 |
|* 9 | BITMAP INDEX RANGE SCAN | I_S_RMP_CSC_MSG_RCVD_COVER_BM1 | | | | |
| 10 | NESTED LOOPS | | | | | |
| 11 | NESTED LOOPS | | 1 | 139 | 563 (1)| 00:00:07 |
|* 12 | TABLE ACCESS BY INDEX ROWID| T_S_RMP_CSC_MESSAGE_RECEIVED | 147 | 5586 | 268 (0)| 00:00:04 |
|* 13 | INDEX RANGE SCAN | I_S_CSC_MESSAGE_RECEIVED_LMTS | 1021 | | 5 (0)| 00:00:01 |
|* 14 | INDEX UNIQUE SCAN | PK_S_RMP_EVALUATION_CSC_MESSAG | 1 | | 1 (0)| 00:00:01 |
|* 15 | TABLE ACCESS BY INDEX ROWID | T_S_RMP_EVALUATION_CSC_MESSAGE | 1 | 101 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("ID"=2)
6 - filter(EXISTSNODE(XMLPARSE( DOCUMENT "S"."XML_MESSAGE_TEXT" WELLFORMED ),'/monitoring/actions')=0)
7 - access("S"."DWH_LM_TS_UTC">TO_DATE(' 2014-01-17 08:16:00', 'syyyy-mm-dd hh24:mi:ss'))
9 - access("M"."MESSAGE_TYPE"='MONITORING_SESSIONS' AND "S"."ID"="M"."EVALUATION_CSCMESSAGE_ID")
filter("M"."EVALUATION_CSCMESSAGE_ID" IS NOT NULL AND "M"."MESSAGE_TYPE"='MONITORING_SESSIONS' AND
"S"."ID"="M"."EVALUATION_CSCMESSAGE_ID")
12 - filter("M"."EVALUATION_CSCMESSAGE_ID" IS NOT NULL AND "M"."MESSAGE_TYPE"='MONITORING_SESSIONS')
13 - access("M"."DWH_LM_TS_UTC">TO_DATE(' 2014-01-17 08:16:00', 'syyyy-mm-dd hh24:mi:ss'))
14 - access("S"."ID"="M"."EVALUATION_CSCMESSAGE_ID")
15 - filter(EXISTSNODE(XMLPARSE( DOCUMENT "S"."XML_MESSAGE_TEXT" WELLFORMED ),'/monitoring/actions')=0
AND LNNVL("S"."DWH_LM_TS_UTC">TO_DATE(' 2014-01-17 08:16:00', 'syyyy-mm-dd hh24:mi:ss')))
Note
-----
- Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
If I run just the inner query by itself, the empty resultset is returned within 0.25 seconds:
SELECT count(*)
FROM T_S_RMP_CSC_MESSAGE_RECEIVED M,
T_S_RMP_EVALUATION_CSC_MESSAGE S
WHERE (
-- REGULAR BUSINESS LOGIC CONDITION
S.ID=M.EVALUATION_CSCMESSAGE_ID
AND M.MESSAGE_TYPE = 'MONITORING_SESSIONS'
) AND (
( -- ... EITHER CHECK LMTS...
-- CONDITION TO REDUCE DATASET TO RECORDS WITH POTENTIAL CHANGES SINCE LAST RUN
M.DWH_LM_TS_UTC>TO_DATE('2014-01-17 08:16:00', 'YYYY-MM-DD HH24:MI:SS') OR S.DWH_LM_TS_UTC>TO_DATE('2014-01-17 08:16:00', 'YYYY-MM-DD HH24:MI:SS')
) OR (
-- ... OR DON'T CHECK LMTS IF WE HAVE A FRESH INSERT
'N'='Y'
)
);
Explain plan is
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1290828596
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 46 | 5390 (1)| 00:01:05 |
| 1 | SORT AGGREGATE | | 1 | 46 | | |
| 2 | CONCATENATION | | | | | |
| 3 | NESTED LOOPS | | 332 | 15272 | 4828 (1)| 00:00:58 |
| 4 | TABLE ACCESS BY INDEX ROWID | T_S_RMP_EVALUATION_CSC_MESSAGE | 1510 | 21140 | 296 (0)| 00:00:04 |
|* 5 | INDEX RANGE SCAN | I_S_EVALUATION_CSC_MSG_LMTS | 1510 | | 7 (0)| 00:00:01 |
| 6 | BITMAP CONVERSION TO ROWIDS | | 1 | 32 | 4828 (1)| 00:00:58 |
|* 7 | BITMAP INDEX RANGE SCAN | I_S_RMP_CSC_MSG_RCVD_COVER_BM1 | | | | |
| 8 | NESTED LOOPS | | | | | |
| 9 | NESTED LOOPS | | 146 | 6716 | 562 (0)| 00:00:07 |
|* 10 | TABLE ACCESS BY INDEX ROWID| T_S_RMP_CSC_MESSAGE_RECEIVED | 147 | 4704 | 268 (0)| 00:00:04 |
|* 11 | INDEX RANGE SCAN | I_S_CSC_MESSAGE_RECEIVED_LMTS | 1021 | | 5 (0)| 00:00:01 |
|* 12 | INDEX UNIQUE SCAN | PK_S_RMP_EVALUATION_CSC_MESSAG | 1 | | 1 (0)| 00:00:01 |
|* 13 | TABLE ACCESS BY INDEX ROWID | T_S_RMP_EVALUATION_CSC_MESSAGE | 1 | 14 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("S"."DWH_LM_TS_UTC">TO_DATE(' 2014-01-17 08:16:00', 'syyyy-mm-dd hh24:mi:ss'))
7 - access("M"."MESSAGE_TYPE"='MONITORING_SESSIONS' AND "S"."ID"="M"."EVALUATION_CSCMESSAGE_ID")
filter("M"."EVALUATION_CSCMESSAGE_ID" IS NOT NULL AND "M"."MESSAGE_TYPE"='MONITORING_SESSIONS' AND
"S"."ID"="M"."EVALUATION_CSCMESSAGE_ID")
10 - filter("M"."EVALUATION_CSCMESSAGE_ID" IS NOT NULL AND "M"."MESSAGE_TYPE"='MONITORING_SESSIONS')
11 - access("M"."DWH_LM_TS_UTC">TO_DATE(' 2014-01-17 08:16:00', 'syyyy-mm-dd hh24:mi:ss'))
12 - access("S"."ID"="M"."EVALUATION_CSCMESSAGE_ID")
13 - filter(LNNVL("S"."DWH_LM_TS_UTC">TO_DATE(' 2014-01-17 08:16:00', 'syyyy-mm-dd hh24:mi:ss')))
Note that estimated cost and time of this sub-select are way higher than for the full query above, although the query runs much faster.
I am thinking that the cost-based optimizer underestimates the XML-operation (specifically the XPath search operation existsNode) and thus is trying to run that operation first, thinking that this way, the total cost will be much less.
If, on the first query, the engine would simply evaluate the subquery part first, it would come back with the same empty resultset right away and would be done, rather than taking 2 hours.
Is there an optimizer hint that would help me? Any other ideas?
I know I could add a flag field in which I store the result of the XML_CONTENT.existsNode('/monitoring/actions') call beforehand and which I would then access in the query instead of calling the function right in that query. Just hoping for something simpler and more elegant...
Thank you in advance!