SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Odd Explain Plan

John SpencerNov 17 2015 — edited Nov 18 2015

Oracle 11.2.0.4 running on an Exadata box.

This is not really a tuning question, I have already re-written the posted query to return in a reasonable amount of time. I realize that the query as written is not particularly efficient, but I would expect it to return results, which it doesn't, at least not in finite time.

There are three tables involved:

Table1 holds information about things. The data comes from multiple sources and each source has its own ID for that thing (sourceid), in many cases these things represent the same entity. There is a, theoretically, unique common identifier that can identify the entity across all of the sources (commonid), but this information is not always available. When it is, we use that to link together the "same" entity across the various sources and assign the same id. When it is not present, we use a combination of other attributes to link the entities together.

There was a bug in the linking code which resulted in many entities being incorrectly linked together. Prior to unlinking the inappropriately linked entities we created the two other tables which have the ID, source and sourceid of the affected entities at the time of unlinking.

The query is intended to find all of the source/sourceid combinations where there are multiple commonid values in the same linked set, that we have not already identified (hence the not in table1/table2 construct).

The query created by the developer was:

select * from table1

where id in (select r1.id

from table1 r1, table1 r2

where r1.id = r2.id and

r1.commonid != r2.commonid and

r1.commonid is not null and

r2.commonid is not null and

r1.id not in (select r.id

from table1 r, table2 u

where r.source = u.source and

r.sourceid = u.sourceid) and

r1.id not in (select r.id

from table1 r, table3 u

where r.source = u.source and

r.sourceid = u.sourceid))

which resulted in the following plan:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |             |    10M|  2058M|   949M  (1)|999:59:59 |
|*  1 |  FILTER                                    |             |       |       |            |          |
|   2 |   TABLE ACCESS STORAGE FULL                | TABLE1      |    10M|  2058M| 71307   (1)| 00:14:16 |
|   3 |   NESTED LOOPS ANTI                        |             |     1 |    46 |    15   (0)| 00:00:01 |
|   4 |    NESTED LOOPS                            |             |     1 |    44 |     6   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS BY INDEX ROWID            | TABLE1      |     1 |    22 |     3   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                      | ID_INDEX    |     1 |       |     2   (0)| 00:00:01 |
|*  7 |       HASH JOIN                            |             |     1 |    37 |    77   (2)| 00:00:01 |
|   8 |        TABLE ACCESS BY INDEX ROWID         | TABLE1      |     2 |    44 |     5   (0)| 00:00:01 |
|*  9 |         INDEX RANGE SCAN                   | ID_INDEX    |     2 |       |     3   (0)| 00:00:01 |
|  10 |        TABLE ACCESS STORAGE FULL FIRST ROWS| TABLE2      | 57379 |   840K|    71   (0)| 00:00:01 |
|* 11 |     TABLE ACCESS BY INDEX ROWID            | TABLE1      |     1 |    22 |     3   (0)| 00:00:01 |
|* 12 |      INDEX RANGE SCAN                      | ID_INDEX    |     1 |       |     2   (0)| 00:00:01 |
|  13 |    VIEW PUSHED PREDICATE                   | VW_NSO_1    |     1 |     2 |     9   (0)| 00:00:01 |
|* 14 |     HASH JOIN                              |             |     1 |    36 |     9   (0)| 00:00:01 |
|  15 |      TABLE ACCESS BY INDEX ROWID           | TABLE1      |     2 |    44 |     5   (0)| 00:00:01 |
|* 16 |       INDEX RANGE SCAN                     | ID_INDEX    |     2 |       |     3   (0)| 00:00:01 |
|  17 |      TABLE ACCESS STORAGE FULL FIRST ROWS  | TABLE3      |   818 | 11452 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ <not feasible>)
   5 - filter("R1"."COMMONID" IS NOT NULL)
   6 - access("R1"."ID"=:B1)
       filter( NOT EXISTS (SELECT 0 FROM "TABLE2" "U","TABLE1" "R" WHERE
              "R"."ID"=:B1 AND "R"."SOURCEID"="U"."SOTRCEID" AND "R"."SOURCE"="U"."SOURCE"))
   7 - access("R"."SOURCE"="U"."SOURCE" AND "R"."SOURCEID"="U"."SOURCEID")
   9 - access("R"."ID"=:B1)
  11 - filter("R2"."COMMONID" IS NOT NULL AND "R1"."COMMONID"<>"R2"."COMMONID")
  12 - access("R2"."ID"=:B1)
       filter("R1"."ID"="R2"."ID")
  14 - access("R"."SOURCE"="U"."SOURCE" AND "R"."SOURCEID"="U"."SOURCEID")
  16 - access("R"."ID"="R1"."ID")

The row count estimates for all of the table scans are bang on, I am a little dubious about the expected rows from the index accesses on table1, but they are not hugely off, I would expect something more like in the tens of rows on average.

What does puzzle me is the massive increase in time and cost between line 2 and line 0. I am also puzzled by the presence of a filter operation at line 1, and particularly by the predicate shown against it "filter( EXISTS (SELECT /*+ <not feasible>)". Have we made the optimizer give up?

Any thoughts would be appreciated, and if more information would help, I can try to get it.

Thanks

John

This post has been answered by Jonathan Lewis on Nov 17 2015

Jump to Answer

Locked Post

New comments cannot be posted to this locked post.

Locked on Dec 16 2015

Added on Nov 17 2015

4 comments

811 views