Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Simple SQL query with NOT IN condition is too slow

I have a table MY_TABLE with columns STATUS and MESSAGE_ID. STATUS column has an index on it. The actual data distribution now is the following:

select status, count(*) 
from my_table 
group by status 
order by 2 desc;

STATUS           COUNT(*)
-------------------------
STATUS_7         10580209
STATUS_6            23902
STATUS_5             1504
STATUS_4               48
STATUS_3               17
STATUS_2                8
STATUS_1                5

As you can see, 99.99% of table rows have one of 2 most frequent statuses.

I need to select rows with other 5 statuses:

select message_id
  from my_table
 where status in ('STATUS_1', 'STATUS_2', 'STATUS_3', 'STATUS_4', 'STATUS_5');

This query is executed in 0.05 seconds and has a good plan:

PLAN_TABLE_OUTPUT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
------------------------------------------------------------------------------------------------------
Plan hash value: 1564383627
 
------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |    20 |   320 |     8   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |               |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE      |    20 |   320 |     8   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | STATUS_INDEX  |    20 |       |     7   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$1
   2 - SEL$1 / MY_TABLE@SEL$1
   3 - SEL$1 / MY_TABLE@SEL$1
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("STATUS"='STATUS_1' OR "STATUS"='STATUS_2' OR 
              "STATUS"='STATUS_3' OR "STATUS"='STATUS_4' OR 
              "STATUS"='STATUS_5')
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
   1 - "MESSAGE_ID"[NUMBER,22]
   2 - "MESSAGE_ID"[NUMBER,22]
   3 - "MY_TABLE".ROWID[ROWID,10]

The problem is that the list of statuses may change, and the real business requirement is “to select all rows except status 6 and 7”:

select message_id
  from my_table
 where status not in ('STATUS_6', 'STATUS_7');

This query is executed 6 seconds (120 time longer) and oracle makes here full table scan:

PLAN_TABLE_OUTPUT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
------------------------------------------------------------------------------
Plan hash value: 150318439
 
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 23834 |   372K| 27112   (2)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MY_TABLE | 23834 |   372K| 27112   (2)| 00:00:02 |
------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / MY_TABLE@SEL$1
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("STATUS"<>'STATUS_6' AND "STATUS"<>'STATUS_7')
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "MESSAGE_ID"[NUMBER,22]

I have tied to write a query which produces list of desired statuses:

select distinct status 
from my_table 
where status not in ('STATUS_6', 'STATUS_7');

This query also works fast - again ~0.05 seconds.

But when I use this as a subquery for the first one -

select t.message_id
  from my_table t
 where t.status in (select distinct x.status
                      from my_table x
                     where x.status not in ('STATUS_6', 'STATUS_7'))

It still works slow (3 seconds). The plan:

PLAN_TABLE_OUTPUT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
-----------------------------------------------------------------------------------
Plan hash value: 2846122266
 
-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              | 23834 |   605K| 75182   (2)| 00:00:03 |
|   1 |  NESTED LOOPS SEMI |              | 23834 |   605K| 75182   (2)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| MY_TABLE     | 23834 |   372K| 27112   (2)| 00:00:02 |
|*  3 |   INDEX RANGE SCAN | STATUS_INDEX | 23834 |   232K|     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$5DA710D3
   2 - SEL$5DA710D3 / T@SEL$1
   3 - SEL$5DA710D3 / X@SEL$2
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("T"."STATUS"<>'STATUS_6' AND "T"."STATUS"<>'STATUS_7')
   3 - access("T"."STATUS"="X"."STATUS")
       filter("X"."STATUS"<>'STATUS_6' AND "X"."STATUS"<>'STATUS_7')
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
   1 - (#keys=0) "T"."MESSAGE_ID"[NUMBER,22]
   2 - "T"."MESSAGE_ID"[NUMBER,22], "T"."STATUS"[VARCHAR2,50]

So, what I don't understand here, is why Oracle can't find a faster solution? When I do it in two steps (get list of statuses, get rows with statuses), it takes 0.05 second for each step, but when I put these two steps in one query, Oracle can't find this way.

I have tried to gather statistics, write the query in different ways, use hints (materialized, index, and some others), but nothing makes it better, only worse.

To reproduce the situation, you can use the following:

create table test_table(id number primary key, value varchar2(10));

insert into test_table(id, value)
select rownum, 'VAL' || floor(log(10, dbms_random.value(1, 1000000)))
  from dual
connect by level < 1000000;

create index test_table_idx on test_table(value);

begin
  dbms_stats.gather_table_stats (
     ownname    => 'YOUR_USERNAME',
     tabname    => 'TEST_TABLE',
     method_opt => 'FOR ALL');
end;
/
Comments
Post Details
Added on Mar 15 2023
14 comments
226 views