SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Real life task such as the more joins in solution the faster it runs

Alex.UAMar 19 2025

Just wanted to share one interesting case.

The task is for each ID and name get matching names from other IDs and show comma separated list of matching values. If there are no matching values then return NULL.

For below sample data

drop table t;
create table t(id,name,value) as
(
select 1, 'A1', cast('V11' as varchar2(100)) from dual
union all select 1, 'A1', 'V12' from dual 
union all select 1, 'A2', 'V21' from dual
union all select 2, 'A1', 'V11' from dual
union all select 2, 'A2', 'V21' from dual
union all select 2, 'A3', 'V' from dual
union all select 3, 'A1', 'V11' from dual
union all select 3, 'A1', 'V12' from dual
union all select 3, 'A2', 'V21' from dual
union all select 3, 'A3', 'V31' from dual
union all select 4, 'A3', 'QQ' from dual
union all select 4, 'A3', 'Q' from dual
union all select 4, 'A2', 'V21' from dual
union all select 5, 'A3', 'Q' from dual
);

The result is following

SQL> select t1.id id1,
 2         t2.id id2,
 3         t1.name,
 4         listagg(decode(t1.value, t2.value, t1.value), ', ') within group (order by t1.value) matched_str
 5    from t t1 join t t2 on t1.id <> t2.id and t1.name = t2.name
 6  group by t1.id,
 7           t2.id,
 8           t1.name
 9  order by 1, 2, 3;
      ID1        ID2 NA MATCHED_STR
---------- ---------- -- ---------------
        1          2 A1 V11
        1          2 A2 V21
        1          3 A1 V11, V12
        1          3 A2 V21
        1          4 A2 V21
        2          1 A1 V11
        2          1 A2 V21
        2          3 A1 V11
        2          3 A2 V21
        2          3 A3
        2          4 A2 V21
        2          4 A3
        2          5 A3
        3          1 A1 V11, V12
        3          1 A2 V21
        3          2 A1 V11
        3          2 A2 V21
        3          2 A3
        3          4 A2 V21
        3          4 A3
        3          5 A3
        4          1 A2 V21
        4          2 A2 V21
        4          2 A3
        4          3 A2 V21
        4          3 A3
        4          5 A3 Q
        5          2 A3
        5          3 A3
        5          4 A3 Q
30 rows selected.

Even though solution returns correct result and has only one join, it is extremely inefficient because it is missing equality predicate by value hence cardinality blows up. Values are eventually compared in the decode function.

The most efficient solution I found so far is

select ttt.*, matched_str
from (select distinct t1.id id1, t2.id id2, t1.name
     from t t1 join t t2 on t1.id <> t2.id and t1.name = t2.name) ttt
    left join (select t1.id id1, t2.id id2, t1.name, listagg(t1.value, ', ') within group (order by t1.value) matched_str
               from t t1 join t t2 on t1.name = t2.name and t1.value = t2.value
               where t1.id <> t2.id
               group by t1.id, t2.id, t1.name) tg
        on ttt.id1 = tg.id1 and ttt.id2 = tg.id2 and ttt.name = tg.name
order by 1, 2, 3

In this case we have two inline views - the one to get all necessary combinations of matching names and the other one where we also take values into account. Once we have these two datasets the final outer join gives what we need. All is good but solution has 3 join which bring the question - can we reduce the number of joins without sacrifying the performance?

Indeed, we can solve this with 2 joins

select t1.id id1,
      t2.id id2,
      t1.name,
      listagg(t3.value, ', ') within group (order by t3.value) matched_str
from t t1
join (select distinct id, name from t) t2 on t1.id <> t2.id and t1.name = t2.name
left join t t3 on t2.id = t3.id and t2.name = t3.name and t1.value = t3.value
group by t1.id, t2.id, t1.name
order by t1.id, t2.id, t1.name;

On a generated data this solution is orders of magnitude faster than the original one and at the same time it is orders of magnitude slower than the solution with 3 joins (plans with actual cardinalities explain why it is so).

I'm positing this because I'm curious if there is anything I might have missed and perhaps there are other solutions with the same or less number of joins and have better performance.

Also I think this is a good example which demonstrates that less joins does not always mean better perfoemance.

Thank you

This post has been answered by Jonathan Lewis on Mar 20 2025

Jump to Answer

Alex.UA

Let me attach the script which was used to measure the performance.
The first solution has additional predicate “1 = 0” because otherwise it runs forever on generated data. :-)

test.txt

Paulzip

May've misunderstood your requirements, but perhaps…

with
  id_name as (
    select distinct id, name from t  
  )
select 
    in1.id as id1,
    in2.id as id2,
    in1.name,
    listagg(distinct t1.value, ', ') within group (order by t1.value) as matching_values
from id_name in1
join id_name in2 on in1.name = in2.name and in1.id <> in2.id
left join t t1 on t1.id = in1.id and t1.name = in1.name
left join t t2 on t2.id = in2.id and t2.name = in2.name and t1.value = t2.value
group by in1.id, in2.id, in1.name
order by in1.id, in2.id, in1.name;

Alex.UA

Paul, you can run your query and see that result differs from what what shown.

If we comment one unnessesary join and make some minor amendments…

with
 id_name as (
   select distinct id, name from t  
 )
select 
   in1.id as id1,
   t1.id as id2,
   in1.name,
   listagg(t2.value, ', ') within group (order by t2.value) as matching_values
from id_name in1
--join id_name in2 on in1.name = in2.name and in1.id <> in2.id
/*left*/ join t t1 on t1.id <> in1.id and t1.name = in1.name
left join t t2 on t2.id = in1.id and t2.name = in1.name and t1.value = t2.value
group by in1.id, t1.id, in1.name
order by in1.id, t1.id, in1.name;

… then your query starts to return corect result and effectively becomes query #3 from the original post.

As I mentioned previously, this query runs orders of magnitude slower on generated data than query #2.

Alex.UA

Just to complete the picture.

model solution

with tt as
(
select t.id id1, decode(t.id, d.id, -1, d.id) id2, name, t.value
from t join (select distinct id, name from t) d using (name) 
)
, m as
(
select *
from tt
model
dimension by (id1, id2, name, value)
measures (0 flag)
rules
(
  flag[any, id2 > -1, any, any] = presentv(flag[cv(id2), -1, cv(name), cv(value)], 1, 0)
)
)
select id1, id2, name, listagg(decode(flag, 1, value), ', ') within group (order by value) matched_str
from m
where id2 > -1
group by id1, id2, name
order by id1, id2, name;

The performance is somewhat similar to the solution with 2 joins but slower.

Jonathan Lewis

Answer

Which version of Oracle are you using?

Your requirement is not particularly rare, but I assume you have a relatively large number of names per id with a significant overlap of names across ids so that the simplest query produces an intermediate volume of results of size “n-squared”. It's a pattern I call the “brontosaurus” query (here's a link to a note I write a few years ago modelling another example with solution).

I don't think many people notice the pattern, though, because in most cases the “many-to-many” join in the middle is just “big” rather than “enormous”. The solution to many such problems is to visit critical tables twice each, selecting and joining subsets before going back to each table a second time for efficient access to the rest of the data. There are a few more articles on my blog about manual optimisation of this type.

Regards

Jonathan Lewis

P.S. I asked about the version of Oracle becasuse my sandbox 19c produced execution plans where the threatening part of the SQL appeared in all three versions of the code and I wondered if my version had transformed out your optimisation, or whether the real data set had a pattern which meant the “threat” portion got very much worse in the slow query because of the exrta data it was carrying that wasn't carried by the equivalent portion of the better plans. (One of the things to be very careful of with rewrites that duplicate tables is that sometimes an upgrade allows the optimizer to spot that it can eliminate a table that you've carefully add to the query.)

Marked as Answer by Alex.UA · Mar 20 2025

Alex.UA

Hey Jonathan! Thanks for chipping in!

Indeed, your case is quite similar from logical perspective - i.e. extra scan + group by (distinct) helps to drastically improve the performance!

That’s great point that optimizer might be smart enough to eliminate joins (although in my experience it was capable of doing that when joining on foreign keys). In whatever case I believe we can avoid this by making inline views non-mergeable.

PS. I was checking performance on 12.2.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6uwd6j96gsn5u, child number 0
-------------------------------------
select t1.id id1,        t2.id id2,        t1.name,
listagg(t3.value, ', ') within group (order by t3.value) matched_str
from t t1 join (select distinct id, name from t) t2 on t1.id <> t2.id
and t1.name = t2.name left join t t3 on t2.id = t3.id and t2.name =
t3.name and t1.value = t3.value group by t1.id, t2.id, t1.name order by
t1.id, t2.id, t1.name
Plan hash value: 1952487454
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |      1 |        |    450 |00:00:05.32 |    2102 |  12399 |  12399 |       |       |          |         |
|   1 |  SORT GROUP BY         |      |      1 |   1352 |    450 |00:00:05.32 |    2102 |  12399 |  12399 |    54M|  3557K|   97M (1)|   99328 |
|*  2 |   HASH JOIN RIGHT OUTER|      |      1 |   3735K|   4200K|00:00:01.61 |    2088 |      0 |      0 |    14M|  3934K|   14M (0)|         |
|   3 |    TABLE ACCESS FULL   | T    |      1 |    210K|    210K|00:00:00.01 |     695 |      0 |      0 |       |       |          |         |
|*  4 |    HASH JOIN           |      |      1 |   3735K|   4200K|00:00:00.46 |    1393 |      0 |      0 |  2171K|  2171K|  998K (0)|         |
|   5 |     VIEW               |      |      1 |     74 |     32 |00:00:00.01 |     695 |      0 |      0 |       |       |          |         |
|   6 |      HASH UNIQUE       |      |      1 |     74 |     32 |00:00:00.01 |     695 |      0 |      0 |  1970K|  1970K| 1328K (0)|         |
|   7 |       TABLE ACCESS FULL| T    |      1 |    210K|    210K|00:00:00.01 |     695 |      0 |      0 |       |       |          |         |
|   8 |     TABLE ACCESS FULL  | T    |      1 |    210K|    210K|00:00:00.01 |     695 |      0 |      0 |       |       |          |         |
------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access("T2"."ID"="T3"."ID" AND "T2"."NAME"="T3"."NAME" AND "T1"."VALUE"="T3"."VALUE")
  4 - access("T1"."NAME"="T2"."NAME")
      filter("T1"."ID"<>"T2"."ID")

32 rows selected.

and

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  2aqpn1prb5r9c, child number 0
-------------------------------------
select ttt.*, matched_str from (select distinct t1.id id1, t2.id id2,
t1.name       from t t1 join t t2 on t1.id <> t2.id and t1.name =
t2.name) ttt      left join (select t1.id id1, t2.id id2, t1.name,
listagg(t1.value, ', ') within group (order by t1.value) matched_str
            from t t1 join t t2 on t1.name = t2.name and t1.value =
t2.value                 where t1.id <> t2.id                 group by
t1.id, t2.id, t1.name) tg          on ttt.id1 = tg.id1 and ttt.id2 =
tg.id2 and ttt.name = tg.name order by 1, 2, 3
Plan hash value: 127670182
---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                 |      1 |        |    450 |00:00:00.09 |    2783 |       |       |          |
|   1 |  SORT ORDER BY           |                 |      1 |   1316 |    450 |00:00:00.09 |    2783 | 33792 | 33792 |30720  (0)|
|*  2 |   HASH JOIN OUTER        |                 |      1 |   1316 |    450 |00:00:00.08 |    2783 |  1797K|  1797K| 1652K (0)|
|   3 |    VIEW                  |                 |      1 |   1316 |    450 |00:00:00.02 |    1393 |       |       |          |
|   4 |     HASH UNIQUE          |                 |      1 |   1316 |    450 |00:00:00.02 |    1393 |  1698K|  1698K| 1337K (0)|
|*  5 |      HASH JOIN           |                 |      1 |   1316 |    450 |00:00:00.02 |    1393 |  2171K|  2171K| 1187K (0)|
|   6 |       VIEW               | VW_DTP_A79DD41C |      1 |     74 |     32 |00:00:00.01 |     695 |       |       |          |
|   7 |        HASH UNIQUE       |                 |      1 |     74 |     32 |00:00:00.01 |     695 |  1970K|  1970K| 1327K (0)|
|   8 |         TABLE ACCESS FULL| T               |      1 |    210K|    210K|00:00:00.01 |     695 |       |       |          |
|   9 |       VIEW               | VW_DTP_96A7B19C |      1 |     74 |     32 |00:00:00.01 |     695 |       |       |          |
|  10 |        HASH UNIQUE       |                 |      1 |     74 |     32 |00:00:00.01 |     695 |  1970K|  1970K| 1329K (0)|
|  11 |         TABLE ACCESS FULL| T               |      1 |    210K|    210K|00:00:00.01 |     695 |       |       |          |
|  12 |    VIEW                  |                 |      1 |   1352 |     22 |00:00:00.06 |    1390 |       |       |          |
|  13 |     SORT GROUP BY        |                 |      1 |   1352 |     22 |00:00:00.06 |    1390 |  4096 |  4096 | 4096  (0)|
|* 14 |      HASH JOIN           |                 |      1 |    201K|     24 |00:00:00.06 |    1390 |    14M|  3934K|   15M (0)|
|  15 |       TABLE ACCESS FULL  | T               |      1 |    210K|    210K|00:00:00.01 |     695 |       |       |          |
|  16 |       TABLE ACCESS FULL  | T               |      1 |    210K|    210K|00:00:00.01 |     695 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access("TTT"."ID1"="TG"."ID1" AND "TTT"."ID2"="TG"."ID2" AND "TTT"."NAME"="TG"."NAME")
  5 - access("ITEM_1"="ITEM_1")
      filter("ITEM_2"<>"ITEM_2")
 14 - access("T1"."NAME"="T2"."NAME" AND "T1"."VALUE"="T2"."VALUE")
      filter("T1"."ID"<>"T2"."ID")

44 rows selected.

Jonathan Lewis

The big feature of your data is visible in operations 6 and 7 of the first plan (and twice in the 2nd plan) where the A-Rows show that your DISTINCT (id, name) drops to 32 rows from an initial 210,000 - so the "N-squared" effect in your case would be a large N if you didn't do this distinct first.

It looks like the optimizer has been very clever in the 2nd plan with the large data size - doing something the demo data didn't do for me in 19.11 - the VW_DTP views are internally generated by “place distinct” transformations, so I think Oracle has transformed: “join then distinct” into “distinct then join” - and which is where the huge performance gain appears. If you could add outline and projection to the plan format options I'd be interested to see what they looked like. (I wasn't even able to force it with a place_distinct() hint.)

Regards

Jonathan Lewis

Alex.UA

Sure, let me attach the script as well as an output.

output.txt

test.txt

Jonathan Lewis

Thanks for that - it's solved my problem.

The view v$sql_hints is supposed to tell you the level at which a hint should be applied, viz: statement, query block, table or join. The entry for place_distinct() says it's a query block hint (which is how I was trying to use it) but your output shows that it is a table level hint, e.g.: PLACE_DISTINCT(@"SEL$64EAE176" "T1"@"SEL$2")

Regards

Jonathan Lewis

1 - 9

Added on Mar 19 2025

#sqlchallenge

9 comments

159 views

SQL & PL/SQL

Real life task such as the more joins in solution the faster it runs

Comments

Post Details