Just wanted to share one interesting case.
The task is for each ID and name get matching names from other IDs and show comma separated list of matching values. If there are no matching values then return NULL.
For below sample data
drop table t;
create table t(id,name,value) as
(
select 1, 'A1', cast('V11' as varchar2(100)) from dual
union all select 1, 'A1', 'V12' from dual
union all select 1, 'A2', 'V21' from dual
union all select 2, 'A1', 'V11' from dual
union all select 2, 'A2', 'V21' from dual
union all select 2, 'A3', 'V' from dual
union all select 3, 'A1', 'V11' from dual
union all select 3, 'A1', 'V12' from dual
union all select 3, 'A2', 'V21' from dual
union all select 3, 'A3', 'V31' from dual
union all select 4, 'A3', 'QQ' from dual
union all select 4, 'A3', 'Q' from dual
union all select 4, 'A2', 'V21' from dual
union all select 5, 'A3', 'Q' from dual
);
The result is following
SQL> select t1.id id1,
2 t2.id id2,
3 t1.name,
4 listagg(decode(t1.value, t2.value, t1.value), ', ') within group (order by t1.value) matched_str
5 from t t1 join t t2 on t1.id <> t2.id and t1.name = t2.name
6 group by t1.id,
7 t2.id,
8 t1.name
9 order by 1, 2, 3;
ID1 ID2 NA MATCHED_STR
1 2 A1 V11
1 2 A2 V21
1 3 A1 V11, V12
1 3 A2 V21
1 4 A2 V21
2 1 A1 V11
2 1 A2 V21
2 3 A1 V11
2 3 A2 V21
2 3 A3
2 4 A2 V21
2 4 A3
2 5 A3
3 1 A1 V11, V12
3 1 A2 V21
3 2 A1 V11
3 2 A2 V21
3 2 A3
3 4 A2 V21
3 4 A3
3 5 A3
4 1 A2 V21
4 2 A2 V21
4 2 A3
4 3 A2 V21
4 3 A3
4 5 A3 Q
5 2 A3
5 3 A3
5 4 A3 Q
30 rows selected.
Even though solution returns correct result and has only one join, it is extremely inefficient because it is missing equality predicate by value hence cardinality blows up. Values are eventually compared in the decode function.
The most efficient solution I found so far is
select ttt.*, matched_str
from (select distinct t1.id id1, t2.id id2, t1.name
from t t1 join t t2 on t1.id <> t2.id and t1.name = t2.name) ttt
left join (select t1.id id1, t2.id id2, t1.name, listagg(t1.value, ', ') within group (order by t1.value) matched_str
from t t1 join t t2 on t1.name = t2.name and t1.value = t2.value
where t1.id <> t2.id
group by t1.id, t2.id, t1.name) tg
on ttt.id1 = tg.id1 and ttt.id2 = tg.id2 and ttt.name = tg.name
order by 1, 2, 3
In this case we have two inline views - the one to get all necessary combinations of matching names and the other one where we also take values into account. Once we have these two datasets the final outer join gives what we need. All is good but solution has 3 join which bring the question - can we reduce the number of joins without sacrifying the performance?
Indeed, we can solve this with 2 joins
select t1.id id1,
t2.id id2,
t1.name,
listagg(t3.value, ', ') within group (order by t3.value) matched_str
from t t1
join (select distinct id, name from t) t2 on t1.id <> t2.id and t1.name = t2.name
left join t t3 on t2.id = t3.id and t2.name = t3.name and t1.value = t3.value
group by t1.id, t2.id, t1.name
order by t1.id, t2.id, t1.name;
On a generated data this solution is orders of magnitude faster than the original one and at the same time it is orders of magnitude slower than the solution with 3 joins (plans with actual cardinalities explain why it is so).
I'm positing this because I'm curious if there is anything I might have missed and perhaps there are other solutions with the same or less number of joins and have better performance.
Also I think this is a good example which demonstrates that less joins does not always mean better perfoemance.
Thank you