Hello,
I have 2 tables of events E1 and E2
E1: (Time, Event), E2: (Time, Event)
Where the columns Time in both tables are ordered.
Ex.
E1: ((1, a) (2, b) (4, d) (6, c))
E2: ((2, x) (3, y) (6, z))
To find the events of both tables at the same time it is obvious to do & join between E1 and E2
Q1 -> select e1.Time, e1.Event, e2.Event from E1 e1, E2 e2 where e1.Time=e2.Time;
The result of the query is:
((2, b, x) (6, c, z))
Given that there is no indexes for this tables, an efficient execution plan can be a hash join (under conditions mentioned in Oracle Database Performance Tuning Guide Ch 14).
Now, the hash join suffers from locality problem if the hash table is large and does not fit in memory; it may happen that one block of data is read in memory and swaped out frequently.
Given that the Time columns are sorted is ascending order, I find the following algorithm, known idea in the literature, apropriate to this problem; The algorithm is in pseudocode close to pl/sql, for simplicity (I home the still is clear):
-- start algorithm
open cursors for e1 and e2
loop
if e1.Time = e2.Time then
pipe row (e1.Time, e1.Event, e2.Event);
fetch next e1 record
exit when notfound
fetch next e2 record
exit when notfound
else
if e1.Time < e2.Time then
fetch next e1 record
exit when notfound
else
fetch next e2 record
exit when notfound
end if;
end if;
end loop
-- end algorithm
As you can see the algorithm does not suffer from locality issue since it iterates sequentially over the arrays.
Now the problem: The algorithm shown below hints the use of pipelined function to implement it in pl/sql, but it is slow compared to hash join in the implicit cursor of the query shown above (Q1).
Is there an implicit SQL query to implement this algorithm? The objective is to beat the hash join of the query (Q1), so queries that use sorting are not accepted.
A difficulty I foound is that the explicit cursor are much slower that implict ones (SQL queries)
Example: for a large table (2.5 million records)
create table mytable (x number);
declare
....
begin
open c for 'select 1 from mytable';
fetch c bulk collect into l_data;
close c;
dbms_output.put_line('couont = '||l_data.count);
end;
is 5 times slower then
select count(*) from mytable;
I do not understand why it should be the case, I read that this may be explained because pl/sql is interpreted, but I think this does not explain the whole issue. May be because the fetch copies data from one space to your space and this takes a long time.