SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

SQL Query (challenge)

483527Nov 27 2007 — edited Nov 27 2007

Hello,

I have 2 tables of events E1 and E2

E1: (Time, Event), E2: (Time, Event)

Where the columns Time in both tables are ordered.

Ex. 
   E1: ((1, a) (2, b) (4, d) (6, c))
   E2: ((2, x) (3, y) (6, z))

To find the events of both tables at the same time it is obvious to do & join between E1 and E2

Q1 -> select e1.Time, e1.Event, e2.Event from E1 e1, E2 e2 where e1.Time=e2.Time;

The result of the query is:

((2, b, x) (6, c, z))

Given that there is no indexes for this tables, an efficient execution plan can be a hash join (under conditions mentioned in Oracle Database Performance Tuning Guide Ch 14).

Now, the hash join suffers from locality problem if the hash table is large and does not fit in memory; it may happen that one block of data is read in memory and swaped out frequently.


Given that the Time columns are sorted is ascending order, I find the following algorithm, known idea in the literature, apropriate to this problem; The algorithm is in pseudocode close to pl/sql, for simplicity (I home the still is clear):

-- start algorithm
open cursors for e1 and e2
loop 
  if e1.Time = e2.Time then
     pipe row (e1.Time, e1.Event, e2.Event);
     fetch next e1 record
     exit when notfound
     fetch next e2 record
      exit when notfound
  else
     if e1.Time < e2.Time then 
        fetch next e1 record
        exit when notfound
     else 
        fetch next e2 record
        exit when notfound
     end if;
  end if;
end loop
-- end algorithm

As you can see the algorithm does not suffer from locality issue since it iterates sequentially over the arrays.

Now the problem: The algorithm shown below hints the use of pipelined function to implement it in pl/sql, but it is slow compared to hash join in the implicit cursor of the query shown above (Q1).

Is there an implicit SQL query to implement this algorithm? The objective is to beat the hash join of the query (Q1), so queries that use sorting are not accepted. 

A difficulty I foound is that the explicit cursor are much slower that implict ones (SQL queries) 

Example: for a large table (2.5 million records) 

create table mytable (x number);

declare
....
begin
open c for 'select 1 from mytable';
fetch c bulk collect into l_data;
close c;
dbms_output.put_line('couont = '||l_data.count);
end;

is 5 times slower then

select count(*) from mytable;


I do not understand why it should be the case, I read that this may be explained because pl/sql is interpreted, but I think this does not explain the whole issue. May be because the fetch copies data from one space to your space and this takes a long time.

Locked Post

New comments cannot be posted to this locked post.

Locked on Dec 25 2007

Added on Nov 27 2007

1 comment

335 views