I'm hoping I can get some performance advice on the best approach to tackle my problem.
I've renamed things to hopefully make it a bit more understandable...
I have a set of records which keep track of a status at a given date for a range of points on a given road. Basically, I need the current bits to "shine through" eg
|-----| |----------| 1-jan-2008
|--| 5-jun-2008
|------| 1-jan-2009
|----| 12-feb-2011
|--|---|---|----|------|-----| (this is the "current" view of the data)
The table holding the 5 records above is essentially:
create table my_status(
my_status_id number,
status varchar2(50),
road_id number,
start_point number,
end_point number,
status_date date
);
so the 5 records would be
insert into my_status values(1, 'OPEN', 1, 20, 50, '01-jan-2008');
insert into my_status values(2, 'CLOSED', 1, 80, 150, '01-jan-2008');
insert into my_status values(3, 'OTHER', 1, 55, 85, '05-jun-2008');
insert into my_status values(4, 'OTHER', 1, 40, 90, '01-jan-2009');
insert into my_status values(5, 'OPEN', 1, 60, 100, '12-feb-2011');
The result required is to display the my_status_id that is current and the range that it applies to ie
current_from, current_to, my_status_id
20,39,1
40,59,4
60,100,5
101,150,2
151,200,0
so my_status_id of 3 doesn't show up as it has entirely been superseded by later records. Also note that 0-19 and 151-200 comes back as my_status_id = 0 which is just to show there is no record for that range.
This is just a very small set of the data as the calculation needs to be done for every road in the system. The roads are defined as follows:
create table road(
road_id number,
road_desc varchar2(100)
)
insert into road values (1,'Road ABC')
and also each road has a list of valid point:
create table road_bit(
road_id number,
start_pt number,
end_pt number
)
insert into road_bit values (1,0,200)
insert into road_bit values (1,250,500)
So basically, I need to go through every road_bit record and get back the my_status_id and range it applies to (and 0 when no current record as per above).
The brute force approach is to look at each pt between the start_pt and end_pt of the road_bit and find the current my_status_id. You can then group these up to get a start and end range for each my_status_id (ie when the my_status_id changes/breaks). This is quite intensive processing and considering the total number of pts that need to be calculated is around 350 million it's not very fast! It's obviously fine on a smaller scale but the query time is linear compared to the number of pts you query. I've tried various approaches from building in PLSQL with a table function and also straight SQL but they take far too long. It's likely the end solution will be run overnight as a materialized view so I'm not too worried if it takes a few hours, I just can't have it taking a few days:-)
The current my_status_id for a given pt can be found as follows:
SELECT nvl(max(s.my_status_id),0) my_status_id
FROM my_status s
WHERE s.road_id = :p_road_id
AND :pt between s.start_point and s.end_point
AND NVL(s.status_date,TO_DATE('01-JAN-0001','DD-MON-YYYY')) =
(SELECT MAX(NVL(z.status_date,TO_DATE('01-JAN-0001','DD-MON-YYYY')))
FROM my_status z
WHERE z.road_id = s.road_id
AND :pt between z.start_point and z.end_point )
I'm hoping there's a slightly smarter way to go about this which avoids essentially looping 350M times and running a costly query. I look forward to no doubt being baffled by some solutions!