Performance implications of inequality joins?
sf_jcatJan 30 2008 — edited Jan 31 2008Hello all,
Someone passed me a query that is "taking too long" and I wanted to run it by the board. Essentially it is:
select a.*, b.cust_level
from base_table a, level_table b
where a.start_dt >= b.stage_start_dt
and a.start_dt <= b.stage_end_dt
Table a has 8MM rows, table b has 300 (300 stages)..
This business has long lead times from when they acquire employees to when those employees start working for them, and the level table tracks a person's "stage" in this process in order to send them the correct emails for the next step. So for example, if they know you will start in 7 weeks, you are essentially flagged as "T-7 weeks" (but they ahve a name for each stage), then T-6 Weeks, etc, and you're also tracked long after starting (T +1 Week, etc). A person's stage can hence change from one day to the next as they move across stages.
Sorry for all the background, but wanted to explain why that query is the way it is. This process seems a little crazy to me, and I'd like to get them to redesign the tables so they are not updating it daily and shifting around, but the immediate question is:
Are inequality joins like this trouble? I know we could rewrite the above with a BETWEEN rather than the < and > but that is readability as much as anything. Are there specific hints we can pass the optimizer for joins like these?
Hopefully this isn't too complicated if the questions here are normally quick hitters, but any information you can provide would be helpful. I have been searching the net but can't find the "Aha!" answer...
Thanks