I hope this post is not so long that no one reads it, I tend to over explain myself.
Hi, I'm trying to use LAG function in a sub query to find the date of change for a specific column but the internal business Oracle tool that I use is having a problem processing so I'm trying to find a more efficient method. The tool that i'm restricted to using is oracle based, but I'm sorry I do not know what version and I test all of my SQL in AquaDataStudio before pasting the SQL to my other tool. My query runs in about 2 minutes in ADS but in my experience it can take as long as 30 minutes in my other tool for other queries that take 2 minutes in ADS.
The audit table that I'm querying contains ~30M rows with ~50 columns and my final results contain < 7k rows, so I would like to figure a way to skinny down my subquery.
right now I'm using the LAG function in my subquery and then my main query contains the WHERE clause that limits of changes in the past 5 years:
WHERE prev_group != new_group AND change_date > CURRENT_DATE - 5 YEARS;
For this question I have removed irrelevant tables, columns. Here is my query.
SELECT *
FROM (
SELECT DISTINCT
TDAD.DAD_DRG_NDC_ID ID
,LAG(TDAD.DAD_GPI_CD) OVER (PARTITION BY TDAD.DAD_DRG_NDC_ID
ORDER BY TDAD.DAD_LAD_TS) prev_group
,TDAD.DAD_GPI_CD new_group
,CAST (TDAD.DAD_LAD_TS AS DATE) change_date
FROM DVV.TDAD_DRUG_AUDIT TDAD
WHERE TDAD.DAD_DRG_NDC_ID = '2144401' --THIS LINE IS ONLY HERE FOR THE EXAMPLE
)A
WHERE prev_group != new_group
AND change_date > CURRENT_DATE - 5 YEARS;
I'm using a specific ID as an example and normally this is not limited to a specific ID. This data is accurate.

If I look at the results from just my subquery, this is what it looks like and the highlighted record is the one i'm expecting to see in my final results (with a change_date = 10/2/2018).

But if I try to limit my subquery to the past 5 years, this is what I get and I think contains an additional error because row 2 does not seem accurate, but what I'm concerned with is that it does not produce the accurate 10/2/2018 record (as seen above).
SELECT DISTINCT
TDAD.DAD_DRG_NDC_ID ID
,LAG(TDAD.DAD_GPI_CD) OVER (PARTITION BY TDAD.DAD_DRG_NDC_ID
ORDER BY TDAD.DAD_LAD_TS) prev_group
,TDAD.DAD_GPI_CD new_group
,CAST (TDAD.DAD_LAD_TS AS DATE) change_date
FROM DVV.TDAD_DRUG_AUDIT TDAD
WHERE TDAD.DAD_DRG_NDC_ID = '2144401' --THIS LINE IS ONLY HERE FOR THE EXAMPLE
AND TDAD.DAD_LAD_TS > CURRENT_DATE - 5 YEARS

And then if I rerun the subquery excluding the LAG function and the date exclusion this is what I see:

So it seems the WHERE in my subquery is keying on the CHANGE_DATE from the previous record (12/8/2017).
I hope this makes sense and I hope someone can follow this. Thanks.