I am attempting to calculate a rolling average over a specific time period, but I can't quite figure out how to do that. For this particular problem, I'm also stuck dealing with an 8i database.
Here's some sample data for the problem:
CREATE TABLE my_data
( order_no CHAR(10)
, seq_nbr CHAR(4)
, area_id CHAR(4)
, start_date DATE
, unit_time NUMBER(7,2)
);
INSERT INTO my_data VALUES ('0000567890','0010','A001',TO_DATE('05/01/2000','mm/dd/yyyy'),0.34);
INSERT INTO my_data VALUES ('0000567890','0020','A001',TO_DATE('05/02/2000','mm/dd/yyyy'),0.78);
INSERT INTO my_data VALUES ('0000567890','0030','A002',TO_DATE('05/03/2000','mm/dd/yyyy'),0.91);
INSERT INTO my_data VALUES ('0000567890','0040','A003',TO_DATE('05/03/2000','mm/dd/yyyy'),0.27);
INSERT INTO my_data VALUES ('0000123456','0010','A001',TO_DATE('04/01/2001','mm/dd/yyyy'),0.39);
INSERT INTO my_data VALUES ('0000123456','0020','A001',TO_DATE('04/02/2001','mm/dd/yyyy'),0.98);
INSERT INTO my_data VALUES ('0000123456','0030','A002',TO_DATE('04/03/2001','mm/dd/yyyy'),0.77);
INSERT INTO my_data VALUES ('0000123456','0040','A003',TO_DATE('04/03/2001','mm/dd/yyyy'),0.28);
INSERT INTO my_data VALUES ('0000123123','0010','A001',TO_DATE('12/01/2001','mm/dd/yyyy'),0.31);
INSERT INTO my_data VALUES ('0000123123','0020','A001',TO_DATE('12/02/2001','mm/dd/yyyy'),0.86);
INSERT INTO my_data VALUES ('0000123123','0030','A002',TO_DATE('12/03/2001','mm/dd/yyyy'),0.82);
INSERT INTO my_data VALUES ('0000123123','0040','A003',TO_DATE('12/03/2001','mm/dd/yyyy'),0.23);
INSERT INTO my_data VALUES ('0000111111','0010','A001',TO_DATE('06/01/2002','mm/dd/yyyy'),0.29);
INSERT INTO my_data VALUES ('0000111111','0020','A001',TO_DATE('06/02/2002','mm/dd/yyyy'),0.84);
INSERT INTO my_data VALUES ('0000111111','0030','A002',TO_DATE('06/03/2002','mm/dd/yyyy'),0.78);
INSERT INTO my_data VALUES ('0000111111','0040','A003',TO_DATE('06/03/2002','mm/dd/yyyy'),0.26);
INSERT INTO my_data VALUES ('0000654321','0010','A001',TO_DATE('05/01/2003','mm/dd/yyyy'),0.28);
INSERT INTO my_data VALUES ('0000654321','0020','A001',TO_DATE('05/02/2003','mm/dd/yyyy'),0.88);
INSERT INTO my_data VALUES ('0000654321','0030','A002',TO_DATE('05/03/2003','mm/dd/yyyy'),0.75);
INSERT INTO my_data VALUES ('0000654321','0040','A003',TO_DATE('05/03/2003','mm/dd/yyyy'),0.25);
This is a sample of the real data, which has around 50K rows, and are the results of a query, not a table. Additionally, the real data also has a part number associated with each order, and the final result will be partitioned by part number and area id. This sample basically represents something you might see for a single part number. Also, the spacing of these records in time is not predictable. You could have a single year that has 20 orders, and another year (or period of years) that has none.
Here's where I'm at right now in the problem. For the example, let's assume I want to calculate a 2-year rolling average, (an average for the 2 years previous to the date of the current row). For my real data, I will probably want to be able to vary the time period (different time periods in different queries) over which the rolling average is calculated.
I thought that using the AVERAGE analytical function with a window clause might be the way to go, but I'm not exactly sure how to use the window clause. I may even be completely mis-interpreting what its use is, but hopefully it should still show what I'm after.
SELECT area_id
, AVG(tot_area_unit_hrs)
OVER (
PARTITION BY area_id
ORDER BY min_start
ROWS BETWEEN --2 years prior to min_start of current row
AND CURRENT ROW
)
, --something to indicate the time period the average is over AS time_period
FROM (
SELECT order_no
, area_id
, MIN(start_date) AS min_start
, SUM(unit_time) AS tot_area_unit_hrs
FROM my_data
GROUP BY order_no
, area_id
)
ORDER BY area_id
, time_period
As you can see from the query above, I want to calculate the sum of the unit_time for each order/area_id combo, then get the rolling average of that total time. (I am calculating the average total time a product spends in each area_id per order).
I want to see results like something along these lines... I don't really care how the time period is identified, as long as I can sort by it in chronological order, and it tells me what period of time it covers. The way I show it with period_start and period_end in the sample results below is just one way I thought of to do that.
area_id period_start period_end avg_unit_time
----------------------------------------------------
A001 4/30/1998 5/1/2000 1.120
A001 3/31/1999 4/1/2001 1.245
A001 11/30/1999 12/1/2001 1.220
A001 5/31/2000 6/1/2002 1.223
A001 4/30/2001 5/1/2003 1.153
A002 5/2/1998 5/3/2000 0.910
A002 4/2/1999 4/3/2001 0.840
A002 12/2/1999 12/3/2001 0.833
A002 6/2/2000 6/3/2002 0.790
A002 5/2/2001 5/3/2003 0.783
A003 5/2/1998 5/3/2000 0.270
A003 4/2/1999 4/3/2001 0.275
A003 12/2/1999 12/3/2001 0.260
A003 6/2/2000 6/3/2002 0.257
A003 5/2/2001 5/3/2003 0.247
Any suggestions?