Skip to Main Content

Data Science & Machine Learning

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Oracle Data Mining - How to use PREDICTION function with a regression model

395527Mar 25 2011 — edited Sep 16 2011
I've been searching this site for Data Mining Q&A specifically related to prediction function and I wasn't able to find something useful on this topic. So I hope that posting it as a new thread will get useful answers for a beginner in oracle data mining.

So here is my issue with prediction function:

Given a table with 17 weeks of sales for a given product, I would like to do a forecast to predict the sales for the week 18th.

For that let's start preparing the necessary objects and data:

CREATE TABLE T_SALES
(
PURCHASE_WEEK DATE,
WEEK NUMBER,
SALES NUMBER
)

SET DEFINE OFF;
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('11/27/2010 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 1, 55488);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('12/04/2010 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 2, 78336);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('12/11/2010 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 3, 77248);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('12/18/2010 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 4, 106624);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('12/25/2010 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 5, 104448);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('01/01/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 6, 90304);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('01/08/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 7, 44608);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('01/15/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 8, 95744);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('01/22/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 9, 129472);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('01/29/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 10, 110976);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('02/05/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 11, 139264);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('02/12/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 12, 87040);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('02/19/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 13, 47872);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('02/26/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 14, 120768);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('03/05/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 15, 98463.65);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('03/12/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 16, 67455.84);
Insert into T_SALES
(PURCHASE_WEEK, WEEK, SALES)
Values
(TO_DATE('3/19/2011 23:59:59', 'MM/DD/YYYY HH24:MI:SS'), 17, 100095.66);

COMMIT;


There are a lot of linear regression models and approaches for sales forecast out on the market, however I will focus on what oracle 11g offers i.e. package SYS.DBMS_DATA_MINING to create a model using regression as mining function and then, once the model is created, to apply prediction function on the model.

Therefore I'll have to go through few steps:

i) normalization of data

CREATE OR REPLACE VIEW t_sales_norm AS
SELECT week,
sales,
(sales - 91423.95)/27238.3693126778 sales_norm
FROM t_sales;

whereas the numerical values are the mean and the standard deviation:

select avg(sales) from t_sales;
91423.95

select stddev(sales) from t_sales;
27238.3693126778

ii) auto-correlation. For the sake of simplicity, I will safely assume that there is no auto-correlation (no repetitive pattern in sales among the weeks). Therefore to define the lag data I will consider the whole set:

CREATE OR REPLACE VIEW t_sales_lag AS
SELECT a.*
FROM (SELECT week,
sales,
LAG(sales_norm, 1) OVER (ORDER BY week) L1,
LAG(sales_norm, 2) OVER (ORDER BY week) L2,
LAG(sales_norm, 3) OVER (ORDER BY week) L3,
LAG(sales_norm, 4) OVER (ORDER BY week) L4,
LAG(sales_norm, 5) OVER (ORDER BY week) L5,
LAG(sales_norm, 6) OVER (ORDER BY week) L6,
LAG(sales_norm, 7) OVER (ORDER BY week) L7,
LAG(sales_norm, 8) OVER (ORDER BY week) L8,
LAG(sales_norm, 9) OVER (ORDER BY week) L9,
LAG(sales_norm, 10) OVER (ORDER BY week) L10,
LAG(sales_norm, 11) OVER (ORDER BY week) L11,
LAG(sales_norm, 12) OVER (ORDER BY week) L12,
LAG(sales_norm, 13) OVER (ORDER BY week) L13,
LAG(sales_norm, 14) OVER (ORDER BY week) L14,
LAG(sales_norm, 15) OVER (ORDER BY week) L15,
LAG(sales_norm, 16) OVER (ORDER BY week) L16,
LAG(sales_norm, 17) OVER (ORDER BY week) L17
FROM t_sales_norm) a;

iii) choosing the training data. Again, I will choose the whole set of 17 weeks, as for this discussion in not relevant how big should be the set of training data.

CREATE OR REPLACE VIEW t_sales_train AS
SELECT week, sales,
L1, L2, L3, L4, L5, L6, L7, L8, L9, L10,
L11, L12, L13, L14, L15, L16, L17
FROM t_sales_lag a
WHERE week >= 1 AND week <= 17;

iv) build the model

-- exec SYS.DBMS_DATA_MINING.DROP_MODEL('t_SVM');

BEGIN
sys.DBMS_DATA_MINING.CREATE_MODEL( model_name => 't_SVM',
mining_function => dbms_data_mining.regression,
data_table_name => 't_sales_train',
case_id_column_name => 'week',
target_column_name => 'sales');
END;
/


v) finally, where I am confused is applying the prediction function against this model and making sense of the results.
On a search on Google I found 2 ways of applying this function to my case.

One way is the following:

SELECT week, sales,
PREDICTION(t_SVM USING
LAG(sales,1) OVER (ORDER BY week) as l1,
LAG(sales,2) OVER (ORDER BY week) as l2,
LAG(sales,3) OVER (ORDER BY week) as l3,
LAG(sales,4) OVER (ORDER BY week) as l4,
LAG(sales,5) OVER (ORDER BY week) as l5,
LAG(sales,6) OVER (ORDER BY week) as l6,
LAG(sales,7) OVER (ORDER BY week) as l7,
LAG(sales,8) OVER (ORDER BY week) as l8,
LAG(sales,9) OVER (ORDER BY week) as l9,
LAG(sales,10) OVER (ORDER BY week) as l10,
LAG(sales,11) OVER (ORDER BY week) as l11,
LAG(sales,12) OVER (ORDER BY week) as l12,
LAG(sales,13) OVER (ORDER BY week) as l13,
LAG(sales,14) OVER (ORDER BY week) as l14,
LAG(sales,15) OVER (ORDER BY week) as l15,
LAG(sales,16) OVER (ORDER BY week) as l16,
LAG(sales,17) OVER (ORDER BY week) as l17
) pred
FROM t_sales a;


WEEK, SALES, PREDICTION
1, 55488, 68861.084076412
2, 78336, 104816.995823913
3, 77248, 104816.995823913
4, 106624, 104816.995823913
...............

As you can see for the first row there is a value of 68861.084 and for the rest of 16 values is always one and the same 104816.995.

Question: where is my week 18 prediction ? or maybe I should say which one is it ?

Another way of using prediction even more confusing is against the lag table:


SELECT week, sales,
PREDICTION(t_svm USING a.*) pred
FROM t_sales_lag a;


WEEK, SALES, PREDICTION
1, 55488, 68861.084076412
2, 78336, 75512.3642096908
3, 77248, 85711.5003385927
4, 106624, 98160.5009687461
.............

Each row out of 17, its own 'prediction' result.
Same question: which one is my week 18th prediction ?


Thank you very much for all help that you can provide on this matter.
It is as always highly appreciated.
Serge F.
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Oct 14 2011
Added on Mar 25 2011
3 comments
704 views