I have table with outliers,
ID COL1 COL1_Z_SCORE COL2 COL2_Z_SCORE
0 1 0 12 -0.03
1 100 2.22 17 -0.01
2 1 0 31 0.04
3 1 0 530 2.01
4 1 0 22 0.01
5 1 0 32 0.05
6 1 0 -600 -2.44
7 1 0 43 0.09
8 1 0 43 0.09
9 1 0 43 0.09
10 -100 -2.25 43 0.09
Thanks for Frank's answer https://community.oracle.com/message/15449126#15449126, I can use the following method to detect outliers by Z-Score method,
(1) compute the Z-Score by each column
(2) if the absolute value of computed Z-Score > 2.2 ,consider it as outlier and set it NULL
Now, I want to correct outliers, if the value of computed Z-Score > 2.2, set it as two standard deviations, and if the value of computed Z-Score < -2.2, set it as negative two standard deviations
My expected result is as follow, 89.45 and 508.71 is the standard deviation of each columns.
ID COL1 COL1_Z_SCORE COL2 COL2_Z_SCORE
0 1 0 12 -0.03
1 89.45 2.22 17 -0.01
2 1 0 31 0.04
3 1 0 508.71 2.01
4 1 0 22 0.01
5 1 0 32 0.05
6 1 0 -508.71 -2.44
7 1 0 43 0.09
8 1 0 43 0.09
9 1 0 43 0.09
10 -89.45 -2.25 43 0.09
My expected answer can be modified based on Frank's answer as follow, since I need to do this for several columns (not just two). Thanks very much.
WITH UNPIVOTED_DATA AS
(SELECT ID,
COL,
VAL,
ABS((VAL - AVG(VAL) OVER(PARTITION BY COL)) / STDDEV(VAL)
OVER(PARTITION BY COL)) AS ABS_Z
FROM MYDATA UNPIVOT(VAL FOR COL IN(COL1, COL2)))
SELECT *
FROM UNPIVOTED_DATA
PIVOT (MIN(CASE WHEN ABS_Z <= 2.2 THEN VAL END) FOR COL IN('COL1' AS COL1,
'COL2' AS COL2))
ORDER BY ID;
create table MYDATA (ID integer, COL1 integer, COL2 integer);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (0, 1, 12);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (1, 100, 17);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (2, 1, 31);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (3, 1, 530);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (4, 1, 22);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (5, 1, 32);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (6, 1, -600);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (7, 1, 43);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (8, 1, 43);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (9, 1, 43);
INSERT INTO MYDATA (ID, COL1, COL2) VALUES (10, -100, 43);