Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Sum Over Partition Query takes ages to run

976439Dec 19 2016 — edited Dec 19 2016

Hi, I was looking for some help in regards to an SQL query that I am trying to run in Hive and was looking for some advice to see if the query could be shortened or written differently to make it run quicker. When I have ran this (see below) against a dummy data set (1500 rows) in SQL Developer this runs farily quickly and gives me what I need:

SELECT

    P_AGENT,

    TRUNC(P_DATE) AS P_DATE,

    SUM(TOUCH_COUNT) AS TOUCH_COUNT,

    MIN(P_DATE) AS START_TIME,

    MAX(P_DATE) AS END_TIME,

    MAX(P_DATE) - MIN(P_DATE) AS TIME_TAKEN

FROM

    (

    SELECT

        T1.*,

        SUM(COUNT1) OVER (PARTITION BY P_AGENT, TRUNC(P_DATE) ORDER BY P_DATE) AS GRP,

        CASE 

            WHEN P_DATE - LAG(P_DATE) OVER (PARTITION BY P_AGENT, TRUNC(P_DATE) ORDER BY P_DATE) <= 1 / 48 

            THEN NULL

            ELSE 1

         END AS TOUCH_COUNT

    FROM

        (

          SELECT

            T1.*,

            (CASE WHEN P_DATE - LAG(P_DATE) OVER (PARTITION BY P_AGENT, TRUNC(P_DATE) ORDER BY P_DATE) <= 1/48 

             THEN 0 ELSE 1

            END) AS COUNT1

          FROM

            TABLE2 T1

        ) T1

    ) T1

GROUP BY

    P_AGENT, TRUNC(P_DATE), GRP

ORDER BY

    P_DATE

    ;

Unfortunately when I try to run the same query in Hive, the query takes ages to run and the max I have ran it for is 7 hours and I still dont have an output. The problem is that I am trying to run this against a table that has over 300 million rows hence the length of time it is taking.

I decided to break up the queries and put in the initial subquery into a table (dummy data shown below) and this was fine and the output I now have in a sub table with 347million rows (dummy data below only shows 10 rows)

CREATE TABLE

CREATE TABLE "TIME_ISSUES"

   (    "P_DATE" DATE,

    "P_AGENT" VARCHAR2(8 BYTE),

    "COUNT1" NUMBER(5,0)

   )

INSERT STATEMENTS

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('05-APR-16 20:20:12','DD-MON-RR HH24:MI:SS'),'CLQRC0',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('07-APR-16 15:06:09','DD-MON-RR HH24:MI:SS'),'SMCXF3',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('08-APR-16 04:33:00','DD-MON-RR HH24:MI:SS'),'EAQGH1',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('08-APR-16 12:17:53','DD-MON-RR HH24:MI:SS'),'JMENJDS',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('09-APR-16 13:06:53','DD-MON-RR HH24:MI:SS'),'JMENJDS',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('11-APR-16 10:41:00','DD-MON-RR HH24:MI:SS'),'NUKXY3',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('12-APR-16 10:15:21','DD-MON-RR HH24:MI:SS'),'JMRJADS',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('12-APR-16 11:19:23','DD-MON-RR HH24:MI:SS'),'CLMXB1',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('15-APR-16 08:36:00','DD-MON-RR HH24:MI:SS'),'EMA0L1',1);

Insert into TIME_ISSUES (P_DATE,P_AGENT,COUNT1) values (to_date('22-APR-16 16:22:00','DD-MON-RR HH24:MI:SS'),'EADBM1',1);

Unfortunately the initial table has 49 different columns but I have selected the 2 columns that I am using in the example above.

I have found that when running the second sub query, it looks the problem lies in the statement:

SUM(COUNT1) OVER (PARTITION BY P_AGENT, TRUNC(P_DATE) ORDER BY P_DATE) AS GRP,

The full query that I am running is shown here:

SELECT T1.*,

  SUM(COUNT1) OVER (PARTITION BY P_AGENT, TO_DATE(P_DATE) ORDER BY P_DATE) AS GRP,

  CASE

    WHEN P_DATE - LAG(P_DATE) over (PARTITION BY P_RECR, TO_DATE(P_DATE) ORDER BY P_DATE) <= 1 / 48

    THEN NULL

    ELSE 1

  END AS TOUCH_COUNT

FROM TIME_ISSUES T1 ;

but this takes to run and wanted to some advice to see if there is anything different that i could try as i am stuck on how to resolve this now.

Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jan 16 2017
Added on Dec 19 2016
7 comments
1,634 views