USING DISTINCT OR GROUP BY WHEN PRIMARY KEY IN CLAUSE
720975Sep 4 2009 — edited Sep 22 2009I WANT A WAY TO EFFICIENTLY PERFORM AGGREGATE FUNCTIONS ON A CHILD TABLE JOINED WITH THE PARENT TABLE AND RETURN ALL COLUMNS IN THE PARENT TABLE AND THE CALCULATED COLUMNS FROM THE CHILD TABLE.
THE PROBLEM IS THAT WHEN I USE GROUP BY ON ALL THE FIELDS IN THE PARENT TABLE, INCLUDING THE PARENT TABLE, I THINK ORACLE TESTS ALL COLUMNS IN THE GROUP BY CLAUSE WHEN IT SHOULD REALIZE THAT THE PRIMARY KEY MAKES ALL ROWS IMPLICITLY DISTINCT.
I CREATED TWO TEST TABLES AND RRAN SOME QUERIES TO TEST THE TIMES.
HERE ARE THE TABLES:
CREATE TABLE A_PARENT_TABLES
( PT_PARENT_TABLE_ID NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD1 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD2 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD3 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD4 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD5 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD6 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD7 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD8 NUMBER(20,0) NOT NULL ENABLE,
PT_GRP_FEILD9 NUMBER(20,0) NOT NULL ENABLE,
CONSTRAINT A_PARENT_TABLES_PK PRIMARY KEY (PT_PARENT_TABLE_ID) ENABLE
);
CREATE TABLE A_CHILD_TABLES
( CT_CHILD_TABLE_ID NUMBER(20,0) NOT NULL ENABLE,
CT_PARENT_TABLE NUMBER(20,0) NOT NULL ENABLE,
CT_VALUE_TO_SUM NUMBER(20,0) NOT NULL ENABLE,
CONSTRAINT A_CHILD_TABLES_PK PRIMARY KEY (CT_CHILD_TABLE_ID) ENABLE,
CONSTRAINT CT_PARENT_TABLE_FK FOREIGN KEY (CT_PARENT_TABLE)
REFERENCES A_PARENT_TABLES (PT_PARENT_TABLE_ID) ON DELETE CASCADE ENABLE
);
I THEN INSERTED 20,000 RECORDS INTO TABLE A_PARENT_TABLES AND 60,000 INTO A_CHILD_TABLES (3 FOR EACH PARENT)
I THEN RAN THE FOLLOWING QUERIES:
--ALL FIELDS GROUGPED
SELECT
PT_PARENT_TABLE_ID,
PT_GRP_FEILD1,
PT_GRP_FEILD2,
PT_GRP_FEILD3,
PT_GRP_FEILD4,
PT_GRP_FEILD5,
PT_GRP_FEILD6,
PT_GRP_FEILD7,
PT_GRP_FEILD8,
PT_GRP_FEILD9,
SUM(CT_VALUE_TO_SUM) AS SUMED_VALUES
FROM
A_PARENT_TABLES,
A_CHILD_TABLES
WHERE
PT_PARENT_TABLE_ID = CT_PARENT_TABLE;
GROUP BY
PT_PARENT_TABLE_ID,
PT_GRP_FEILD1,
PT_GRP_FEILD2,
PT_GRP_FEILD3,
PT_GRP_FEILD4,
PT_GRP_FEILD5,
PT_GRP_FEILD6,
PT_GRP_FEILD7,
PT_GRP_FEILD8,
PT_GRP_FEILD9;
TOOK 2.42 seconds TO EXECUTE
--SINGLE FIELDS GROUPED
SELECT
PT_PARENT_TABLE_ID,
SUM(CT_VALUE_TO_SUM) AS SUMED_VALUES
FROM
A_PARENT_TABLES,
A_CHILD_TABLES
WHERE
PT_PARENT_TABLE_ID = CT_PARENT_TABLE
GROUP BY
PT_PARENT_TABLE_ID;
TOOK 0.71 seconds TO EXECUTE
WHY DOESNT THE EXECUTION OF THE FIRST QUERY REALIZE THAT HAVING PT_PARENT_TABLE_ID (THE PRIMARY KEY) IN THE GROUP BY CLAUSE MEANS NONE OF THE OTHER COLUMNS NEED TO BE CHECKED?
HOW CAN I ACCOMPLISH THE EQUVILENT OF THE FIRST QUERY IN A MORE EFFICIENT WAY?