SQL & PL/SQL

Optimizing queries, execution plans, run times

mathguy, Aug 3 2016 (edited Aug 3 2016)

Hi all,

This little experiment was prompted by a recent question on this forum: Group by. To follow my comments below it is not necessary to read (or even know about) that thread.

I will format my code in Courier New to make it easier for any of you who want to copy and paste to do so. (Is it really not possible to have SQL formatting that doesn't copy row numbers when copying from a post?   ...   rant #1).

First I created a table with 1 million rows, with an id column (not used for anything) and random dates between sysdate and sysdate + 2 hours, like so:

create table ttt (n, dt) as
select level, sysdate + dbms_random.value(0, 1/12)
from   dual
connect by level <= 1000000;

Then I ran the following two queries (which are logically equivalent). They both group the data in ttt by second (which in this case really means by dt) and produce a count for each group. I compared explain plans (they are identical) and run times (very different). I wrote them as CTAS (running "drop table aaa purge;" after each execution so that the next try starts from the same place).

QUESTION:  I wrote these as CTAS since I want the SELECT to run, but I don't want anything output to the screen (I don't care to time I/O operations of that nature). I imagine I could just run the SELECT query and turn off screen output, but I am not sure I could then still see the execution time. What's the proper way to test in such a situation? Creating and populating the new table obviously adds overhead - nowhere near as much as outputting to the screen would, but it still means my experiment will understate the performance gain from the optimization.
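One idea I may try (just a sketch, assuming the test is run from SQL*Plus: SET AUTOTRACE TRACEONLY makes the client execute the query and fetch every row while suppressing the row output, and SET TIMING ON still reports the elapsed time):

```sql
-- SQL*Plus sketch: run the bare SELECT, fetch all rows, show no row output
set timing on
set autotrace traceonly statistics

select to_char(dt, 'yyyy_mm_dd hh24:mi:ss') as mi, count(*) as ct
from   ttt
group by dt;

set autotrace off
```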

Slower version

create table aaa as
select to_char(dt, 'yyyy_mm_dd hh24:mi:ss') as mi, count(*) as ct
from   ttt
group by to_char(dt, 'yyyy_mm_dd hh24:mi:ss');

Faster version

create table aaa as
select to_char(dt, 'yyyy_mm_dd hh24:mi:ss') as mi, count(*) as ct
from   ttt
group by dt;

Explain plan (same for both)

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT |      |  7201 | 57608 |  1043   (3)| 00:00:13 |
|   1 |  LOAD AS SELECT        | AAA  |       |       |            |          |
|   2 |   HASH GROUP BY        |      |  7201 | 57608 |   706   (5)| 00:00:09 |
|   3 |    TABLE ACCESS FULL   | TTT  |  1000K|  7812K|   680   (1)| 00:00:09 |
-------------------------------------------------------------------------------

Even though the execution plan is the same, the run time is not: consistently around 0.75 seconds for the slow version, and consistently around 0.25 seconds for the fast version. This suggests that in the slow version Oracle computes to_char(dt) twice: once for GROUP BY and then again for SELECT. (It is hard to believe it caches the result and that the difference comes merely from the overhead of writing to and reading back from such a cache.) I imagine, also, that grouping by a DATE value may be faster than grouping by the resulting character string.

QUESTION: How can one tell how much of the difference is due to calling to_char twice, and how much (if anything) is due to differences in the efficiency of grouping by a date field vs. a character expression? Is that possible?
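One way I can think of to start teasing these apart (a sketch, using the same ttt table as above) is to drop the formatted key from the SELECT list entirely, so that to_char is evaluated at most once per row, and vary only the GROUP BY key:

```sql
-- (a) one to_char per row, grouping on the resulting string
create table aaa as
select count(*) as ct
from   ttt
group by to_char(dt, 'yyyy_mm_dd hh24:mi:ss');

drop table aaa purge;

-- (b) no to_char at all, grouping on the DATE itself
create table aaa as
select count(*) as ct
from   ttt
group by dt;
```

Comparing (a) with the original slow version would show what the second to_char call adds; the remaining gap between (a) and (b) would bundle together one to_char call per row and any extra cost of hashing strings rather than dates.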

Before this gets too long: I also looked at grouping by minutes. In that case, in SELECT the date format model is 'yyyy-mm-dd hh24:mi' (say), and using trunc(dt, 'mi') in GROUP BY by itself will not work. What does work, though, is to use to_char(trunc(dt, 'mi'), 'yyyy-mm-dd hh24:mi') in SELECT and trunc(dt, 'mi') in GROUP BY. This "faster version" is twice as fast as the "slow version" (which uses to_char(dt, ... hh24:mi') both in SELECT and in GROUP BY) - even with the overhead of evaluating trunc(dt, 'mi') twice.   (Sorry for the above typo - thank you Swen for catching it!)
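Spelled out, the minute-level pair looks like this (a sketch of what I ran, against the same ttt table):

```sql
-- Slow version: the format string is computed per row, in both clauses
create table aaa as
select to_char(dt, 'yyyy-mm-dd hh24:mi') as mi, count(*) as ct
from   ttt
group by to_char(dt, 'yyyy-mm-dd hh24:mi');

drop table aaa purge;

-- Faster version: group by the truncated DATE (trunc(dt, 'mi') appears twice)
create table aaa as
select to_char(trunc(dt, 'mi'), 'yyyy-mm-dd hh24:mi') as mi, count(*) as ct
from   ttt
group by trunc(dt, 'mi');
```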

I haven't tested the use of an index on dt yet, which (in the first example) will likely widen the performance gap between the slower and the faster versions. And I am just learning to read and use explain plans; I haven't advanced to actual execution plans yet (although I am sure you'll tell me I need to get there... I know, I know!)
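For the index test, the sketch I have in mind (untested; note that dt comes from a CTAS with no constraints, so I believe the optimizer cannot answer the query from the index alone unless dt is declared NOT NULL or the query filters out NULLs):

```sql
-- dt must be known NOT NULL before an index-only plan is even possible
alter table ttt modify (dt not null);
create index ttt_dt_ix on ttt (dt);

-- then re-run the fast version and compare plans and timings
create table aaa as
select to_char(dt, 'yyyy_mm_dd hh24:mi:ss') as mi, count(*) as ct
from   ttt
group by dt;
```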

I will be grateful for any wisdom you can share.    Thank you!    mathguy-ro

Post Details
Locked on Aug 31 2016
Added on Aug 3 2016
22 comments
4,258 views