Skip to Main Content

Oracle Database Discussions

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Increased storage CPU/IO

User_OCZ1TSep 1 2020 — edited Sep 8 2020

Hi, We are using version 11.2.0.4 of Oracle Exadata. Its 4-node RAC , and each node is having 72 CPU 32 core 2 socket configuration, Linux x86 64-bit. We suddenly found slow down across multiple queries in our system, then after looking into OEM it was showing the queries were not changing execution path and even they were not reading from UNDO, but what we see AWR/ASH there does a increase in I/O(Cell single block physical read) significantly for those queries even they were reading similar volume of data.

While we were struggling to find out the cause behind sudden increase in I/O pattern in our database. OEM top activity section were all showing many of the such queries on top. We see the underlying cell nodes , we saw the storage I/O response has been increased across multiple cell nodes, from ~2ms/request to 4-6ms/request after specific time(around two days back) and also CPU on the cell nodes has been increased and constantly keeping ~80%+ since then. Number of I/O per second has been increased from ~2k/seconds to ~7k/seconds.

Then digging into that time window we found below runaway query, which was running for almost 60hrs, but still was not showing as a top DB CPU/IO consumer in OEM or AWR view. So we killed that just to see if its really helping us, and to our surprise , the IO and CPU across all the ~7 storage cells dropped back to normal.

Below is the sql monitor for the query which we killed, and I am unable to understand , how the below query(which was running in a single thread) can cause a spike in overall IO/CPU across all the 7 cell nodes of our database, and also its only one database running on this box? DBAs saying it was accumulating lot of read consistent data and caused this, but if I check sql monitor even ASH , I don't see that, as because read consistent or UNDO read will cause "cell single block physical read" but in the below sql monitor I don't see that.  Not seeing anything odd in the query outline section too. Sql monitor showing to be read 9PB of data but the cell offload also showing 99.9%, so hopefully that is fine. It shows the table has been full scanned ~124k times using cell smart scan. I think this number of cell smart scan should not cause such drastic impact across cell nodes. Correct me if wrong.In the activity detail sectio i do see ~105 seconds of reliable message and ~8 seconds of gcs drm freeze in enter server mode, Not sure what exact those pointing to? Here TAB1 is a daily range partitioned table on column PART_DT.

Want to understand how below sql can cause such a drastic impact across all the cell storage performance?

SELECT /*+full(TAB1)*/ ......
FROM TAB2, TAB1
WHERE TAB2.E_TYP = 'XX'  AND TAB2.CT IN ('XX', 'XX')   AND TAB1.COL_2 <> 0   AND TAB2.RID = XXXXXXX
AND TAB1.DID = TAB2.C_KEY
AND TAB1.SB_ID = TAB2.SB_ID
AND TAB1.SID = TAB2.object_id
AND TAB1.PART_DT = TAB2.COL_DT

Global Information
------------------------------
Status              :  DONE (FIRST N ROWS)             
Instance ID         :  2                               
SQL Execution ID    :  33554432                        
Execution Started   :  08/29/2020 07:29:38             
First Refresh Time  :  08/29/2020 07:29:43             
Last Refresh Time   :  08/31/2020 14:31:37             
Duration            :  198119s                         
Fetch Calls         :  34196                           

Global Stats
=================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency | Cluster  | Fetch | Buffer | Read | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes | Offload |
=================================================================================================================
|  221155 |   93910 |   127124 |         120 |        0.90 |     0.26 | 34196 |     1T |  10G |   9PB |  99.90% |
=================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=757850510)
=======================================================================================================================================================================================================================
| Id |            Operation            |             Name             |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |
|    |                                 |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Offload | (Max) |   (%)    |               (# samples)               |
=======================================================================================================================================================================================================================
|  0 | SELECT STATEMENT                |                              |         |      |    198115 |     +5 |     1 |       6M |       |       |         |       |          |                                         |
|  1 |   NESTED LOOPS                  |                              |     159 |  86M |    198115 |     +5 |     1 |       6M |       |       |         |       |          |                                         |
|  2 |    INLIST ITERATOR              |                              |         |      |    198115 |     +5 |     1 |        1 |       |       |         |       |          |                                         |
|  3 |     TABLE ACCESS BY INDEX ROWID | TAB2                         |     159 |   41 |    198115 |     +5 |     2 |     124K | 19555 | 153MB |         |       |     0.01 | Cpu (1)                                 |
|    |                                 |                              |         |      |           |        |       |          |       |       |         |       |          | cell single block physical read (1)     |
|  4 |      INDEX RANGE SCAN           | TAB2_IX1                     |     754 |    7 |    198115 |     +5 |     2 |     124K |   486 |   4MB |         |       |          |                                         |
|  5 |    PARTITION RANGE ITERATOR     |                              |       1 | 540K |    198115 |     +5 |  124K |       6M |       |       |         |       |          |                                         |
|  6 |     TABLE ACCESS STORAGE FULL   | TAB1                         |       1 | 540K |    198115 |     +5 |  124K |       6M |   10G |   1TB |  -8.70% |    7M |    99.99 | enq: KO - fast object checkpoint (14)   |
|    |                                 |                              |         |      |           |        |       |          |       |       |         |       |          | Cpu (11124)                             |
|    |                                 |                              |         |      |           |        |       |          |       |       |         |       |          | gcs drm freeze in enter server mode (8) |
|    |                                 |                              |         |      |           |        |       |          |       |       |         |       |          | reliable message (105)                  |
|    |                                 |                              |         |      |           |        |       |          |       |       |         |       |          | cell smart table scan (21230)           |
=======================================================================================================================================================================================================================


  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "TAB2"@"SEL$1" ("TAB2"."RID"  "TAB2"."CT"))
      NUM_INDEX_KEYS(@"SEL$1" "TAB2"@"SEL$1" "TAB2_IX1" 2)
      FULL(@"SEL$1" "TAB1"@"SEL$1")
      LEADING(@"SEL$1" "TAB2"@"SEL$1" "TAB1"@"SEL$1")
      USE_NL(@"SEL$1" "TAB1"@"SEL$1")
      END_OUTLINE_DATA
  */

This post has been answered by Jonathan Lewis on Sep 6 2020
Jump to Answer
Comments
Post Details
Added on Sep 1 2020
23 comments
2,011 views