Corrupt SysStat counter values for PDBs

Christian Gerdes, Aug 22 2018 (edited Aug 22 2018)

I've created a simple monitoring tool that we use during performance testing. The tool queries the v$sysstat view and, on the CDB, the v$con_sysstat view in order to get all stats for the PDBs. Since almost all counters of interest are cumulative, the tool calculates the delta between two measurements and translates this into an average per second.
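For readers who want to follow the arithmetic, here is a minimal sketch of that delta-to-rate step. It is not the tool's actual code; the function name `rate_per_second` and its signature are assumptions made for illustration.

```python
from datetime import datetime


def rate_per_second(t0, v0, t1, v1):
    """Average per-second increase of a cumulative counter between two samples.

    t0/t1 are the query timestamps, v0/v1 the raw counter values.
    Hypothetical helper, not part of any Oracle API.
    """
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / elapsed


# Two consecutive "non-idle wait time" samples.
t0 = datetime(2018, 5, 9, 16, 6, 52, 448928)
t1 = datetime(2018, 5, 9, 16, 7, 8, 584731)
print(rate_per_second(t0, 19879160, t1, 19879879))
```

A negative result from this function means the counter went backwards between the two samples, which a cumulative counter should not do.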

Now, the interesting thing is that under load (when the PDB gets hit by work) the counters start misbehaving. Here is an example:

2018-05-09 16:06:52.448928: non-idle wait time:19879160

2018-05-09 16:07:08.584731: non-idle wait time:19879879

2018-05-09 16:07:24.702847: non-idle wait time:19881424

2018-05-09 16:07:40.787016: non-idle wait time:19909612 <--- too high (spike)

2018-05-09 16:07:56.998431: non-idle wait time:19884024 <--- normal again (negative delta)

2018-05-09 16:08:13.011631: non-idle wait time:19884440

2018-05-09 16:08:29.070905: non-idle wait time:19885139

2018-05-09 16:08:45.077490: non-idle wait time:19886098

2018-05-09 16:09:01.213923: non-idle wait time:19881003 <--- too low (negative delta)

2018-05-09 16:09:17.375727: non-idle wait time:19896757 <--- too high (spike)

2018-05-09 16:09:33.387790: non-idle wait time:19888356 <--- too low (negative delta)

2018-05-09 16:09:50.343326: non-idle wait time:19889402

2018-05-09 16:10:06.539282: non-idle wait time:19890466

Above are the raw, unmodified values from the view, together with a timestamp for each query. As you can see, some measurements are corrupt or incorrect, causing the delta (the difference compared to the previous measurement) either to spike or to become negative. This happens more frequently as the load increases. It happens not just for the above statistic but for many of the counters (IO, CPU, etc.).
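To make the problem concrete, the deltas for the samples listed above can be recomputed with a few lines of Python. This is only an illustrative sketch; the helper names `deltas` and `negative_delta_indexes` are made up for this example.

```python
# Raw "non-idle wait time" values copied from the listing above, in order.
SAMPLES = [
    19879160, 19879879, 19881424, 19909612, 19884024, 19884440, 19885139,
    19886098, 19881003, 19896757, 19888356, 19889402, 19890466,
]


def deltas(values):
    """Difference between each sample and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def negative_delta_indexes(values):
    """Indexes of samples that are lower than their predecessor."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


for i, d in enumerate(deltas(SAMPLES), start=1):
    flag = "  <--- negative delta" if d < 0 else ""
    print(f"sample {i}: delta {d:+d}{flag}")
```

For a counter that is supposed to be cumulative, `negative_delta_indexes` should always return an empty list; with the values above it does not.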

The problem ONLY affects PDB sysstat values, not the CDB sysstat values.

I've compared the measurements above to the CDB values in v$sysstat, and if I add all the PDB values together, the totals still make sense. However, when a corrupt measurement occurs, it seems one of the other PDBs is also affected, but in reverse (if one PDB's value increases, the other PDB's value decreases by the same amount). It seems to me that Oracle's internal logic sometimes attributes the values to the wrong PDB on that system.
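One way to look for this mirrored pattern is to compare the per-container deltas with the delta of their sum. The following is a rough sketch, assuming two snapshots of the same statistic keyed by container name. The container names and numbers are invented for illustration and are not measurements from the system described here.

```python
def container_deltas(before, after):
    """Per-container change of one cumulative statistic between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def negative_containers(deltas):
    """Names of containers whose counter went backwards."""
    return sorted(name for name, d in deltas.items() if d < 0)


# Invented values, for illustration only (hypothetical containers PDB_A, PDB_B).
before = {"PDB_A": 1000, "PDB_B": 5000}
after = {"PDB_A": 1520, "PDB_B": 4500}

per_container = container_deltas(before, after)
total_delta = sum(per_container.values())

print("per container:", per_container)
print("sum of deltas:", total_delta)
print("went backwards:", negative_containers(per_container))
```

In this made-up case the sum still moves forward by a small amount while one container jumps and the other drops by almost the same amount, which is the shape of the behaviour described above.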

This is really bad, since I cannot trust the measurements. It's also very difficult to identify the problem in code and then ignore or correct the affected measurements.
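As a possible workaround, bad samples can be guessed after the fact by looking at neighbouring values. The sketch below is a heuristic only, assuming that outliers are isolated samples and that the counter is otherwise non-decreasing; the function name `suspect_samples` is hypothetical, and the approach needs at least one later sample before it can judge the current one.

```python
# Raw "non-idle wait time" values from the listing earlier in this post.
SAMPLES = [
    19879160, 19879879, 19881424, 19909612, 19884024, 19884440, 19885139,
    19886098, 19881003, 19896757, 19888356, 19889402, 19890466,
]


def suspect_samples(values):
    """Guess which samples are outliers, based on negative deltas.

    On a negative delta, either the previous sample was a spike or the
    current one is a dip. If dropping the previous sample restores the
    non-decreasing order, blame the previous sample; otherwise blame
    the current one.
    """
    suspects = set()
    for i in range(1, len(values)):
        if values[i] >= values[i - 1]:
            continue  # counter moved forward, nothing to flag
        if i >= 2 and values[i - 2] <= values[i]:
            suspects.add(i - 1)  # previous sample looks like a spike
        else:
            suspects.add(i)  # current sample looks like a dip
    return sorted(suspects)


bad = suspect_samples(SAMPLES)
cleaned = [v for i, v in enumerate(SAMPLES) if i not in bad]
print("suspect indexes:", bad)
print("cleaned series is non-decreasing:", cleaned == sorted(cleaned))
```

This only hides the symptom in the monitoring output. It does not explain why the per-PDB values are wrong in the first place, and it can misjudge runs of several bad samples in a row.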

Has anyone else noticed this? Is it a bug?

Version is Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
