I've created a simple monitoring tool that we use during performance testing. The tool queries the v$sysstat tables, and the v$con_sysstat table on the CDB, in order to get all stats for the PDBs. Since almost all counters of interest are cumulative, the tool calculates the delta between two measurements and translates this into an average per second.
Now, the interesting thing is that under load (when the PDB gets hit by work) the counters start misbehaving. Here is an example:
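To make the calculation concrete, here is a minimal Python sketch of the delta-to-rate step. It is not the actual tool: the function name `per_second_rate` and the (timestamp, value) pair layout are assumptions made for illustration, and the two input samples are the first two measurements from the example in this post.

```python
from datetime import datetime

# Timestamp format used in the sample output in this post.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def per_second_rate(prev, curr):
    """Return the average per-second increase between two samples.

    prev and curr are (timestamp_string, cumulative_value) pairs.
    This layout is an assumption for the sketch, not the tool's real format.
    """
    t0 = datetime.strptime(prev[0], TIMESTAMP_FORMAT)
    t1 = datetime.strptime(curr[0], TIMESTAMP_FORMAT)
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # A cumulative counter should never go down, so a negative result
    # here already indicates a bad sample.
    return (curr[1] - prev[1]) / elapsed


rate = per_second_rate(
    ("2018-05-09 16:06:52.448928", 19879160),
    ("2018-05-09 16:07:08.584731", 19879879),
)
print(rate)
```

The delta here is 719 over roughly 16.1 seconds, so the rate is a little under 45 per second.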
2018-05-09 16:06:52.448928: non-idle wait time:19879160
2018-05-09 16:07:08.584731: non-idle wait time:19879879
2018-05-09 16:07:24.702847: non-idle wait time:19881424
2018-05-09 16:07:40.787016: non-idle wait time:19909612 <--- too high (spike)
2018-05-09 16:07:56.998431: non-idle wait time:19884024 <--- normal again (negative delta)
2018-05-09 16:08:13.011631: non-idle wait time:19884440
2018-05-09 16:08:29.070905: non-idle wait time:19885139
2018-05-09 16:08:45.077490: non-idle wait time:19886098
2018-05-09 16:09:01.213923: non-idle wait time:19881003 <--- too low (negative delta)
2018-05-09 16:09:17.375727: non-idle wait time:19896757 <--- too high (spike)
2018-05-09 16:09:33.387790: non-idle wait time:19888356 <--- too low (negative delta)
2018-05-09 16:09:50.343326: non-idle wait time:19889402
2018-05-09 16:10:06.539282: non-idle wait time:19890466
Above are the raw, unmodified values from the table, together with the timestamp of each query. As you can see, some measurements are corrupt or incorrect, causing the delta (the difference compared to the previous measurement) either to spike or to become negative. This happens more frequently as the load increases. It happens not just for the measurement above but for many of the counters (IO, CPU, etc.).
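For reference, the negative deltas in the sample above can be located with a few lines of Python. This is only a sketch: the helper name `negative_delta_indexes` is made up, and the list holds the thirteen values shown above in order. Note that it only catches the step where the counter goes down, so a spike is seen one sample late (at the measurement that returns to normal), and it cannot tell whether the high or the low value is the wrong one.

```python
# The thirteen "non-idle wait time" values from the sample, in order.
samples = [
    19879160, 19879879, 19881424, 19909612, 19884024,
    19884440, 19885139, 19886098, 19881003, 19896757,
    19888356, 19889402, 19890466,
]


def negative_delta_indexes(values):
    """Return the indexes whose value is lower than the previous one.

    A cumulative counter should never decrease, so each index returned
    marks a step where either this sample or the previous one is wrong.
    """
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


for i in negative_delta_indexes(samples):
    print(i, samples[i - 1], "->", samples[i], "delta", samples[i] - samples[i - 1])
```

This reports the samples at 16:07:56, 16:09:01 and 16:09:33 (indexes 4, 8 and 10, counting from 0).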
The problem ONLY affects PDB sysstat values, not the CDB sysstat values.
I've compared the measurements above to the CDB values in v$sysstat, and if I add all the PDB values together they still make sense. However, when a corrupt measurement occurs, it seems one of the other PDBs is also affected, but in reverse (if one PDB's value increases, the other PDB's value decreases by the same amount). It seems to me that Oracle's internal logic sometimes incorrectly distributes the values between the PDBs on that system.
This is really bad, since I cannot trust the measurements. It's also very difficult to identify the problem in code and either ignore the affected measurements or correct them.
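The pattern described here (the total stays sane while individual PDBs move in opposite directions) could be checked roughly as in the sketch below. Everything in it is an assumption for illustration: the function name `suspicious_intervals`, the dictionary layout, and above all the numbers, which are invented to show the shape of the problem and are not real measurements.

```python
def suspicious_intervals(series_by_pdb):
    """Find intervals where a PDB counter drops although the total does not.

    series_by_pdb maps a PDB name to its list of cumulative samples,
    all taken at the same points in time. Returns (index, deltas) pairs.
    """
    names = list(series_by_pdb)
    length = len(series_by_pdb[names[0]])
    found = []
    for i in range(1, length):
        deltas = {
            name: series_by_pdb[name][i] - series_by_pdb[name][i - 1]
            for name in names
        }
        total = sum(deltas.values())
        # The total still grows, but at least one PDB went backwards:
        # this matches the mirrored errors described in the post.
        if total >= 0 and any(d < 0 for d in deltas.values()):
            found.append((i, deltas))
    return found


# Invented example values (NOT real measurements): the third sample
# moves about 500 units from PDB2 to PDB1, and the fourth moves them back.
example = {
    "PDB1": [1000, 1010, 1520, 1030],
    "PDB2": [2000, 2020, 1540, 2060],
}

for index, deltas in suspicious_intervals(example):
    print(index, deltas)
```

In this made-up data the total grows by 30 in every interval, yet intervals 2 and 3 are flagged because one PDB goes backwards while the other jumps by a similar amount.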
Has anyone else noticed this? Is it a bug?
Version is Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production