We have a busy 4-node RAC data warehouse (Oracle 11.2.0.4 on Linux), with hourly AWR snapshots and 32-day retention.
I discovered there are hourly gaps in DBA_HIST_ACTIVE_SESS_HISTORY on one of the four nodes. In each hour, there is no sampled activity for the first 30-40 minutes of each hour. The other three nodes have complete sample history for the full hour. For example, here is the history for a 6 hour period yesterday.
SELECT snap_id
, instance_number
, MIN(sample_time) first_sample
, MAX(sample_time) last_sample
, COUNT(DISTINCT TRUNC(sample_time,'mi')) AS nbr_mins
, COUNT(*) AS nbr_samples
FROM dba_hist_active_sess_history
WHERE snap_id BETWEEN 43766 AND 43771
GROUP BY snap_id
, instance_number
ORDER BY 1,2;
SNAP_ID | INSTANCE_NUMBER | FIRST_SAMPLE | LAST_SAMPLE | NBR_MINS | NBR_SAMPLES |
43766 | 1 | 2016-05-26 06:00:54 | 2016-05-26 06:59:51 | 60 | 7240 |
43766 | 2 | 2016-05-26 06:40:53 | 2016-05-26 06:59:49 | 20 | 6635 |
43766 | 3 | 2016-05-26 06:00:09 | 2016-05-26 06:59:49 | 60 | 10101 |
43766 | 4 | 2016-05-26 06:00:18 | 2016-05-26 06:59:51 | 60 | 7373 |
43767 | 1 | 2016-05-26 07:00:01 | 2016-05-26 07:59:51 | 60 | 3956 |
43767 | 2 | 2016-05-26 07:39:51 | 2016-05-26 07:59:46 | 21 | 6735 |
43767 | 3 | 2016-05-26 07:00:00 | 2016-05-26 07:59:46 | 60 | 6540 |
43767 | 4 | 2016-05-26 07:00:01 | 2016-05-26 07:59:43 | 60 | 4979 |
43768 | 1 | 2016-05-26 08:00:01 | 2016-05-26 09:00:03 | 61 | 13251 |
43768 | 2 | 2016-05-26 08:42:37 | 2016-05-26 09:00:02 | 19 | 6707 |
43768 | 3 | 2016-05-26 07:59:56 | 2016-05-26 09:00:03 | 62 | 7436 |
43768 | 4 | 2016-05-26 07:59:53 | 2016-05-26 08:59:56 | 61 | 4882 |
43769 | 1 | 2016-05-26 09:00:13 | 2016-05-26 09:59:53 | 60 | 3437 |
43769 | 2 | 2016-05-26 09:46:25 | 2016-05-26 09:59:48 | 14 | 6992 |
43769 | 3 | 2016-05-26 09:00:13 | 2016-05-26 09:59:49 | 60 | 7418 |
43769 | 4 | 2016-05-26 09:00:06 | 2016-05-26 09:59:48 | 60 | 3207 |
43770 | 1 | 2016-05-26 10:00:03 | 2016-05-26 10:59:53 | 60 | 2843 |
43770 | 2 | 2016-05-26 10:45:40 | 2016-05-26 10:59:54 | 15 | 6624 |
43770 | 3 | 2016-05-26 10:00:00 | 2016-05-26 10:59:56 | 60 | 7218 |
43770 | 4 | 2016-05-26 09:59:58 | 2016-05-26 10:59:49 | 61 | 4059 |
43771 | 1 | 2016-05-26 11:00:03 | 2016-05-26 12:00:05 | 61 | 4447 |
43771 | 2 | 2016-05-26 11:43:23 | 2016-05-26 12:00:08 | 18 | 6678 |
43771 | 3 | 2016-05-26 11:00:06 | 2016-05-26 12:00:06 | 61 | 9261 |
43771 | 4 | 2016-05-26 10:59:59 | 2016-05-26 12:00:03 | 62 | 5052 |
Also, checking the contents of V$ACTIVE_SESSION_HISTORY, it seems there is is only around 20 minutes data in the buffer for this node.
SELECT inst_id
, sysdate
, oldest_sample_time
, sample_count
, sampled_bytes
, awr_flush_emergency_count
FROM gv$ash_info
ORDER BY 1;
INST_ID | SYSDATE | OLDEST_SAMPLE_TIME | SAMPLE_COUNT | SAMPLED_BYTES | AWR_FLUSH_EMERGENCY_COUNT |
1 | 2016-05-27 10:07:14 | 2016-05-27 08:38:06 | 599699 | 1425601536 | 44 |
2 | 2016-05-27 10:07:14 | 2016-05-27 09:49:22 | 4118496 | 2088665320 | 1004 |
3 | 2016-05-27 10:07:14 | 2016-05-27 09:13:16 | 4121205 | 1148471496 | 336 |
4 | 2016-05-27 10:07:14 | 2016-05-27 08:30:39 | 4015558 | 2953448344 | 459 |
The AWR_FLUSH_EMERGENCY_COUNT is not incrementing for Node 2, and there is no mention of emergency flushes in the alert log. Having said that, I would expect that, even if emergency flushes were occuring, the data would be persisted to DBA_HIST_ACTIVE_SESS_HISTORY.
Based on the above, I suspect that V$ACTIVE_SESSION_HISTORY is unable to hold the entire hour's data, but rather than performing an emergency flush, the data is being lost.
Any suggestions what might be causing this, and what can be done to prevent loss of this data?