We are running Oracle 10.2.0.5 on AIX. Yes, yes, I know it is ancient.
Recently it started experiencing waits on 'cursor: pin S wait on X'. The waits last for several hours; in one incident the database stopped responding and had to be restarted.
On average we see about 5 sessions waiting on this event; Active Session History shows that up to 30 sessions can be waiting at a time.
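For reference, we counted the concurrent waiters roughly like this (the one-hour window is only an illustration, adjust it to the incident):

-- Concurrent waiters on the mutex event per ASH sample
select sample_time, count(*) waiters
from   v$active_session_history
where  event = 'cursor: pin S wait on X'
and    sample_time > sysdate - 1/24          -- hypothetical window: last hour
group  by sample_time
order  by sample_time;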
We analysed it following MOS note 1298015.1 and came to the conclusion that this is scenario #3, 'Changing blockers and changing mutexes'.
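The 'changing blockers' part we see in ASH: for this event the SID of the mutex holder is reported in the upper half of P2 (at least on 64-bit 10.2, as described in the MOS note), so something along these lines shows the blockers moving around; treat the P2 decoding as an assumption for your platform:

-- Waiters and decoded blockers over time (P2 decoding assumes 64-bit 10.2)
select sample_time,
       session_id              waiting_sid,
       sql_id,
       p1                      mutex_idn,        -- identifier (hash) of the cursor
       trunc(p2/4294967296)    blocking_sid      -- upper 32 bits of P2
from   v$active_session_history
where  event = 'cursor: pin S wait on X'
order  by sample_time;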
We are observing:
- Increase in the rate of hard parses: the application uses bind variables, and normally the rate of hard parses is below 1 per second (except for a short period during stats collection). During the periods of waits on 'cursor: pin S wait on X' the rate of hard parses increases to about 100 per second (measured with the counters shown after this list).
- Increase in invalidations: this can be seen in both V$LIBRARYCACHE and V$SQL.
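The rates above come from deltas of the usual counters; a minimal way to sample them manually (snapshot twice and diff):

-- System-wide parse counters
select name, value
from   v$sysstat
where  name in ('parse count (total)', 'parse count (hard)');

-- Library cache invalidations, SQL AREA namespace
select namespace, gets, pins, reloads, invalidations
from   v$librarycache
where  namespace = 'SQL AREA';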
MOS 1298015.1 says that this happens when the 'shared pool is under load (or under sized)'. In this database shared_pool_size is set to 4 GB and the actual size is also close to 4 GB. There is plenty of free memory in the shared pool, so I don't think it is undersized. Still, we tried increasing the shared pool; so far the memory we added is not being used and just sits in the 'unused' sub-pool 0.
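Free memory per pool was checked with plain V$SGASTAT (the per-sub-pool split further down comes from sgastatx):

-- Total free memory per pool; the per-sub-pool breakdown needs sgastatx / X$ views
select pool, name, round(bytes/1024/1024) mb
from   v$sgastat
where  name = 'free memory'
order  by pool;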
What could cause these invalidations and hard parses?
We know that it isn't statistics collection or grants, because stats collection does not overlap in time with the waits on 'cursor: pin S wait on X'.
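We compared the stats job windows from DBA_OPTSTAT_OPERATIONS with the ASH samples of the wait; roughly:

-- Recent optimizer stats operations; compare START_TIME/END_TIME with the wait windows
select operation, target, start_time, end_time
from   dba_optstat_operations
where  start_time > systimestamp - interval '7' day    -- arbitrary lookback
order  by start_time;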
We checked the contents of the shared pool at the sub-pool level using the sgastatx script published by Tanel Poder.
It shows that in most sub-pools ‘sql area’ is very small: around 1 MB.
Shared pool components at the time of high waits on 'cursor: pin S wait on X' (only components larger than 10 MB in each sub-pool are shown):
SUBPOOL            NAME                     SUM(BYTES)         MB
------------------ ---------------------- ------------ ----------
shared pool (1):   free memory               402720952     384.06
shared pool (1):   state objects             161189200     153.72
shared pool (1):   db_block_hash_buckets     149317080      142.4
shared pool (1):   KQR X PO                   99980816      95.35
shared pool (1):   kglsim object batch        15305304       14.6
shared pool (1):   library cache              12698648      12.11
shared pool (2):   free memory               442873864     422.36
shared pool (2):   db_block_hash_buckets     149317080      142.4
shared pool (2):   state objects              32524080      31.02
shared pool (2):   FileOpenBlock              15936504       15.2
shared pool (2):   kglsim object batch        12921216      12.32
shared pool (2):   library cache              12521520      11.94
shared pool (3):   free memory               512195608     488.47
shared pool (3):   db_block_hash_buckets     149317080      142.4
shared pool (3):   kglsim object batch        18547200      17.69
shared pool (3):   state objects              17988760      17.16
shared pool (3):   library cache              12314920      11.74
shared pool (4):   sql area                  257944216     245.99
shared pool (4):   db_block_hash_buckets     157705888      150.4
shared pool (4):   free memory               109868720     104.78
shared pool (4):   PCursor                    59916192      57.14
shared pool (4):   CCursor                    59574680      56.81
shared pool (4):   library cache              59095400      56.36
shared pool (4):   NETWORK BUFFER             25435912      24.26
shared pool (4):   state objects              22235960      21.21
shared pool (4):   kglsim object batch        15100512       14.4
shared pool (4):   Heap0: KGL                 10532648      10.04
shared pool (5):   free memory               553059680     527.44
shared pool (5):   db_block_hash_buckets     149317592      142.4
shared pool (5):   state objects              20695960      19.74
shared pool (5):   kglsim object batch        15711024      14.98
shared pool (5):   library cache              14284048      13.62
shared pool (5):   Checkpoint queue           10487040         10
Size of 'sql area' in each sub-pool:
shared pool (1):   sql area                    1244496       1.19
shared pool (2):   sql area                     679040        .65
shared pool (3):   sql area                    1809856       1.73
shared pool (4):   sql area                  257944216     245.99
shared pool (5):   sql area                    2245248       2.14
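As a sanity check, these per-sub-pool numbers add up to what V$SGASTAT reports for 'sql area' in the whole shared pool:

-- Total 'sql area' across the whole shared pool, to cross-check the sgastatx output
select pool, name, round(bytes/1024/1024, 2) mb
from   v$sgastat
where  pool = 'shared pool'
and    name = 'sql area';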
I see two possibilities:
- 1. 'Something' causes invalidations, and 'sql area' then shrinks because it has nothing left to hold. However, such invalidations should be distributed more or less uniformly across all sub-pools, which does not match what we see: only sub-pool 4 holds a sizeable 'sql area'.
- 2. 'Something' causes 'sql area' to shrink in most sub-pools, and this in turn leads to invalidations. What could cause 'sql area' to shrink?
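To try to separate the two, we are looking at what is actually being invalidated and whether any DDL or recompilation touched objects during the wait window (both invalidate dependent cursors); sketches, with arbitrary windows:

-- Cursors with the most invalidations
select sql_id, invalidations, loads, last_load_time, executions
from   v$sql
where  invalidations > 0
order  by invalidations desc;

-- Objects whose LAST_DDL_TIME falls in the wait window (DDL and recompilation invalidate dependent cursors)
select owner, object_name, object_type, last_ddl_time
from   dba_objects
where  last_ddl_time > sysdate - 1/24          -- hypothetical window: last hour
order  by last_ddl_time desc;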
I am thinking about disabling sub-pools; any other ideas?
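By disabling sub-pools I mean forcing a single sub-pool via the hidden parameter _kghdsidx_count (not dynamic, needs a restart); a sketch only, and something to confirm with Oracle Support before touching an underscore parameter:

-- Force a single shared pool sub-pool (hidden parameter; restart required)
alter system set "_kghdsidx_count" = 1 scope = spfile;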
Other theories that we investigated and dismissed (the checks are sketched after this list):
- The waits are caused by resizing of SGA components: resizing is infrequent, less than once per day.
- The waits are caused by growth of the 'KGH: NO ACCESS' component: we did not find this component at all.
- High version counts: we did not find any SQL with a high version count.
- Parallel Query consuming shared pool memory: the large pool is configured, so Parallel Query uses the large pool instead of the shared pool.
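For completeness, the checks behind these points were along these lines (sketch; the version-count threshold is arbitrary):

-- SGA resize activity
select component, oper_type, oper_mode, status, start_time, end_time
from   v$sga_resize_ops
order  by start_time desc;

-- 'KGH: NO ACCESS' component (returns no rows on this database)
select pool, name, bytes
from   v$sgastat
where  name = 'KGH: NO ACCESS';

-- SQL with many cursor versions
select sql_id, version_count, sql_text
from   v$sqlarea
where  version_count > 20
order  by version_count desc;

-- Large pool is configured, so PX buffers should not come from the shared pool
select pool, round(sum(bytes)/1024/1024) mb
from   v$sgastat
where  pool = 'large pool'
group  by pool;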