Hello Everyone,
I would like to begin by mentioning that I am not an expert and need your help.
We have 3 DB instances sharing one Oracle home on a single server. One database (300 GB) is much bigger than the other
two (100 GB and 10 GB). We upgraded all the databases from 11.2.0.3 (Solaris) to 12.1.0.2 (RHEL 7 Linux) last year.
The only patch we have applied since the upgrade is the "July 2018 critical Patch" (Patch 28317232).
We are seeing high load on the server very frequently. The load climbs to the point where all the applications stop working and we cannot even ssh to the server.
We have cron jobs set up to take cold backups. The start times are 2:10am (10 GB), 2:30am (100 GB) and 2:55am (300 GB).
10 GB database - DBS
100 GB database - DBM
300 GB database - DBL
Here are the other issues we are facing:
1. Load on the database server gets high for 2 minutes or so while a database is restarting after its cold backup.
2. Sometimes a database fails to start. The alert log shows: Instance Critical Process (pid: 7, ospid: 124687, SA00) died unexpectedly -- PMON (ospid: 124669): terminating the instance due to error 12752.
Again, this does not happen every time we take a cold backup, only once in a while.
Here is one scenario we experienced recently while taking cold backups:
DBS (10 GB) went down at 2:10am - copied all the files - got restarted at 2:15am without any issues.
DBM (100 GB) went down at 2:30am - copied all the files - could not be restarted because of the SA00 error I mentioned above.
DBL (300 GB) was still running while the cold backups of the other 2 databases were taken.
I then started the DBM database manually. DBM started, but DBL was terminated with the error: Instance Critical Process (pid: 13, ospid: 41528, DBW1) died unexpectedly -- PMON (ospid: 41498): terminating the instance due to error 471.
I stopped all the databases and restarted them in the following order:
DBS, DBM and DBL.
All the databases came up without any issue and with no high load on the server. I have also seen high load while purging audit tables or recompiling DB objects.
Database server memory info:
$ cat /proc/meminfo
MemTotal: 65687432 kB
MemFree: 294032 kB
MemAvailable: 1452792 kB
Buffers: 0 kB
Cached: 56441492 kB
SwapCached: 4108 kB
Active: 32931252 kB
Inactive: 26755296 kB
Active(anon): 32652116 kB
Inactive(anon): 26474708 kB
Active(file): 279136 kB
Inactive(file): 280588 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 17407996 kB
SwapFree: 17253368 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 3241656 kB
Mapped: 12908360 kB
Shmem: 55881768 kB
Slab: 1853944 kB
SReclaimable: 1096000 kB
SUnreclaim: 757944 kB
KernelStack: 24448 kB
PageTables: 3058688 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 50251712 kB
Committed_AS: 101421236 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 468576 kB
VmallocChunk: 34325379068 kB
HardwareCorrupted: 0 kB
AnonHugePages: 49152 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 195304 kB
DirectMap2M: 6000640 kB
DirectMap1G: 62914560 kB
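To make a few of these figures easier to compare, here is a rough sketch that only does arithmetic on values copied from the output above. The variable names are made up for illustration, and the only interpretation assumed is that 1 MB = 1024 kB. It shows how much memory sits in Shmem, how large the page tables are with HugePages_Total at 0, and how far Committed_AS is above CommitLimit.

```shell
#!/bin/sh
# Values copied from the /proc/meminfo output above (all in kB,
# except hugepages_total, which is a page count).
mem_total_kb=65687432
shmem_kb=55881768
page_tables_kb=3058688
commit_limit_kb=50251712
committed_as_kb=101421236
hugepages_total=0

# Integer conversions to MB (1 MB = 1024 kB).
shmem_mb=$((shmem_kb / 1024))
page_tables_mb=$((page_tables_kb / 1024))

# Shmem as a whole percentage of MemTotal (divide first to keep numbers small).
shmem_pct=$((shmem_kb / (mem_total_kb / 100)))

# Amount by which committed memory exceeds the kernel's commit limit.
overcommit_mb=$(((committed_as_kb - commit_limit_kb) / 1024))

echo "Shmem:        ${shmem_mb} MB (${shmem_pct}% of MemTotal)"
echo "PageTables:   ${page_tables_mb} MB"
echo "Overcommit:   ${overcommit_mb} MB above CommitLimit"
if [ "$hugepages_total" -eq 0 ]; then
    echo "HugePages:    none configured, shared memory uses regular 4 kB pages"
fi
```

This is only a reading of one snapshot, so the numbers will differ depending on which instances were up when the output was taken.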
We have the settings below for all the databases:
sga_target=30228m
pga_aggregate_target=10076m
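As a rough sanity check, the sketch below adds up these settings across the 3 instances and compares the sum with MemTotal from /proc/meminfo. It assumes all three instances really do run with exactly these values. Both parameters are targets rather than fixed allocations, so this is an upper-bound style estimate, not a measurement. The variable names are illustrative only.

```shell
#!/bin/sh
# Settings quoted above, in MB, assumed identical for DBS, DBM and DBL.
sga_target_mb=30228
pga_aggregate_target_mb=10076
instances=3

# MemTotal from /proc/meminfo, converted from kB to MB (integer division).
mem_total_mb=$((65687432 / 1024))

# Sum of the configured targets across all instances.
total_sga_mb=$((sga_target_mb * instances))
total_pga_mb=$((pga_aggregate_target_mb * instances))
total_mb=$((total_sga_mb + total_pga_mb))

# Positive value means the configured targets exceed physical RAM.
shortfall_mb=$((total_mb - mem_total_mb))

echo "Physical RAM:        ${mem_total_mb} MB"
echo "Sum of SGA targets:  ${total_sga_mb} MB"
echo "Sum of PGA targets:  ${total_pga_mb} MB"
echo "SGA + PGA targets:   ${total_mb} MB"
echo "Targets minus RAM:   ${shortfall_mb} MB"
```

If these settings are accurate for every instance, the SGA targets alone add up to more than the physical memory on the server, before any PGA, OS or file cache usage is counted.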
I am not sure what needs to be done to resolve this issue. We did not have this issue between the DB upgrade and the time the patch was applied.
Thanks in advance!