Hi OVM-Gurus,
I have two Oracle VM servers in a clustered pool (HP DL360 Gen9, 64 GB RAM, two Xeon E5-2620 v3; Dom0 memory is set to 1856 MB per https://docs.oracle.com/cd/E50245_01/E50251/html/vmadm-config-server-memory.html).
All OVM repositories are on SAN storage (iSCSI); there are no local repositories. I perform all operations only through the OVMM web UI or CLI.
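For reference, I sized Dom0 memory with the formula from that page, which as I understand it is dom0_mem = 502 + int(physical_mem_MB * 0.0205). A quick check for 64 GB (assuming 65536 MB of physical RAM):

# PHYS_MB=65536
# echo $((502 + PHYS_MB * 205 / 10000))
1845

so the 1856 MB I configured is right around the recommended value.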
On the OVMM Health tab I see multiple errors, for only one of the two servers: "server.disconnected. Server Connection Failure" at May 08, 2019 1:25:58 am, 1:27:24 am, 1:29:26 am and 1:30:43 am.
On the Events tab there is a matching entry: May 08, 2019 1:31:24 am, Success, AutoDiscoverTask: Discover server ... refreshType: ALL, @OvmAdmin, duration 2m:4s, finished May 08, 2019 1:33:29 am.
I see high IO load on this server on sda, where OVS is installed and where the swap is. The same for dm-6, which is one of the OVM repositories (identified via ls -l /dev/disk/by-id|grep dm-6 and mount|grep ...).
What can cause this high load on sda? What should I check or do on the OVS itself?
Help me, please.
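In case it helps, this is what I am thinking of running on the OVS the next time the load spikes, to catch which process is actually hitting sda (assuming pidstat from sysstat and the iotop package are available in dom0, which I have not checked yet):

# vmstat 5 12                  # watch si/so, since swap is also on sda
# pidstat -d 5 12              # per-process read/write KB/s
# iotop -b -o -t -d 5 -n 12    # batch mode, only processes doing IO

Below is the data I already have from the problematic night.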
# free
             total       used       free     shared    buffers     cached
Mem:       2036612    1494012     542600          0     142368     733088
-/+ buffers/cache:     618556    1418056
Swap:     32301052     166056   32134996
# sar
00:00:02        CPU     %user     %nice   %system   %iowait    %steal     %idle
...
01:10:01        all      1,11      0,00      2,04      1,30      0,26     95,28
01:20:01        all      1,11      0,00      2,01      1,21      0,25     95,42
01:30:41        all      0,58      0,00      2,56     18,83      0,40     77,63
01:40:01        all      0,95      0,00      2,32      8,02      0,29     88,42
01:50:01        all      1,12      0,00      2,15      1,49      0,30     94,93
...
# sar -d -p|grep -E "^01:2|^01:3|^00.*DEV"
00:00:02      DEV        tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
01:20:01      sda      72,46    1682,71    2078,31     51,90      0,44      6,03      1,49     10,79
01:20:01      sdb       0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
01:20:01      sdc       0,06       0,46       0,00      8,00      0,00      0,40      0,40      0,00
01:20:01      sdd       0,06       0,45       0,00      8,00      0,00      0,69      0,69      0,00
....
01:20:01      dm-6     291,40    5439,77    3980,64     32,33      0,87      2,99      2,25     65,57
01:20:01      dm-7       1,15       1,15       0,58      1,50      0,01      4,34      4,35      0,50
01:20:01      dm-8       0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
01:20:01      dm-9       0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
01:20:01      dm-10      0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
01:30:41      sda      991,97   12965,81    5835,16     18,95     11,59     11,68      0,89     88,19
01:30:41      sdb       0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
01:30:41      sdc       0,06       0,48       0,00      8,00      0,00      0,58      0,58      0,00
01:30:41      sdd       0,06       0,49       0,00      8,00      0,00      0,47      0,47      0,00
...
01:30:41      dm-6     203,77    1384,78    4131,66     27,07      0,28      1,35      1,08     22,03
01:30:41      dm-7       1,23       1,23       0,61      1,50      0,00      3,96      3,96      0,49
01:30:41      dm-8       0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
01:30:41      dm-9       0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
01:30:41      dm-10      0,00       0,00       0,00      0,00      0,00      0,00      0,00      0,00
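For scale, converting the sda peak at 01:30:41 from sectors per second to MB/s (assuming sar's rd_sec/s and wr_sec/s are 512-byte sectors, which is my understanding):

# echo "12965.81*512/1024/1024" | bc -l    # ~6.3 MB/s read
# echo "5835.16*512/1024/1024" | bc -l     # ~2.8 MB/s written

so the raw throughput on sda is only a few MB/s; it is the queue (avgqu-sz ~11.6) and await (~11.7 ms) that look bad.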
ovs-agent is working normally (I did not restart it), and statistics are displayed normally on the Health tab.
# /etc/init.d/ovs-agent status
log server (pid 4541) is running...
notificationserver server (pid 4913) is running...
remaster server (pid 4922) is running...
monitor server (pid 4924) is running...
ha server (pid 4926) is running...
stats server (pid 4928) is running...
xmlrpc server (pid 4929) is running...
From /var/log/ovs-agent.log:
[2019-05-07 21:30:48 1890] DEBUG (base:237) async call complete: func: storage_plugin_destroy pid: 1890 status: 0 output:
[2019-05-07 21:30:48 4913] DEBUG (notificationserver:192) sent events: [('{ASYNC_PROC} exit PID 1890', {'status': 0, 'output': ''})]
[2019-05-08 01:27:22 4926] ERROR (ha:51) Failed to get VM list on ...: The read operation timed out
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb00000600005b49de75097834c5 already started
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb0000060000cc3cfb9eb4e19bf4 already started
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb0000060000082f8c4f51857654 already started
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000054babad6bf1bff7d already started
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000032feb2db42dd3d31 already started
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000047b1268f6ddcc875 already started
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000046552fb11d14b41c already started
[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb0000060000e3387c2815e717ec already started
[2019-05-08 01:28:12 4928] DEBUG (linuxstats:45) Error getting VM stats: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.
[2019-05-08 01:30:36 4928] DEBUG (linuxstats:45) Error getting VM stats: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.
[2019-05-08 01:31:28 13620] DEBUG (service:77) call start: get_api_version
[2019-05-08 01:31:29 13620] DEBUG (service:77) call complete: get_api_version
[2019-05-08 01:31:29 13622] DEBUG (service:77) call start: discover_server
[2019-05-08 01:31:54 13622] DEBUG (service:77) call complete: discover_server
[2019-05-08 01:31:55 13644] DEBUG (service:77) call start: discover_hardware
[2019-05-08 01:31:58 13644] DEBUG (service:77) call complete: discover_hardware
[2019-05-08 01:32:34 13701] DEBUG (service:77) call start: discover_network
[2019-05-08 01:32:50 4926] ERROR (ha:51) Failed to get VM list on ...: The read operation timed out
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb00000600005b49de75097834c5 already started
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb0000060000cc3cfb9eb4e19bf4 already started
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb0000060000082f8c4f51857654 already started
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000054babad6bf1bff7d already started
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000032feb2db42dd3d31 already started
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000047b1268f6ddcc875 already started
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000046552fb11d14b41c already started
[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb0000060000e3387c2815e717ec already started
[2019-05-08 01:33:01 13532] ERROR (service:97) catch_error: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/agent/lib/service.py", line 95, in wrapper
return func(*args)
File "/usr/lib64/python2.6/site-packages/agent/api/hypervisor/xenxm.py", line 268, in list_vms
return get_vms()
File "/usr/lib64/python2.6/site-packages/agent/lib/xenxm.py", line 121, in get_vms
lock.acquire(wait=120)
File "/usr/lib64/python2.6/site-packages/agent/lib/filelock.py", line 96, in acquire
self.filename)
LockError: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.
[2019-05-08 01:33:11 13701] DEBUG (service:77) call complete: discover_network
[2019-05-08 01:33:11 13809] DEBUG (service:77) call start: discover_storage_plugins
[2019-05-08 01:33:11 13809] DEBUG (service:77) call complete: discover_storage_plugins
[2019-05-08 01:33:11 13812] DEBUG (service:75) call start: discover_physical_luns('',)
[2019-05-08 01:33:13 13812] DEBUG (service:77) call complete: discover_physical_luns
[2019-05-08 01:33:14 13890] DEBUG (service:77) call start: discover_repository_db
[2019-05-08 01:33:15 13890] DEBUG (service:77) call complete: discover_repository_db
...
[2019-05-08 01:33:28 13931] DEBUG (service:77) call complete: package_get_repositories
[2019-05-08 01:33:29 13932] DEBUG (service:77) call start: discover_cluster
[2019-05-08 01:33:29 13932] DEBUG (service:77) call complete: discover_cluster
[2019-05-08 07:30:38 15895] DEBUG (service:75) async call start
/var/log/messages has no records for the problematic interval; the nearest entries are:
May 7 19:00:28 ...
May 8 07:17:33 ...