
Oracle VM Server 3.3: high I/O load on sda, all repositories on SAN

Zhuchenko Valery, May 8 2019 (edited Jun 3 2019)

Hi OVM-Gurus,

I have two Oracle VM servers in a clustered pool: HP DL360 Gen9, 64 GB RAM (dom0 memory set to 1856 MB per https://docs.oracle.com/cd/E50245_01/E50251/html/vmadm-config-server-memory.html), two Xeon E5-2620 v3 CPUs each.
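
For reference, the dom0 memory setting can be double-checked on the server itself; a quick sketch (the grub path is the standard one for OVS 3.3, adjust if yours differs):

# grep dom0_mem /boot/grub/grub.conf
# xm info | grep -E "total_memory|free_memory"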

All OVM repositories are on SAN storage (iSCSI); there are no local repositories. I perform all operations only via the OVMM web UI or CLI.

On the OVMM Health tab I see multiple errors, but only for one of the two servers: "server.disconnected. Server Connection Failure" at May 08, 2019 1:25:58 am, 1:27:24 am, 1:29:26 am, and 1:30:43 am.

On the Events tab there is a matching entry: May 08, 2019 1:31:24 am, Success, AutoDiscoverTask: Discover server ... refreshType: ALL @OvmAdmin, duration 2m:4s, completed May 08, 2019 1:33:29 am.

On this server I see high I/O load on sda, where OVS is installed and where swap lives. dm-6 is loaded too; it is one of the OVM repositories (confirmed via ls -l /dev/disk/by-id | grep dm-6 and mount | grep ...).
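
(For anyone reproducing this, the mapping can be confirmed along these lines; <wwid> is a placeholder for the multipath WWID of that LUN, which I trimmed above:)

# ls -l /dev/mapper | grep dm-6
# multipath -ll | grep dm-6
# mount | grep <wwid>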

What can cause this high load on sda, and what should I check or tune on the OVS side?
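
My current idea is to catch the offender with per-process I/O accounting next time it happens. pidstat ships with sysstat (which is clearly installed, since sar works); iotop may need to be installed first. A sketch: pidstat samples per-process reads/writes every 5 s, 12 times; iotop runs in batch mode, showing only processes actually doing I/O, with accumulated totals.

# pidstat -d 5 12
# iotop -b -o -a -n 3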

Help me, please.

# free

             total       used       free     shared    buffers     cached

Mem:       2036612    1494012     542600          0     142368     733088

-/+ buffers/cache:     618556    1418056

Swap:     32301052     166056   32134996
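
Since swap is also on sda, I want to rule out dom0 swapping during the spike. sar keeps paging history, and pswpin/s / pswpout/s should stay near zero on a healthy dom0:

# sar -W | grep -E "^01:2|^01:3"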

# sar

00:00:02        CPU     %user     %nice   %system   %iowait    %steal     %idle

...

01:10:01        all      1,11      0,00      2,04      1,30      0,26     95,28

01:20:01        all      1,11      0,00      2,01      1,21      0,25     95,42

01:30:41        all      0,58      0,00      2,56     18,83      0,40     77,63

01:40:01        all      0,95      0,00      2,32      8,02      0,29     88,42

01:50:01        all      1,12      0,00      2,15      1,49      0,30     94,93

...

# sar -d -p|grep -E "^01:2|^01:3|^00.*DEV"

00:00:02          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util

01:20:01          sda     72,46   1682,71   2078,31     51,90      0,44      6,03      1,49     10,79

01:20:01          sdb      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

01:20:01          sdc      0,06      0,46      0,00      8,00      0,00      0,40      0,40      0,00

01:20:01          sdd      0,06      0,45      0,00      8,00      0,00      0,69      0,69      0,00

....

01:20:01         dm-6    291,40   5439,77   3980,64     32,33      0,87      2,99      2,25     65,57

01:20:01         dm-7      1,15      1,15      0,58      1,50      0,01      4,34      4,35      0,50

01:20:01         dm-8      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

01:20:01         dm-9      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

01:20:01        dm-10      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

01:30:41          sda    991,97  12965,81   5835,16     18,95     11,59     11,68      0,89     88,19

01:30:41          sdb      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

01:30:41          sdc      0,06      0,48      0,00      8,00      0,00      0,58      0,58      0,00

01:30:41          sdd      0,06      0,49      0,00      8,00      0,00      0,47      0,47      0,00

...

01:30:41         dm-6    203,77   1384,78   4131,66     27,07      0,28      1,35      1,08     22,03

01:30:41         dm-7      1,23      1,23      0,61      1,50      0,00      3,96      3,96      0,49

01:30:41         dm-8      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

01:30:41         dm-9      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00

01:30:41        dm-10      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00
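
Two things stand out to me in this sample: at 01:30:41 sda is nearly saturated (%util 88, avgqu-sz 11.6) and read-heavy, while dm-6 actually calms down compared with 01:20:01, so the spike looks local to dom0 rather than guest traffic; and the 01:30 sar sample itself landed 40 seconds late, which also fits a stalled dom0. If it recurs, the block_dump switch (present in EL6-era kernels) can log to dmesg which process issues each request; it is noisy, so a sketch, to be enabled only briefly:

# echo 1 > /proc/sys/vm/block_dump
# sleep 30; dmesg | tail -50
# echo 0 > /proc/sys/vm/block_dump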

ovs-agent works normally (I did not restart it), and statistics are displayed normally on the Health tab.

# /etc/init.d/ovs-agent status

log server (pid 4541) is running...

notificationserver server (pid 4913) is running...

remaster server (pid 4922) is running...

monitor server (pid 4924) is running...

ha server (pid 4926) is running...

stats server (pid 4928) is running...

xmlrpc server (pid 4929) is running...

From /var/log/ovs-agent.log:

[2019-05-07 21:30:48 1890] DEBUG (base:237) async call complete: func: storage_plugin_destroy pid: 1890 status: 0 output:

[2019-05-07 21:30:48 4913] DEBUG (notificationserver:192) sent events: [('{ASYNC_PROC} exit PID 1890', {'status': 0, 'output': ''})]

[2019-05-08 01:27:22 4926] ERROR (ha:51) Failed to get VM list on ...: The read operation timed out

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb00000600005b49de75097834c5 already started

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb0000060000cc3cfb9eb4e19bf4 already started

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb0000060000082f8c4f51857654 already started

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000054babad6bf1bff7d already started

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000032feb2db42dd3d31 already started

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000047b1268f6ddcc875 already started

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb000006000046552fb11d14b41c already started

[2019-05-08 01:27:30 4926] DEBUG (ha:97) VM 0004fb0000060000e3387c2815e717ec already started

[2019-05-08 01:28:12 4928] DEBUG (linuxstats:45) Error getting VM stats: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.

[2019-05-08 01:30:36 4928] DEBUG (linuxstats:45) Error getting VM stats: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.

[2019-05-08 01:31:28 13620] DEBUG (service:77) call start: get_api_version

[2019-05-08 01:31:29 13620] DEBUG (service:77) call complete: get_api_version

[2019-05-08 01:31:29 13622] DEBUG (service:77) call start: discover_server

[2019-05-08 01:31:54 13622] DEBUG (service:77) call complete: discover_server

[2019-05-08 01:31:55 13644] DEBUG (service:77) call start: discover_hardware

[2019-05-08 01:31:58 13644] DEBUG (service:77) call complete: discover_hardware

[2019-05-08 01:32:34 13701] DEBUG (service:77) call start: discover_network

[2019-05-08 01:32:50 4926] ERROR (ha:51) Failed to get VM list on ...: The read operation timed out

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb00000600005b49de75097834c5 already started

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb0000060000cc3cfb9eb4e19bf4 already started

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb0000060000082f8c4f51857654 already started

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000054babad6bf1bff7d already started

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000032feb2db42dd3d31 already started

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000047b1268f6ddcc875 already started

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb000006000046552fb11d14b41c already started

[2019-05-08 01:32:53 4926] DEBUG (ha:97) VM 0004fb0000060000e3387c2815e717ec already started

[2019-05-08 01:33:01 13532] ERROR (service:97) catch_error: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.

Traceback (most recent call last):

  File "/usr/lib64/python2.6/site-packages/agent/lib/service.py", line 95, in wrapper

    return func(*args)

  File "/usr/lib64/python2.6/site-packages/agent/api/hypervisor/xenxm.py", line 268, in list_vms

    return get_vms()

  File "/usr/lib64/python2.6/site-packages/agent/lib/xenxm.py", line 121, in get_vms

    lock.acquire(wait=120)

  File "/usr/lib64/python2.6/site-packages/agent/lib/filelock.py", line 96, in acquire

    self.filename)

LockError: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occured.

[2019-05-08 01:33:11 13701] DEBUG (service:77) call complete: discover_network

[2019-05-08 01:33:11 13809] DEBUG (service:77) call start: discover_storage_plugins

[2019-05-08 01:33:11 13809] DEBUG (service:77) call complete: discover_storage_plugins

[2019-05-08 01:33:11 13812] DEBUG (service:75) call start: discover_physical_luns('',)

[2019-05-08 01:33:13 13812] DEBUG (service:77) call complete: discover_physical_luns

[2019-05-08 01:33:14 13890] DEBUG (service:77) call start: discover_repository_db

[2019-05-08 01:33:15 13890] DEBUG (service:77) call complete: discover_repository_db

...

[2019-05-08 01:33:28 13931] DEBUG (service:77) call complete: package_get_repositories

[2019-05-08 01:33:29 13932] DEBUG (service:77) call start: discover_cluster

[2019-05-08 01:33:29 13932] DEBUG (service:77) call complete: discover_cluster

[2019-05-08 07:30:38 15895] DEBUG (service:75) async call start
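
Reading the traceback, the lock timeouts look like a symptom rather than the cause: the ha and stats threads serialize around xm list via vm-list.lock (lock.acquire(wait=120) in xenxm.py), so when dom0 I/O stalls, xm list takes too long, the lock times out, and OVMM reports the connection failure. A quick check for next time the load spikes:

# time xm list > /dev/null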

/var/log/messages has no records for the problematic interval:

May  7 19:00:28 ...

May  8 07:17:33 ...
