I have a Server Pool with 3 OVSs. All are running VMs; most have their disks on a shared NFS repository, a few have disks on local OCFS2 repositories.
In the process of updating the OVSs to the latest version, I live-migrated all VMs off one OVS, spreading them across the two other OVSs (including the VMs with local storage, which performed live storage migration).
Then I updated the now-empty OVS from v3.4.2 to v3.4.5 and migrated back all VMs that were originally on this OVS.
I repeated this procedure for all 3 OVSs, so now all 3 are running the latest version, v3.4.5-1919.
But when I then wanted to migrate the VMs that originally ran on the last-updated OVS back to it, this worked correctly for the VMs with all disks on the shared repository; for the VMs with disks on local storage (so using live storage migration), however, the job is started, but the storage is never migrated and the job hangs indefinitely.
Even stranger: when I now try to live-storage-migrate a VM between any of these 3 OVSs (so not only to the last-updated one), it always results in a hanging migration job that never actually performs any migration.
On the source OVS I see this in the ovs-agent.log:
[2018-08-29 09:59:37 29757] DEBUG (service:75) async call start: migrate_vm_with_storage('0004fb00000300007ee2b8ee807b26fe', '0004fb000006000046af0294f924ae23', '143.169.232.28', [{'src_file_path': '/OVS/Repositories/0004fb00000300007ee2b8ee807b26fe/VirtualDisks/0004fb000012000073e3a68455dcb5ea.img', 'dst_file_path': '/OVS/Repositories/0004fb00000300000fc1308166ad909e/VirtualDisks/0004fb000012000073e3a68455dcb5ea.img'}], '/OVS/Repositories/0004fb00000300000fc1308166ad909e/VirtualMachines/0004fb000006000046af0294f924ae23/vm.cfg', True, False)
[2018-08-29 09:59:37 29758] DEBUG (storage_vm:39) Storage migration begin domain 17
[2018-08-29 09:59:37 29758] DEBUG (storage_vm:121) Migrating files domid 17.
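That "Migrating files domid 17." line is the last thing the source agent ever logs. If the migration spawns a separate copy helper (an assumption on my part, I don't know the agent internals), it should show up as a child of that worker. A minimal sketch in Python (the same Python 2 the agent runs in; PID 29758 is the worker PID from the log above):

import subprocess

# List children of the agent worker that logged "Migrating files domid 17."
# (PID 29758 above). If a storage copy helper had been spawned, it should
# appear here.
proc = subprocess.Popen(['ps', '--ppid', '29758', '-f'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print(out or err)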
On the destination OVS I see this:
[2018-08-29 09:59:40 11457] DEBUG (service:75) call start: storage_migration_cfgfile_setup('0004fb00000300000fc1308166ad909e', '0004fb000006000046af0294f924ae23')
[2018-08-29 09:59:40 11457] DEBUG (service:77) call complete: storage_migration_cfgfile_setup
[2018-08-29 09:59:40 11458] DEBUG (service:75) call start: create_vm('0004fb00000300000fc1308166ad909e', '0004fb000006000046af0294f924ae23', {'vif': ['mac=00:21:f6:01:eb:6e,bridge=101542c0f8', 'mac=00:21:f6:5e:22:39,bridge=1085a823dc'], 'OVM_simple_name': '*****', 'vnclisten': '127.0.0.1', 'serial': 'pty', 'disk': ['file:/OVS/Repositories/0004fb00000300000fc1308166ad909e/VirtualDisks/0004fb000012000073e3a68455dcb5ea.img,xvda,w'], 'vncunused': '1', 'uuid': '0004fb00-0006-0000-46af-0294f924ae23', 'on_reboot': 'restart', 'boot': 'dc', 'cpu_weight': 33000, 'memory': 16384, 'cpu_cap': 0, 'maxvcpus': 16, 'OVM_high_availability': False, 'vnc': '1', 'OVM_description': '***', 'on_poweroff': 'destroy', 'on_crash': 'restart', 'guest_os_type': 'linux', 'name': '0004fb000006000046af0294f924ae23', 'builder': 'hvm', 'vcpus': 8, 'keymap': 'nl-be', 'OVM_os_type': 'Other Linux', 'OVM_cpu_compat_group': '', 'OVM_domain_type': 'xen_hvm_pv'})
[2018-08-29 09:59:40 11458] DEBUG (service:77) call complete: create_vm
[2018-08-29 09:59:40 11459] DEBUG (service:75) call start: storage_migration_setup(['/OVS/Repositories/0004fb00000300000fc1308166ad909e/VirtualDisks/0004fb000012000073e3a68455dcb5ea.img'],)
[2018-08-29 09:59:40 11459] DEBUG (service:77) call complete: storage_migration_setup
And nothing else. The actual storage migration never starts, and no timeout or error ever occurs. It can stay like this for days.
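One way to confirm that no data at all is arriving at the destination is to watch the destination disk image. A minimal sketch (the path is taken from the storage_migration_setup call above; I'm assuming the file may be created sparse, so I watch allocated blocks rather than apparent size):

import os
import time

# Destination disk image, from the storage_migration_setup call above.
DST = ('/OVS/Repositories/0004fb00000300000fc1308166ad909e'
       '/VirtualDisks/0004fb000012000073e3a68455dcb5ea.img')

last = None
while True:
    try:
        # st_blocks counts actually allocated 512-byte blocks, so this also
        # catches writes into a sparse file whose apparent size is fixed.
        cur = os.stat(DST).st_blocks
    except OSError:
        cur = 'missing'  # file not (yet) created
    if cur != last:
        print('%s allocated blocks: %s' % (time.strftime('%H:%M:%S'), cur))
        last = cur
    time.sleep(5)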
When I abort the job, this is added to the source ovs-agent.log:
[2018-08-29 10:28:13 5893] DEBUG (service:75) call start: list_vm('0004fb00000300007ee2b8ee807b26fe', '0004fb000006000046af0294f924ae23')
[2018-08-29 10:28:13 5893] DEBUG (service:77) call complete: list_vm
[2018-08-29 10:28:13 5896] DEBUG (service:75) call start: discover_repositories(' 0004fb00000300007ee2b8ee807b26fe ',)
[2018-08-29 10:28:14 5896] DEBUG (service:77) call complete: discover_repositories
[2018-08-29 10:28:14 5898] DEBUG (service:75) call start: get_repository_meta('0004fb00000300007ee2b8ee807b26fe',)
[2018-08-29 10:28:14 5898] DEBUG (service:77) call complete: get_repository_meta
[2018-08-29 10:28:14 5899] DEBUG (service:75) call start: get_vm_config('0004fb00000300007ee2b8ee807b26fe', '0004fb0000060000b92983900f9c3ab1')
[2018-08-29 10:28:14 5899] DEBUG (service:77) call complete: get_vm_config
[2018-08-29 10:28:14 5900] DEBUG (service:75) call start: get_vm_config('0004fb00000300007ee2b8ee807b26fe', '0004fb0000060000a2b8398eabdd0f01')
[2018-08-29 10:28:14 5900] DEBUG (service:77) call complete: get_vm_config
[2018-08-29 10:28:14 5901] DEBUG (service:75) call start: get_vm_config('0004fb00000300007ee2b8ee807b26fe', '0004fb000006000077d839ee568721ab')
[2018-08-29 10:28:14 5901] DEBUG (service:77) call complete: get_vm_config
[2018-08-29 10:28:14 5902] DEBUG (service:75) call start: get_vm_config('0004fb00000300007ee2b8ee807b26fe', '0004fb000006000046af0294f924ae23')
[2018-08-29 10:28:14 5902] DEBUG (service:77) call complete: get_vm_config
[2018-08-29 10:28:14 5903] DEBUG (service:75) call start: storage_plugin_list('oracle.ocfs2.OCFS2.OCFS2Plugin', {'status': '', 'admin_user': '', 'admin_host': '', 'uuid': '0004fb00000900008f68a4d47a7c2fdb', 'total_sz': 0, 'admin_passwd': '******', 'free_sz': 0, 'name': '0004fb00000900008f68a4d47a7c2fdb', 'access_host': '', 'storage_type': 'FileSys', 'alloc_sz': 0, 'access_grps': [], 'used_sz': 0, 'storage_desc': ''}, {'status': '', 'uuid': '0004fb00000500006943bb930a93191d', 'backing_device': '/dev/mapper/361866da082cd19001fd410b10fac42b7', 'ss_uuid': '0004fb00000900008f68a4d47a7c2fdb', 'free_sz': '1189785632768', 'name': 'fs on 361866da082cd19001fd410b10fac42b7', 'state': 2, 'access_grp_names': [], 'access_path': '/dev/mapper/361866da082cd19001fd410b10fac42b7', 'size': '1197759004672'}, {'fr_type': 'Directory', 'ondisk_sz': 0, 'fs_uuid': '0004fb00000500006943bb930a93191d', 'file_sz': 0, 'file_path': '/OVS/Repositories/0004fb00000300007ee2b8ee807b26fe'}, True)
[2018-08-29 10:28:14 5903] INFO (storageplugin:109) storage_plugin_list(oracle.ocfs2.OCFS2.OCFS2Plugin)
[2018-08-29 10:28:14 5903] DEBUG (service:77) call complete: storage_plugin_list
[2018-08-29 10:28:14 5907] DEBUG (service:75) call start: list_vm('0004fb00000300007ee2b8ee807b26fe', '0004fb000006000046af0294f924ae23')
[2018-08-29 10:28:15 5907] DEBUG (service:77) call complete: list_vm
So nothing special, and no errors.
And in the destination ovs-agent.log:
[2018-08-29 10:28:16 20378] DEBUG (service:75) call start: list_vm('0004fb00000300000fc1308166ad909e', '0004fb000006000046af0294f924ae23')
[2018-08-29 10:28:16 20378] ERROR (service:97) catch_error: Command: ['xm', 'list', '--long', '0004fb000006000046af0294f924ae23'] failed (3): stderr: Error: Domain '0004fb000006000046af0294f924ae23' does not exist.
stdout:
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/agent/lib/service.py", line 95, in wrapper
return func(*args)
File "/usr/lib64/python2.6/site-packages/agent/api/hypervisor/xenxm.py", line 293, in list_vm
return get_vm(vm_name)
File "/usr/lib64/python2.6/site-packages/agent/lib/xenxm.py", line 109, in get_vm
info = run_cmd(['xm', 'list', '--long', domain])
File "/usr/lib64/python2.6/site-packages/agent/lib/linux.py", line 77, in run_cmd
% (cmd, proc.returncode, stderrdata, stdoutdata))
RuntimeError: Command: ['xm', 'list', '--long', '0004fb000006000046af0294f924ae23'] failed (3): stderr: Error: Domain '0004fb000006000046af0294f924ae23' does not exist.
stdout:
[2018-08-29 10:28:17 20393] DEBUG (service:75) call start: storage_migration_cleanup(['/OVS/Repositories/0004fb00000300000fc1308166ad909e/VirtualDisks/0004fb000012000073e3a68455dcb5ea.img'], True)
[2018-08-29 10:28:17 20393] DEBUG (service:77) call complete: storage_migration_cleanup
[2018-08-29 10:28:17 20396] DEBUG (service:75) call start: storage_migration_delete_vm('0004fb00000300000fc1308166ad909e', '0004fb000006000046af0294f924ae23')
[2018-08-29 10:28:17 20396] DEBUG (service:77) call complete: storage_migration_delete_vm
[2018-08-29 10:28:17 20404] DEBUG (service:75) call start: storage_migration_cfgfile_cleanup('0004fb00000300000fc1308166ad909e', '0004fb000006000046af0294f924ae23')
[2018-08-29 10:28:17 20404] DEBUG (service:77) call complete: storage_migration_cfgfile_cleanup
So here I see that OVM seems to expect a migration VM that should have been created by the migration process, but that creation never happened.
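To double-check that interpretation, one can ask Xen directly on the destination whether any domain was instantiated for the VM, the same way the agent does in the traceback above. A sketch (run as root on the destination OVS; the VM UUID is from the log):

import subprocess

VM_NAME = '0004fb000006000046af0294f924ae23'  # VM UUID from the log above

# Same check the agent's list_vm performs (see agent/lib/xenxm.py in the
# traceback): a non-zero exit code means no such domain exists.
proc = subprocess.Popen(['xm', 'list', '--long', VM_NAME],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
if proc.returncode == 0:
    print(out)
else:
    print('no such domain (rc=%d): %s' % (proc.returncode, err))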
I have already tried restarting the ovs-agent on all OVSs, and I also restarted the OVM Manager, but everything stays the same.
I can't find anything on Oracle Support or Google.
Is there someone here who could shed some light on this problem?