[libvirt] [Qemu-devel] [RFC 0/2] Attempt to implement the standby feature for assigned network devices
Michael Roth
2018-12-05 16:18:29 UTC
Quoting Sameeh Jubran (2018-10-25 13:01:10)
Hi all,
There have been a few attempts to implement the standby feature for vfio-assigned
devices, which aims to enable the migration of such devices. This
is another attempt.
The series implements an infrastructure for hiding devices from the bus:
* In the first patch the infrastructure for hiding the device is added
to the qbus and qdev APIs. A "hidden" boolean is added to the device
state, and it is set via a callback into the standby device, which
registers itself to answer the question "should the primary device be
hidden?" by cross-validating the ids of the two devices (a rough sketch
of this shape follows the list below).
* In the second patch virtio-net uses this API to hide the vfio device,
unhiding it once the guest acks the VIRTIO_NET_F_STANDBY feature.
* I have only scratch-tested this so far; from the qemu side it seems
to be working.
* This is an RFC, so it lacks proper error handling in a few cases
as well as proper resource freeing. I wanted to get some feedback
first before finalizing it.
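
As a rough, self-contained model of the mechanism (plain C written for
illustration; the field and function names here are invented, not taken
from the patches):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct Device {
    const char *id;
    bool hidden;              /* mirrors the proposed qdev "hidden" flag */
    bool feature_acked;       /* guest acked VIRTIO_NET_F_STANDBY */
    const char *primary_id;   /* id of the primary this standby pairs with */
} Device;

/* the standby's answer to "should the primary device be hidden?",
 * decided by cross-validating the configured ids */
static bool should_be_hidden(const Device *standby, const char *primary_id)
{
    return !standby->feature_acked &&
           strcmp(standby->primary_id, primary_id) == 0;
}

int main(void)
{
    Device standby = { .id = "cc1_72", .primary_id = "cc1_71" };
    Device primary = { .id = "cc1_71" };

    primary.hidden = should_be_hidden(&standby, primary.id);
    printf("hidden: %d\n", primary.hidden);   /* 1: kept off the bus */

    standby.feature_acked = true;             /* guest acks the feature */
    primary.hidden = should_be_hidden(&standby, primary.id);
    printf("hidden: %d\n", primary.hidden);   /* 0: unhidden, can plug in */
    return 0;
}

The qemu command line used for testing: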
/home/sameeh/Builds/failover/qemu/x86_64-softmmu/qemu-system-x86_64 \
-netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc1_71 \
-netdev tap,vhost=on,id=hostnet1,script=world_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
-device virtio-net,host_mtu=1500,netdev=hostnet1,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
-device e1000,netdev=hostnet0,id=cc1_71,standby=cc1_72
Pre-migration, or during the setup phase of the migration, we should send
an unplug request to the guest to unplug the primary device. I haven't had
the chance to implement that part yet but should do so soon. Do you know
what the best approach would be? I wanted to have a callback in the
virtio-net device which tries to send an unplug request to the guest; if
it succeeds, the migration continues. It also needs to handle the case
where the migration fails, in which case it has to replug the primary
device. I think the "add_migration_state_change_notifier" API call can be
used from within the virtio-net device to achieve this; what do you think?
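
For illustration, a minimal sketch of how that proposed notifier could
look inside virtio-net (the migration-state helpers are existing QEMU
APIs; the migration_state field and the two unplug/replug helpers are
assumptions for this sketch, not existing code):

#include "qemu/osdep.h"
#include "migration/misc.h"   /* add_migration_state_change_notifier() */

/* assumed helpers, to be implemented as part of this work */
static void virtio_net_request_primary_unplug(VirtIONet *n);
static void virtio_net_replug_primary(VirtIONet *n);

static void virtio_net_migration_state_changed(Notifier *notifier,
                                               void *data)
{
    MigrationState *s = data;
    /* assumes a Notifier migration_state field in VirtIONet */
    VirtIONet *n = container_of(notifier, VirtIONet, migration_state);

    if (migration_in_setup(s)) {
        /* ask the guest to release the primary before migration proceeds */
        virtio_net_request_primary_unplug(n);
    } else if (migration_has_failed(s)) {
        /* rollback: bring the primary back on the source */
        virtio_net_replug_primary(n);
    }
}

/* registered once, e.g. in virtio_net_device_realize(): */
n->migration_state.notify = virtio_net_migration_state_changed;
add_migration_state_change_notifier(&n->migration_state);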
I think it would be good to hear from the libvirt folks (on Cc:) on this as
having QEMU unplug a device without libvirt's involvement seems like it
could cause issues. Personally I think it seems cleaner to just have QEMU
handle the 'hidden' aspects of the device and leave it to QMP/libvirt to do
the unplug beforehand. On the libvirt side I could imagine adding an option
like virsh migrate --switch-to-standby-networking or something along
those lines to do it automatically (if we decide doing it automatically is
even needed on that end).
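
As a sketch of what that explicit flow could look like from libvirt's
side (the domain and XML file names are placeholders, and
--switch-to-standby-networking is only the suggestion above, not an
existing flag; detach-device/attach-device/migrate do exist today):

# on the source: unplug the VFIO primary, then migrate
virsh detach-device guest1 primary-hostdev.xml --live
virsh migrate --live guest1 qemu+ssh://dst/system

# on the destination: replug an equivalent VFIO device
virsh -c qemu+ssh://dst/system attach-device guest1 primary-hostdev.xml --live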
Terminology:
standby - the virtio-net device
primary - the vfio device (the physical/assigned device)
Please share your thoughts and suggestions,
Thanks!
qdev/qbus: Add hidden device support
virtio-net: Implement VIRTIO_NET_F_STANDBY feature
hw/core/qdev.c | 48 +++++++++++++++++++++++++---
hw/net/virtio-net.c | 25 +++++++++++++++
hw/pci/pci.c | 1 +
include/hw/pci/pci.h | 2 ++
include/hw/qdev-core.h | 11 ++++++-
include/hw/virtio/virtio-net.h | 5 +++
qdev-monitor.c | 58 ++++++++++++++++++++++++++++++++--
7 files changed, 142 insertions(+), 8 deletions(-)
--
2.17.0
--
Respectfully,
Sameeh Jubran
Peter Krempa
2018-12-05 17:09:16 UTC
Post by Michael Roth
Quoting Sameeh Jubran (2018-10-25 13:01:10)
Pre-migration, or during the setup phase of the migration, we should send
an unplug request to the guest to unplug the primary device. I haven't had
the chance to implement that part yet but should do so soon. Do you know
what the best approach would be? I wanted to have a callback in the
virtio-net device which tries to send an unplug request to the guest; if
it succeeds, the migration continues. It also needs to handle the case
where the migration fails, in which case it has to replug the primary
device. I think the "add_migration_state_change_notifier" API call can be
used from within the virtio-net device to achieve this; what do you think?
I think it would be good to hear from the libvirt folks (on Cc:) on this as
having QEMU unplug a device without libvirt's involvement seems like it
could cause issues. Personally I think it seems cleaner to just have QEMU
handle the 'hidden' aspects of the device and leave it to QMP/libvirt to do
the unplug beforehand. On the libvirt side I could imagine adding an option
like virsh migrate --switch-to-standby-networking or something along
those lines to do it automatically (if we decide doing it automatically is
even needed on that end).
I remember talking about this approach some time ago.

In general the migration itself is a very complex process which has too
many places where it can fail. The same applies to device hotunplug.
This series proposes to merge those two together into an even more
complex behemoth.

A few scenarios with no clear solution come to mind:
- Since the unplug request time is effectively unbounded (the guest OS may
arbitrarily reject the request or execute it at any later time), migration
may get stuck in a halfway state without any clear rollback or failure
scenario.

- After migration, device hotplug may fail for whatever reason, leaving
networking crippled, and again there is no clear single-case rollback
scenario.

Then there's stuff which requires libvirt/management cooperation:
- picking the network device on the destination
- making sure that the device is present, etc.

From management's point of view, bundling all this together is really not
a good idea since it creates a very big matrix of failure scenarios. In
general even libvirt will prefer that upper layer management drives this
externally, since any rollback scenario will result in a policy decision
of what to do in certain cases, and what timeouts to pick.
Michael S. Tsirkin
2018-12-05 17:22:18 UTC
Post by Peter Krempa
From management's point of view, bundling all this together is really not
a good idea since it creates a very big matrix of failure scenarios.
I think this is clear. This is why we are doing it in QEMU where we can
actually do all the rollbacks transparently.
Post by Peter Krempa
In
general even libvirt will prefer that upper layer management drives this
externally, since any rollback scenario will result in a policy decision
of what to do in certain cases, and what timeouts to pick.
Architectural ugliness of implementing what is, from the user's
perspective, a mechanism and not a policy aside, experience teaches that
this isn't going to happen. People have been talking about the idea of
doing this at the upper layers for years.
--
MST
Daniel P. Berrangé
2018-12-05 17:26:34 UTC
Post by Michael S. Tsirkin
Post by Peter Krempa
From management's point of view, bundling all this together is really not
a good idea since it creates a very big matrix of failure scenarios.
I think this is clear. This is why we are doing it in QEMU where we can
actually do all the rollbacks transparently.
Post by Peter Krempa
In
general even libvirt will prefer that upper layer management drives this
externally, since any rollback scenario will result in a policy decision
of what to do in certain cases, and what timeouts to pick.
Architectural ugliness of implementing what is, from the user's
perspective, a mechanism and not a policy aside, experience teaches that
this isn't going to happen. People have been talking about the idea of
doing this at the upper layers for years.
The ability to unplug+replug VFIO devices on either side of migration
has existed in OpenStack for a long time. They also have metadata
that can be exposed to the guest to describe which pairs
of (emulated, vfio) devices should be used together.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé
2018-12-05 17:43:44 UTC
Post by Peter Krempa
Post by Michael Roth
Quoting Sameeh Jubran (2018-10-25 13:01:10)
Pre-migration, or during the setup phase of the migration, we should send
an unplug request to the guest to unplug the primary device. I haven't had
the chance to implement that part yet but should do so soon. Do you know
what the best approach would be? I wanted to have a callback in the
virtio-net device which tries to send an unplug request to the guest; if
it succeeds, the migration continues. It also needs to handle the case
where the migration fails, in which case it has to replug the primary
device. I think the "add_migration_state_change_notifier" API call can be
used from within the virtio-net device to achieve this; what do you think?
I think it would be good to hear from the libvirt folks (on Cc:) on this as
having QEMU unplug a device without libvirt's involvement seems like it
could cause issues. Personally I think it seems cleaner to just have QEMU
handle the 'hidden' aspects of the device and leave it to QMP/libvirt to do
the unplug beforehand. On the libvirt side I could imagine adding an option
like virsh migrate --switch-to-standby-networking or something along
those lines to do it automatically (if we decide doing it automatically is
even needed on that end).
I remember talking about this approach some time ago.
In general the migration itself is a very complex process which has too
many places where it can fail. The same applies to device hotunplug.
This series proposes to merge those two together into an even more
complex behemoth.
- Since the unplug request time is effectively unbounded (the guest OS may
arbitrarily reject the request or execute it at any later time), migration
may get stuck in a halfway state without any clear rollback or failure
scenario.
IMHO this is the really big deal. Doing this inside QEMU can arbitrarily
delay the start of migration, but this is opaque to mgmt apps because it
all becomes hidden behind the migrate command. It is common for mgmt apps
to serialize migration operations, otherwise they compete for limited
network bandwidth, making it less likely that any will complete. If we're
waiting for a guest OS to do the unplug, though, we don't want to be
stopping other migrations from being started in the meantime. Having
the unplugs done from the mgmt app explicitly gives it the flexibility
to decide how to order and serialize things to suit its needs.
Post by Peter Krempa
- After migration, device hotplug may fail for whatever reason, leaving
networking crippled, and again there is no clear single-case rollback
scenario.
I'd say s/crippled/degraded/. Anyway, depending on the reason that
triggered the migration, you may not even want to roll back to the source
host, despite the VFIO hotplug failing on the target.

If the original host was being evacuated in order to upgrade it to the
latest security patches, or due to hardware problems, it can be preferable
to just let the VM start running on the target host with only emulated
NICs and worry about getting VFIO working later.
Post by Peter Krempa
Then there's stuff which requires libvirt/management cooperation:
- picking the network device on the destination
- making sure that the device is present, etc.
From management's point of view, bundling all this together is really not
a good idea since it creates a very big matrix of failure scenarios. In
general even libvirt will prefer that upper layer management drives this
externally, since any rollback scenario will result in a policy decision
of what to do in certain cases, and what timeouts to pick.
Indeed, leaving these policy decisions to the mgmt app has generally been
the better approach, as the view of the best way to tackle a problem has
changed over time.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|