Discussion:
[libvirt] libxl driver does not notice domain killed with xl destroy
Marek Marczykowski-Górecki
2018-12-07 15:47:49 UTC
Permalink
Hi,

If one kills a domain with `xl destroy`, libvirt will not notice it and
still report the domain as running. Also trying to destroy the domain
through libvirt will fail. The only way to recover from such a situation
is to restart libvirt daemon.

While (I think) the right answer is "don't use xl destroy", libvirt
could do better.

libxl actually report such cases via LIBXL_EVENT_TYPE_DOMAIN_DEATH, but
it is ignored by libvirt's libxl driver. The problem is that
LIBXL_EVENT_TYPE_DOMAIN_DEATH is also reported when it's triggered by
libvirt (for example libxlDomainShutdownHandleDestroy, or
libxlDomainDestroyFlags).
For a proper solution I need a race-free way for checking if given
LIBXL_EVENT_TYPE_DOMAIN_DEATH was caused by libvirt (and is already
handled - including setting appropriate domain object state etc).

A bit of context:

https://github.com/libvirt/libvirt/blob/master/src/libxl/libxl_domain.c#L750-L765
libxlDomainDestroyInternal(libxlDriverPrivatePtr driver,
virDomainObjPtr vm)
{
libxlDriverConfigPtr cfg = libxlDriverConfigGet(driver);
int ret = -1;


/* Unlock virDomainObj during destroy, which can take considerable
* time on large memory domains.
*/
virObjectUnlock(vm);
ret = libxl_domain_destroy(cfg->ctx, vm->def->id, NULL);
virObjectLock(vm);


virObjectUnref(cfg);
return ret;
}

Example usage:
https://github.com/libvirt/libvirt/blob/master/src/libxl/libxl_driver.c#L1357-L1409
static int
libxlDomainDestroyFlags(virDomainPtr dom,
unsigned int flags)
{
(...)
if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
goto cleanup;


if (virDomainObjCheckActive(vm) < 0)
goto endjob;


if (libxlDomainDestroyInternal(driver, vm) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("Failed to destroy domain '%d'"), vm->def->id);
goto endjob;
}


virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF,
VIR_DOMAIN_SHUTOFF_DESTROYED);


event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
VIR_DOMAIN_EVENT_STOPPED_DESTROYED);
(...)
}

It's tricky, because domain object lock is released for the
libxl_domain_destroy() call time and I think there is no guarantee that
the lock will be taken back by libxlDomainDestroyInternal() before
LIBXL_EVENT_TYPE_DOMAIN_DEATH is reported (and possibly handled). Which means
I can't use domain object state for checking if libvirt already know
about this domain death, because it is set to VIR_DOMAIN_SHUTOFF, after
libxlDomainDestroyInternal() call. And while setting domain
object state twice (once from LIBXL_EVENT_TYPE_DOMAIN_DEATH handler,
then from libxlDomainDestroyInternal() caller) wouldn't be that bad
(only ugly), but emitting domain lifecycle event twice would be a major
problem.

Any ideas? Some internal flag set by libxlDomainDestroyInternal() just
before libxl_domain_destroy() call? Where would be a place for such
thing?
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Jim Fehlig
2018-12-08 00:10:12 UTC
Permalink
Post by Marek Marczykowski-Górecki
Hi,
If one kills a domain with `xl destroy`, libvirt will not notice it and
still report the domain as running. Also trying to destroy the domain
through libvirt will fail. The only way to recover from such a situation
is to restart libvirt daemon.
Ouch.
Post by Marek Marczykowski-Górecki
While (I think) the right answer is "don't use xl destroy", libvirt
could do better.
Nod. Restarting libvirtd is pretty drastic.
Post by Marek Marczykowski-Górecki
libxl actually report such cases via LIBXL_EVENT_TYPE_DOMAIN_DEATH, but
it is ignored by libvirt's libxl driver.
Why don't we handle that in libxlDomainEventHandler()?
Post by Marek Marczykowski-Górecki
The problem is that
LIBXL_EVENT_TYPE_DOMAIN_DEATH is also reported when it's triggered by
libvirt (for example libxlDomainShutdownHandleDestroy, or
libxlDomainDestroyFlags).
Ah, yes, now I remember why :-).
Post by Marek Marczykowski-Górecki
For a proper solution I need a race-free way for checking if given
LIBXL_EVENT_TYPE_DOMAIN_DEATH was caused by libvirt (and is already
handled - including setting appropriate domain object state etc).
https://github.com/libvirt/libvirt/blob/master/src/libxl/libxl_domain.c#L750-L765
libxlDomainDestroyInternal(libxlDriverPrivatePtr driver,
virDomainObjPtr vm)
{
libxlDriverConfigPtr cfg = libxlDriverConfigGet(driver);
int ret = -1;
/* Unlock virDomainObj during destroy, which can take considerable
* time on large memory domains.
*/
virObjectUnlock(vm);
ret = libxl_domain_destroy(cfg->ctx, vm->def->id, NULL);
virObjectLock(vm);
virObjectUnref(cfg);
return ret;
}
https://github.com/libvirt/libvirt/blob/master/src/libxl/libxl_driver.c#L1357-L1409
static int
libxlDomainDestroyFlags(virDomainPtr dom,
unsigned int flags)
{
(...)
if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
goto cleanup;
if (virDomainObjCheckActive(vm) < 0)
goto endjob;
if (libxlDomainDestroyInternal(driver, vm) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("Failed to destroy domain '%d'"), vm->def->id);
goto endjob;
}
virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF,
VIR_DOMAIN_SHUTOFF_DESTROYED);
event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
VIR_DOMAIN_EVENT_STOPPED_DESTROYED);
(...)
}
It's tricky, because domain object lock is released for the
libxl_domain_destroy() call time and I think there is no guarantee that
the lock will be taken back by libxlDomainDestroyInternal() before
LIBXL_EVENT_TYPE_DOMAIN_DEATH is reported (and possibly handled). Which means
I can't use domain object state for checking if libvirt already know
about this domain death, because it is set to VIR_DOMAIN_SHUTOFF, after
libxlDomainDestroyInternal() call. And while setting domain
object state twice (once from LIBXL_EVENT_TYPE_DOMAIN_DEATH handler,
then from libxlDomainDestroyInternal() caller) wouldn't be that bad
(only ugly), but emitting domain lifecycle event twice would be a major
problem.
Yes, that is a problem.
Post by Marek Marczykowski-Górecki
Any ideas? Some internal flag set by libxlDomainDestroyInternal() just
before libxl_domain_destroy() call? Where would be a place for such
thing?
In the libxlDomainObjPrivate object, defined in libxl_domain.h?

Regards,
Jim

Loading...