[libvirt] [PATCH RFC v2] qemu: fix deadlock when waiting in non async jobs
Nikolay Shirokovskiy
2018-10-08 08:10:05 UTC
The block job abort operation cannot properly handle a qemu crash that happens
while waiting for abort/pivot completion. The deadlock scenario is as follows:

- qemuDomainBlockJobAbort waits for pivot/abort completion
- qemu crashes; qemuProcessBeginStopJob then broadcasts on the VM condition and
  waits for the job condition (held by qemuDomainBlockJobAbort)
- qemuDomainBlockJobAbort wakes up, but nothing has really changed: the VM is
  still active (vm->def->id != -1), so the thread starts waiting for completion
  again. Now the two threads are deadlocked (see the sketch below).
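
To illustrate, here is a condensed sketch of the two threads (simplified
from the actual code, not verbatim):

    /* Thread A: qemuDomainBlockJobAbort, holding the job */
    while (diskPriv->blockjob) {
        /* wakes up on the broadcast below, but vm->def->id is still
         * != -1, so it just loops and waits again - forever */
        if (virDomainObjWait(vm) < 0)
            goto endjob;
    }

    /* Thread B: qemuProcessBeginStopJob, after qemu crashed */
    qemuProcessKill(vm, killFlags);
    virDomainObjBroadcast(vm);              /* wakes thread A in vain */
    qemuDomainObjBeginJob(driver, vm, job); /* blocks: job held by A */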

First, let's remove the broadcast in qemuProcessBeginStopJob. It is simply
wrong because no condition is set before the broadcast, so awakened threads
cannot detect any change. A domain crashing during an async job will continue
to be handled properly because the destroy job can run concurrently with the
async job, and the destroy job calls qemuProcessStop, which sets vm->def->id
to -1 and broadcasts.

Second, let's introduce a flag indicating that EOF has been received, and
broadcast after setting it. Non-async jobs can then check this flag in their
wait loops.

Signed-off-by: Nikolay Shirokovskiy <***@virtuozzo.com>

---

Diff from v1:

- patches 1 and 2 are already merged
- don't bother reporting the monitor EOF reason to the user, as most of the
time it is simply "unexpected eof" (this implies dropping patch 3)
- drop patch 5 as we now always report "domain is being stopped"
in qemuDomainObjWait
- don't signal on monitor error, for simplicity (otherwise we would need to
report something more elaborate than "domain is being stopped" since we
don't kill the domain on monitor errors; on the other hand, a monitor
error is a rare enough case not to handle right now)
- keep virDomainObjWait for async jobs

It's a bit uneven that for async jobs the domain is destroyed concurrently,
while for non-async jobs it will actually be destroyed after the job completes.
Also, if a non-async job needs to issue commands to qemu on cleanup, we will
send these commands in vain, polluting logs etc., because the qemu process is
not running at that moment even though the typical check (virDomainObjIsActive)
will report it as still running.

The domain is destroyed (qemuProcessStop) in a job due to patches [1] and [2].
However, AFAIU that is not necessary. If qemuProcessStop does not drop the VM
lock, then we don't need an extra job to keep qemuProcessStop and the main job
from interleaving. Currently the lock is dropped only in
qemuDomainObjBeginNestedJob in qemuProcessStop, which was introduced by [2].
AFAIU we can fix the issues mentioned in [2] another way, for example like it
is done for the qemu agent - we save the agent monitor reference on the stack
when entering/exiting the agent monitor.

So I wonder whether, instead of this fix, we can remove the job around
qemuProcessStop and run domain destruction concurrently for non-async jobs too.

[1]
commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
Author: Jiri Denemark <***@redhat.com>
Date: Thu Feb 11 15:32:48 2016 +0100

qemu: Process monitor EOF in a job

[2]
commit 81f50cb92d16643bcd749e3ab5b404b8b7cec643
Author: Jiri Denemark <***@redhat.com>
Date: Thu Feb 11 11:20:28 2016 +0100

qemu: Avoid calling qemuProcessStop without a job

src/qemu/qemu_domain.c | 39 +++++++++++++++++++++++++++++++++++++++
src/qemu/qemu_domain.h | 4 ++++
src/qemu/qemu_driver.c | 2 +-
src/qemu/qemu_hotplug.c | 4 ++--
src/qemu/qemu_process.c | 9 +++++----
5 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 939b2a3..aead72b 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13534,3 +13534,42 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)

return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
}
+
+
+/**
+ * Waits for domain condition to be triggered for a specific period of time.
+ * if @until is 0 then waits indefinetely.
+ *
+ * Returns:
+ * -1 on error
+ * 0 on success
+ * 1 on timeout
+ */
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
+{
+ qemuDomainObjPrivatePtr priv = vm->privateData;
+ int rc;
+
+ if (until)
+ rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
+ else
+ rc = virCondWait(&vm->cond, &vm->parent.lock);
+
+ if (rc < 0) {
+ if (until && errno == ETIMEDOUT)
+ return 1;
+
+ virReportSystemError(errno, "%s",
+ _("failed to wait for domain condition"));
+ return -1;
+ }
+
+ if (priv->monEOF) {
+ virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+ _("domain is being stopped"));
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 2f8a1bf..36ab294 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -281,6 +281,7 @@ struct _qemuDomainObjPrivate {
virDomainChrSourceDefPtr monConfig;
bool monJSON;
bool monError;
+ bool monEOF;
unsigned long long monStart;

qemuAgentPtr agent;
@@ -1085,4 +1086,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv);
virDomainEventResumedDetailType
qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);

+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
+
#endif /* __QEMU_DOMAIN_H__ */
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b238309..f4250da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17142,7 +17142,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
qemuBlockJobUpdate(vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
while (diskPriv->blockjob) {
- if (virDomainObjWait(vm) < 0) {
+ if (qemuDomainObjWait(vm, 0) < 0) {
ret = -1;
goto endjob;
}
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4558a3c..8189629 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -165,7 +165,7 @@ qemuHotplugWaitForTrayEject(virDomainObjPtr vm,
return -1;

while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
- if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
+ if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
return -1;

if (rc > 0) {
@@ -5002,7 +5002,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
until += qemuDomainRemoveDeviceWaitTime;

while (priv->unplug.alias) {
- if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
+ if ((rc = qemuDomainObjWait(vm, until)) == 1)
return 0;

if (rc < 0) {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 29b0ba1..dd03269 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -290,9 +290,12 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,

virObjectLock(vm);

+ priv = vm->privateData;
+ priv->monEOF = true;
+ virDomainObjBroadcast(vm);
+
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);

- priv = vm->privateData;
if (priv->beingDestroyed) {
VIR_DEBUG("Domain is being destroyed, EOF is expected");
goto cleanup;
@@ -5996,6 +5999,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,

priv->monJSON = true;
priv->monError = false;
+ priv->monEOF = false;
priv->monStart = 0;
priv->gotShutdown = false;
priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN;
@@ -6965,9 +6969,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
if (qemuProcessKill(vm, killFlags) < 0)
goto cleanup;

- /* Wake up anything waiting on domain condition */
- virDomainObjBroadcast(vm);
-
if (qemuDomainObjBeginJob(driver, vm, job) < 0)
goto cleanup;
--
1.8.3.1
John Ferlan
2018-10-16 00:00:10 UTC
Post by Nikolay Shirokovskiy
Block job abort operation can not handle properly qemu crashes when waiting for
- qemuDomainBlockJobAbort waits for pivot/abort completion
- qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
then waits for job condition (taken by qemuDomainBlockJobAbort)
- qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
active (vm->def->id != -1) so thread starts waiting for completion again.
Now two threads are in deadlock.
First let's remove broadcast in qemuProcessBeginStopJob. It is simply wrong
because it is not set any condition before broadcast so that awaked threads can
not detect any changes. Crashing domain during async job will continue to be
handled properly because destroy job can run concurrently with async job and
destroy job calls qemuProcessStop which sets vm->def->id to -1 and broadcasts.
Hmm... Although blockjobs are not my area of expertise, I do seem to
have a knack for reading and commenting on patches with these edge
conditions.

At first, taken alone, this made it seem like separate patches are
required, but maybe not, depending on the relationship described above.
As an aside, for this paragraph hunk you could call out commit 4d0c535a3,
where this was introduced. Beyond the refactor, the broadcast was added;
however, it seems that was done on purpose, since the broadcast would
seemingly allow something to be awoken.

Beyond that - set aside the scenario you're describing where QEMU crashes.
In the normal path, if you remove the broadcast, do things still work
properly?

A block job sets @priv->blockjob when it starts, and the flag is cleared
during qemuBlockJobEventProcess processing when the status is
VIR_DOMAIN_BLOCK_JOB_COMPLETED, VIR_DOMAIN_BLOCK_JOB_FAILED, or
VIR_DOMAIN_BLOCK_JOB_CANCELED.

What about setting a @priv->blockjobAbort flag when the abort starts? Then
perhaps processMonitorEOFEvent or qemuProcessHandleMonitorEOF could handle
that properly so that we don't deadlock.

Perhaps or hopefully, Jirka or Peter will comment too with this bump.

John
Post by Nikolay Shirokovskiy
Second let's introduce flag that EOF is received and broadcast after that.
Now non async jobs can check this flag in wait loop.
---
- patches 1 and 2 are already merged
- don't bother with reporting monitor EOF reason to user as most of
time it is simply "unexpected eof" (this implies dropping patch 3)
- drop patch 5 as we now always report "domain is being stopped"
in qemuDomainObjWait
- don't signal on monitor error for simplicity (otherwise we need to report
something more elaborate that "domain is being stopped" as we don't
kill domain on monitor errors. On the other hand I guess monitor
error is rare case to handle it right now)
- keep virDomainObjWait for async jobs
It's a bit uneven that for async jobs domain is destroyed concurrently and for
non async jobs it will be actually destroyed after job get completed. Also if
non async job needs issuing commands to qemu on cleanup then we will send these
commands in vain polluting logs etc because qemu process in not running at this
moment but typical check (virDomainObjIsActive) will think it is still running.
Domain is destroyed (qemuProcessStop) in a job due to patches [1] and [2].
However AFAIU it is not neccessary. If qemuProcessStop does not drop VM lock
then we don't need extra job to make qemuProcessStop and main job not
interleave. And we can drop the lock now only in qemuDomainObjBeginNestedJob in
qemuProcessStop which is introduced in [2]. AFAIU we can fix issues mentioned in
[2] the other way for example like it is done for qemu agent - we save agent
monitor reference on stack for entering/exiting agent monitor.
So I wonder can we instead of this fix remove job for qemuProcessStop and run
destroying domain cuncurrently for non async jobs too.
[1]
commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
Date: Thu Feb 11 15:32:48 2016 +0100
qemu: Process monitor EOF in a job
[2]
commit 81f50cb92d16643bcd749e3ab5b404b8b7cec643
Date: Thu Feb 11 11:20:28 2016 +0100
qemu: Avoid calling qemuProcessStop without a job
src/qemu/qemu_domain.c | 39 +++++++++++++++++++++++++++++++++++++++
src/qemu/qemu_domain.h | 4 ++++
src/qemu/qemu_driver.c | 2 +-
src/qemu/qemu_hotplug.c | 4 ++--
src/qemu/qemu_process.c | 9 +++++----
5 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 939b2a3..aead72b 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13534,3 +13534,42 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
}
+
+
+/**
+ * Waits for domain condition to be triggered for a specific period of time.
+ *
+ * -1 on error
+ * 0 on success
+ * 1 on timeout
+ */
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
+{
+ qemuDomainObjPrivatePtr priv = vm->privateData;
+ int rc;
+
+ if (until)
+ rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
+ else
+ rc = virCondWait(&vm->cond, &vm->parent.lock);
+
+ if (rc < 0) {
+ if (until && errno == ETIMEDOUT)
+ return 1;
+
+ virReportSystemError(errno, "%s",
+ _("failed to wait for domain condition"));
+ return -1;
+ }
+
+ if (priv->monEOF) {
+ virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+ _("domain is being stopped"));
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 2f8a1bf..36ab294 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -281,6 +281,7 @@ struct _qemuDomainObjPrivate {
virDomainChrSourceDefPtr monConfig;
bool monJSON;
bool monError;
+ bool monEOF;
unsigned long long monStart;
qemuAgentPtr agent;
@@ -1085,4 +1086,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv);
virDomainEventResumedDetailType
qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
+
#endif /* __QEMU_DOMAIN_H__ */
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b238309..f4250da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17142,7 +17142,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
qemuBlockJobUpdate(vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
while (diskPriv->blockjob) {
- if (virDomainObjWait(vm) < 0) {
+ if (qemuDomainObjWait(vm, 0) < 0) {
ret = -1;
goto endjob;
}
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4558a3c..8189629 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -165,7 +165,7 @@ qemuHotplugWaitForTrayEject(virDomainObjPtr vm,
return -1;
while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
- if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
+ if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
return -1;
if (rc > 0) {
@@ -5002,7 +5002,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
until += qemuDomainRemoveDeviceWaitTime;
while (priv->unplug.alias) {
- if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
+ if ((rc = qemuDomainObjWait(vm, until)) == 1)
return 0;
if (rc < 0) {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 29b0ba1..dd03269 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -290,9 +290,12 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
virObjectLock(vm);
+ priv = vm->privateData;
+ priv->monEOF = true;
+ virDomainObjBroadcast(vm);
+
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
- priv = vm->privateData;
if (priv->beingDestroyed) {
VIR_DEBUG("Domain is being destroyed, EOF is expected");
goto cleanup;
@@ -5996,6 +5999,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,
priv->monJSON = true;
priv->monError = false;
+ priv->monEOF = false;
priv->monStart = 0;
priv->gotShutdown = false;
priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN;
@@ -6965,9 +6969,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
if (qemuProcessKill(vm, killFlags) < 0)
goto cleanup;
- /* Wake up anything waiting on domain condition */
- virDomainObjBroadcast(vm);
-
if (qemuDomainObjBeginJob(driver, vm, job) < 0)
goto cleanup;
Nikolay Shirokovskiy
2018-10-16 07:22:09 UTC
Post by John Ferlan
Post by Nikolay Shirokovskiy
Block job abort operation can not handle properly qemu crashes when waiting for
- qemuDomainBlockJobAbort waits for pivot/abort completion
- qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
then waits for job condition (taken by qemuDomainBlockJobAbort)
- qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
active (vm->def->id != -1) so thread starts waiting for completion again.
Now two threads are in deadlock.
First let's remove broadcast in qemuProcessBeginStopJob. It is simply wrong
because it is not set any condition before broadcast so that awaked threads can
not detect any changes. Crashing domain during async job will continue to be
handled properly because destroy job can run concurrently with async job and
destroy job calls qemuProcessStop which sets vm->def->id to -1 and broadcasts.
Hmm... Although blockjobs are not my area of expertise, I do seem to
have a knack for reading and commenting on patches with these edge
conditions.
At first, taken alone this made it seem like separate patches are
required, but maybe not depending on the relationship described above.
As an aside, for this paragraph hunk you could call out commit 4d0c535a3
where this is/was introduced. Beyond the refactor, the broadcast was
added; however, it seems it was done so on purpose since the broadcast
would seemingly allowing something to be awoken.
Beyond that - take away the scenario you describing where QEMU crashes.
In the normal path, if you remove the broadcast, then do things work
properly?
As far as I can see, yes. In all jobs where we wait on the vm condition we
check miscellaneous state variables afterwards, so if the state has not
changed then broadcasting will not help. (The only exception is migration
and its derivatives with qemu not supporting events, but in that case we
use sleeps and do not wait.)
Post by John Ferlan
is cleared during qemuBlockJobEventProcess processing when status is
VIR_DOMAIN_BLOCK_JOB_COMPLETED, VIR_DOMAIN_BLOCK_JOB_FAILED, or
VIR_DOMAIN_BLOCK_JOB_CANCELED.
perhaps processMonitorEOFEvent or qemuProcessHandleMonitorEOF can handle
that properly so that we don't deadlock.
But how can we handle it any other way? I see no option now besides setting
some state variable and signalling after that, to help the non-async job
finish and then let the EOF handler proceed. (However, check the suggestions
after the --- marker.)

Nikolay
Post by John Ferlan
Perhaps or hopefully, Jirka or Peter will comment too with this bump.
John
Post by Nikolay Shirokovskiy
Second let's introduce flag that EOF is received and broadcast after that.
Now non async jobs can check this flag in wait loop.
---
- patches 1 and 2 are already merged
- don't bother with reporting monitor EOF reason to user as most of
time it is simply "unexpected eof" (this implies dropping patch 3)
- drop patch 5 as we now always report "domain is being stopped"
in qemuDomainObjWait
- don't signal on monitor error for simplicity (otherwise we need to report
something more elaborate that "domain is being stopped" as we don't
kill domain on monitor errors. On the other hand I guess monitor
error is rare case to handle it right now)
- keep virDomainObjWait for async jobs
It's a bit uneven that for async jobs domain is destroyed concurrently and for
non async jobs it will be actually destroyed after job get completed. Also if
non async job needs issuing commands to qemu on cleanup then we will send these
commands in vain polluting logs etc because qemu process in not running at this
moment but typical check (virDomainObjIsActive) will think it is still running.
Domain is destroyed (qemuProcessStop) in a job due to patches [1] and [2].
However AFAIU it is not neccessary. If qemuProcessStop does not drop VM lock
then we don't need extra job to make qemuProcessStop and main job not
interleave. And we can drop the lock now only in qemuDomainObjBeginNestedJob in
qemuProcessStop which is introduced in [2]. AFAIU we can fix issues mentioned in
[2] the other way for example like it is done for qemu agent - we save agent
monitor reference on stack for entering/exiting agent monitor.
So I wonder can we instead of this fix remove job for qemuProcessStop and run
destroying domain cuncurrently for non async jobs too.
[1]
commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
Date: Thu Feb 11 15:32:48 2016 +0100
qemu: Process monitor EOF in a job
[2]
commit 81f50cb92d16643bcd749e3ab5b404b8b7cec643
Date: Thu Feb 11 11:20:28 2016 +0100
qemu: Avoid calling qemuProcessStop without a job
src/qemu/qemu_domain.c | 39 +++++++++++++++++++++++++++++++++++++++
src/qemu/qemu_domain.h | 4 ++++
src/qemu/qemu_driver.c | 2 +-
src/qemu/qemu_hotplug.c | 4 ++--
src/qemu/qemu_process.c | 9 +++++----
5 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 939b2a3..aead72b 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13534,3 +13534,42 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
}
+
+
+/**
+ * Waits for domain condition to be triggered for a specific period of time.
+ *
+ * -1 on error
+ * 0 on success
+ * 1 on timeout
+ */
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
+{
+ qemuDomainObjPrivatePtr priv = vm->privateData;
+ int rc;
+
+ if (until)
+ rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
+ else
+ rc = virCondWait(&vm->cond, &vm->parent.lock);
+
+ if (rc < 0) {
+ if (until && errno == ETIMEDOUT)
+ return 1;
+
+ virReportSystemError(errno, "%s",
+ _("failed to wait for domain condition"));
+ return -1;
+ }
+
+ if (priv->monEOF) {
+ virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+ _("domain is being stopped"));
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 2f8a1bf..36ab294 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -281,6 +281,7 @@ struct _qemuDomainObjPrivate {
virDomainChrSourceDefPtr monConfig;
bool monJSON;
bool monError;
+ bool monEOF;
unsigned long long monStart;
qemuAgentPtr agent;
@@ -1085,4 +1086,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv);
virDomainEventResumedDetailType
qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
+
#endif /* __QEMU_DOMAIN_H__ */
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b238309..f4250da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17142,7 +17142,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
qemuBlockJobUpdate(vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
while (diskPriv->blockjob) {
- if (virDomainObjWait(vm) < 0) {
+ if (qemuDomainObjWait(vm, 0) < 0) {
ret = -1;
goto endjob;
}
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4558a3c..8189629 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -165,7 +165,7 @@ qemuHotplugWaitForTrayEject(virDomainObjPtr vm,
return -1;
while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
- if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
+ if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
return -1;
if (rc > 0) {
@@ -5002,7 +5002,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
until += qemuDomainRemoveDeviceWaitTime;
while (priv->unplug.alias) {
- if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
+ if ((rc = qemuDomainObjWait(vm, until)) == 1)
return 0;
if (rc < 0) {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 29b0ba1..dd03269 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -290,9 +290,12 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
virObjectLock(vm);
+ priv = vm->privateData;
+ priv->monEOF = true;
+ virDomainObjBroadcast(vm);
+
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
- priv = vm->privateData;
if (priv->beingDestroyed) {
VIR_DEBUG("Domain is being destroyed, EOF is expected");
goto cleanup;
@@ -5996,6 +5999,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,
priv->monJSON = true;
priv->monError = false;
+ priv->monEOF = false;
priv->monStart = 0;
priv->gotShutdown = false;
priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN;
@@ -6965,9 +6969,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
if (qemuProcessKill(vm, killFlags) < 0)
goto cleanup;
- /* Wake up anything waiting on domain condition */
- virDomainObjBroadcast(vm);
-
if (qemuDomainObjBeginJob(driver, vm, job) < 0)
goto cleanup;
John Ferlan
2018-10-17 20:04:41 UTC
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
Block job abort operation can not handle properly qemu crashes when waiting for
- qemuDomainBlockJobAbort waits for pivot/abort completion
- qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
then waits for job condition (taken by qemuDomainBlockJobAbort)
- qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
active (vm->def->id != -1) so thread starts waiting for completion again.
Now two threads are in deadlock.
First let's remove broadcast in qemuProcessBeginStopJob. It is simply wrong
because it is not set any condition before broadcast so that awaked threads can
not detect any changes. Crashing domain during async job will continue to be
handled properly because destroy job can run concurrently with async job and
destroy job calls qemuProcessStop which sets vm->def->id to -1 and broadcasts.
Hmm... Although blockjobs are not my area of expertise, I do seem to
have a knack for reading and commenting on patches with these edge
conditions.
At first, taken alone this made it seem like separate patches are
required, but maybe not depending on the relationship described above.
As an aside, for this paragraph hunk you could call out commit 4d0c535a3
where this is/was introduced. Beyond the refactor, the broadcast was
added; however, it seems it was done so on purpose since the broadcast
would seemingly allowing something to be awoken.
Beyond that - take away the scenario you describing where QEMU crashes.
In the normal path, if you remove the broadcast, then do things work
properly?
As far as I can see. In all jobs where we we wait on vm condition we
check misc state variables after that so if state is not changed than
broadcasting will not help. (The only exception is migration and derivatives
with qemu not supporting events but in this case we use sleeps and
do not wait).
To be clear, you are referencing virDomainObjWait[Until] callers that
aren't found within qemu_migration.c.

The two qemu_hotplug.c examples wait for QEMU events related to tray eject
or device removal, but are limited in duration. If they don't get the event
within the specified time, they have their own means to signal the timeout.

The two qemu_driver.c examples possibly wait forever. One waits for an
external event to signify a memory dump is complete, via
qemuProcessHandleDumpCompleted, or for the job to be aborted. The other
waits for the blockjob to be completed, when qemuBlockJobEventProcess
clears the flag. Both should also theoretically fail when the domain or
qemu dies, since both use virDomainObjWait, which, when properly tickled
by a broadcast, calls virDomainObjIsActive to check vm->def->id != -1.

So the contention (for both, I think) is that because the id is never set
to -1 when QEMU is killed, we're stuck. So rather than wait on something
that won't happen, use the EOF event as a way to force an exit once a
broadcast happens, via a qemu_domain-specific qemuDomainObjWait.

Is that a "fair summary"?
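
For reference, the generic helper being discussed looks roughly like this
(a simplified sketch from memory, not the exact upstream code):

    int
    virDomainObjWait(virDomainObjPtr vm)
    {
        if (virCondWait(&vm->cond, &vm->parent.lock) < 0) {
            virReportSystemError(errno, "%s",
                                 _("failed to wait for domain condition"));
            return -1;
        }

        /* the only exit criterion besides an error:
         * the domain went away (vm->def->id == -1) */
        if (!virDomainObjIsActive(vm)) {
            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                           _("domain is not running"));
            return -1;
        }

        return 0;
    }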
Post by Nikolay Shirokovskiy
Post by John Ferlan
is cleared during qemuBlockJobEventProcess processing when status is
VIR_DOMAIN_BLOCK_JOB_COMPLETED, VIR_DOMAIN_BLOCK_JOB_FAILED, or
VIR_DOMAIN_BLOCK_JOB_CANCELED.
perhaps processMonitorEOFEvent or qemuProcessHandleMonitorEOF can handle
that properly so that we don't deadlock.
But how we can handle it the other way? I see no other option now besides
setting some state variable and signalling after that to help non async
job to finish and then let EOF handler proceed. (However check suggestions after --- )
OK, so perhaps that's just a rename of your @monEOF specific to the
example, but based on what I typed above it would seemingly not be enough,
so let's stick with monEOF...
Post by Nikolay Shirokovskiy
Nikolay
Post by John Ferlan
Perhaps or hopefully, Jirka or Peter will comment too with this bump.
John
Post by Nikolay Shirokovskiy
Second let's introduce flag that EOF is received and broadcast after that.
Now non async jobs can check this flag in wait loop.
---
Just making sure - this RFC v2 comes from a series in Apr/May of this year:

https://www.redhat.com/archives/libvir-list/2018-April/msg01752.html

w/ review dialog spilling into May:
https://www.redhat.com/archives/libvir-list/2018-May/msg00126.html

based on the series:

https://www.redhat.com/archives/libvir-list/2018-April/msg01713.html

that you SNACK'd.

An awfully long time to remember context! When you reference earlier
patches, please try to remember to place a link in the cover letter - it
makes them easier to find, even if it's only been a couple of days. Weeks
and months from now someone may reference the series and want to peruse
the history of the previous review comments.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
- patches 1 and 2 are already merged
- don't bother with reporting monitor EOF reason to user as most of
time it is simply "unexpected eof" (this implies dropping patch 3)
- drop patch 5 as we now always report "domain is being stopped"
in qemuDomainObjWait
- don't signal on monitor error for simplicity (otherwise we need to report
something more elaborate that "domain is being stopped" as we don't
kill domain on monitor errors. On the other hand I guess monitor
error is rare case to handle it right now)
- keep virDomainObjWait for async jobs
It's a bit uneven that for async jobs domain is destroyed concurrently and for
non async jobs it will be actually destroyed after job get completed. Also if
non async job needs issuing commands to qemu on cleanup then we will send these
commands in vain polluting logs etc because qemu process in not running at this
moment but typical check (virDomainObjIsActive) will think it is still running.
Domain is destroyed (qemuProcessStop) in a job due to patches [1] and [2].
However AFAIU it is not neccessary. If qemuProcessStop does not drop VM lock
then we don't need extra job to make qemuProcessStop and main job not
interleave. And we can drop the lock now only in qemuDomainObjBeginNestedJob in
qemuProcessStop which is introduced in [2]. AFAIU we can fix issues mentioned in
[2] the other way for example like it is done for qemu agent - we save agent
monitor reference on stack for entering/exiting agent monitor.
So I wonder can we instead of this fix remove job for qemuProcessStop and run
destroying domain cuncurrently for non async jobs too.
[1]
commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
Date: Thu Feb 11 15:32:48 2016 +0100
qemu: Process monitor EOF in a job
[2]
commit 81f50cb92d16643bcd749e3ab5b404b8b7cec643
Date: Thu Feb 11 11:20:28 2016 +0100
qemu: Avoid calling qemuProcessStop without a job
src/qemu/qemu_domain.c | 39 +++++++++++++++++++++++++++++++++++++++
src/qemu/qemu_domain.h | 4 ++++
src/qemu/qemu_driver.c | 2 +-
src/qemu/qemu_hotplug.c | 4 ++--
src/qemu/qemu_process.c | 9 +++++----
5 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 939b2a3..aead72b 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13534,3 +13534,42 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
}
+
+
+/**
+ * Waits for domain condition to be triggered for a specific period of time.
*indefinitely
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ *
+ * -1 on error
+ * 0 on success
+ * 1 on timeout
+ */
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
Each argument on its own line.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+{
+ qemuDomainObjPrivatePtr priv = vm->privateData;
+ int rc;
+
+ if (until)
+ rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
+ else
+ rc = virCondWait(&vm->cond, &vm->parent.lock);
+
+ if (rc < 0) {
+ if (until && errno == ETIMEDOUT)
+ return 1;
+
+ virReportSystemError(errno, "%s",
+ _("failed to wait for domain condition"));
+ return -1;
+ }
+
+ if (priv->monEOF) {
+ virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+ _("domain is being stopped"));
"monitor has been closed"
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ return -1;
+ }
No chance that vm->def->id could be set to -1 during any of this, right?
Perhaps it doesn't hurt to check virDomainObjIsActive too. Paranoia,
you know...
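
Something like this, say (hypothetical, just to illustrate the extra
check):

    if (priv->monEOF || !virDomainObjIsActive(vm)) {
        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                       _("domain is being stopped"));
        return -1;
    }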
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+
+ return 0;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 2f8a1bf..36ab294 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -281,6 +281,7 @@ struct _qemuDomainObjPrivate {
virDomainChrSourceDefPtr monConfig;
bool monJSON;
bool monError;
+ bool monEOF;
unsigned long long monStart;
qemuAgentPtr agent;
@@ -1085,4 +1086,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv);
virDomainEventResumedDetailType
qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
+
#endif /* __QEMU_DOMAIN_H__ */
All of the above can be its own patch, "qemu: Introduce
qemuDomainObjWait", with a commit message describing why this would be
used instead of virDomainObjWait, especially considering the QEMU events
and the inability to meet the exit criterion checked by
virDomainObjIsActive, vm->def->id != -1.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b238309..f4250da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17142,7 +17142,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
qemuBlockJobUpdate(vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
while (diskPriv->blockjob) {
- if (virDomainObjWait(vm) < 0) {
+ if (qemuDomainObjWait(vm, 0) < 0) {
ret = -1;
goto endjob;
}
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4558a3c..8189629 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -165,7 +165,7 @@ qemuHotplugWaitForTrayEject(virDomainObjPtr vm,
return -1;
while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
- if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
+ if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
return -1;
if (rc > 0) {
@@ -5002,7 +5002,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
until += qemuDomainRemoveDeviceWaitTime;
while (priv->unplug.alias) {
- if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
+ if ((rc = qemuDomainObjWait(vm, until)) == 1)
return 0;
if (rc < 0) {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 29b0ba1..dd03269 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -290,9 +290,12 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
virObjectLock(vm);
+ priv = vm->privateData;
We could just initialize this at the top since we don't even check !vm
first anyway.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ priv->monEOF = true;
+ virDomainObjBroadcast(vm);
So why broadcast here? Why not just let processMonitorEOFEvent and its
call to qemuProcessBeginStopJob handle this?

Also, let's place this after the following DEBUG message - nice to keep
those up as high as possible.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
- priv = vm->privateData;
if (priv->beingDestroyed) {
VIR_DEBUG("Domain is being destroyed, EOF is expected");
goto cleanup;
@@ -5996,6 +5999,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,
priv->monJSON = true;
priv->monError = false;
+ priv->monEOF = false;
priv->monStart = 0;
priv->gotShutdown = false;
priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN;
@@ -6965,9 +6969,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
if (qemuProcessKill(vm, killFlags) < 0)
goto cleanup;
- /* Wake up anything waiting on domain condition */
- virDomainObjBroadcast(vm);
-
Since this is called by more than just processMonitorEOFEvent, I would
think removing it could cause issues for qemuProcessAutoDestroy and
qemuDomainDestroyFlags. What would cause their possibly blocked blockjob
or external memory dump waiters to be notified of this event?

John
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
if (qemuDomainObjBeginJob(driver, vm, job) < 0)
goto cleanup;
Nikolay Shirokovskiy
2018-10-18 08:55:44 UTC
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
Block job abort operation can not handle properly qemu crashes when waiting for
- qemuDomainBlockJobAbort waits for pivot/abort completion
- qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
then waits for job condition (taken by qemuDomainBlockJobAbort)
- qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
active (vm->def->id != -1) so thread starts waiting for completion again.
Now two threads are in deadlock.
First let's remove broadcast in qemuProcessBeginStopJob. It is simply wrong
because it is not set any condition before broadcast so that awaked threads can
not detect any changes. Crashing domain during async job will continue to be
handled properly because destroy job can run concurrently with async job and
destroy job calls qemuProcessStop which sets vm->def->id to -1 and broadcasts.
Hmm... Although blockjobs are not my area of expertise, I do seem to
have a knack for reading and commenting on patches with these edge
conditions.
At first, taken alone this made it seem like separate patches are
required, but maybe not depending on the relationship described above.
As an aside, for this paragraph hunk you could call out commit 4d0c535a3
where this is/was introduced. Beyond the refactor, the broadcast was
added; however, it seems it was done so on purpose since the broadcast
would seemingly allowing something to be awoken.
Beyond that - take away the scenario you describing where QEMU crashes.
In the normal path, if you remove the broadcast, then do things work
properly?
As far as I can see. In all jobs where we we wait on vm condition we
check misc state variables after that so if state is not changed than
broadcasting will not help. (The only exception is migration and derivatives
with qemu not supporting events but in this case we use sleeps and
do not wait).
To be clear, you are referencing virDomainObjWait[Until] callers that
aren't found within qemu_migration.c.
Not exactly. I mean look at any code that waits with
virDomainObjWait[Until]. The removed broadcast won't help it finish
waiting, because the only action it takes after waking is checking a
state variable. It doesn't, for example, send monitor commands, in which
case it would get a "monitor closed" error and finish waiting.
Post by John Ferlan
The two qemu_hotplug.c examples are waiting for QEMU events related to
tray eject or device removal, but are limited in their duration. If they
don't get the event in the specified time they have their means to
signify the timeout.
The two qemu_driver.c examples are possibly waiting forever. One waits
for an external event to signify a memory dump is complete via
qemuProcessHandleDumpCompleted or the job aborted. The other waits for
the blockjob to be completed when qemuBlockJobEventProcess clear the
flag. Both should also theoretically fail when the domain or qemu dies;
however, since both use virDomainObjWait which when properly tickled by
broadcast will call virDomainObjIsActive to compare vm->def->id != -1.
So the contention (for both I think) is that because the = -1 is not
done when QEMU is killed we're stuck. So rather than wait on something
that won't happen - use the EOF event as a way to force exit once a
broadcast happens via a qemu_domain specific qemuDomainObjWait.
Is that a "fair summary"?
Yes. But this approach is taken only for non-async jobs. Async jobs are
fine because they can run concurrently with the destroy job.
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
is cleared during qemuBlockJobEventProcess processing when status is
VIR_DOMAIN_BLOCK_JOB_COMPLETED, VIR_DOMAIN_BLOCK_JOB_FAILED, or
VIR_DOMAIN_BLOCK_JOB_CANCELED.
perhaps processMonitorEOFEvent or qemuProcessHandleMonitorEOF can handle
that properly so that we don't deadlock.
But how we can handle it the other way? I see no other option now besides
setting some state variable and signalling after that to help non async
job to finish and then let EOF handler proceed. (However check suggestions after --- )
example, but based on what I typed above would seemingly not be enough,
so let's stick with monEOF...
Post by Nikolay Shirokovskiy
Nikolay
Post by John Ferlan
Perhaps or hopefully, Jirka or Peter will comment too with this bump.
John
Post by Nikolay Shirokovskiy
Second let's introduce flag that EOF is received and broadcast after that.
Now non async jobs can check this flag in wait loop.
---
https://www.redhat.com/archives/libvir-list/2018-April/msg01752.html
https://www.redhat.com/archives/libvir-list/2018-May/msg00126.html
https://www.redhat.com/archives/libvir-list/2018-April/msg01713.html
that you SNACK'd.
An awful long time to remember context! When you reference earlier
patches, please try to remember to place a link in the cover - it makes
it easier to find. Even if it's only a couple of days. Weeks and months
from now someone may reference the series and want to peruse the history
of the previous review comments.
I will.
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
- patches 1 and 2 are already merged
- don't bother with reporting monitor EOF reason to user as most of
time it is simply "unexpected eof" (this implies dropping patch 3)
- drop patch 5 as we now always report "domain is being stopped"
in qemuDomainObjWait
- don't signal on monitor error for simplicity (otherwise we need to report
something more elaborate that "domain is being stopped" as we don't
kill domain on monitor errors. On the other hand I guess monitor
error is rare case to handle it right now)
- keep virDomainObjWait for async jobs
It's a bit uneven that for async jobs domain is destroyed concurrently and for
non async jobs it will be actually destroyed after job get completed. Also if
non async job needs issuing commands to qemu on cleanup then we will send these
commands in vain polluting logs etc because qemu process in not running at this
moment but typical check (virDomainObjIsActive) will think it is still running.
Domain is destroyed (qemuProcessStop) in a job due to patches [1] and [2].
However AFAIU it is not neccessary. If qemuProcessStop does not drop VM lock
then we don't need extra job to make qemuProcessStop and main job not
interleave. And we can drop the lock now only in qemuDomainObjBeginNestedJob in
qemuProcessStop which is introduced in [2]. AFAIU we can fix issues mentioned in
[2] the other way for example like it is done for qemu agent - we save agent
monitor reference on stack for entering/exiting agent monitor.
So I wonder can we instead of this fix remove job for qemuProcessStop and run
destroying domain cuncurrently for non async jobs too.
[1]
commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
Date: Thu Feb 11 15:32:48 2016 +0100
qemu: Process monitor EOF in a job
[2]
commit 81f50cb92d16643bcd749e3ab5b404b8b7cec643
Date: Thu Feb 11 11:20:28 2016 +0100
qemu: Avoid calling qemuProcessStop without a job
src/qemu/qemu_domain.c | 39 +++++++++++++++++++++++++++++++++++++++
src/qemu/qemu_domain.h | 4 ++++
src/qemu/qemu_driver.c | 2 +-
src/qemu/qemu_hotplug.c | 4 ++--
src/qemu/qemu_process.c | 9 +++++----
5 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 939b2a3..aead72b 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13534,3 +13534,42 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
}
+
+
+/**
+ * Waits for domain condition to be triggered for a specific period of time.
*indefinitely
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ *
+ * -1 on error
+ * 0 on success
+ * 1 on timeout
+ */
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
Each argument on it's own line.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+{
+ qemuDomainObjPrivatePtr priv = vm->privateData;
+ int rc;
+
+ if (until)
+ rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
+ else
+ rc = virCondWait(&vm->cond, &vm->parent.lock);
+
+ if (rc < 0) {
+ if (until && errno == ETIMEDOUT)
+ return 1;
+
+ virReportSystemError(errno, "%s",
+ _("failed to wait for domain condition"));
+ return -1;
+ }
+
+ if (priv->monEOF) {
+ virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+ _("domain is being stopped"));
"monitor has been closed"
Well, I thought the user is unaware of the concept of a monitor. We do
have commands like qemu-monitor-command, but they are targeted at
devs/devops I guess. Nevertheless, the original message is a bit
confusing too (being stopped???) :)
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ return -1;
+ }
No chance that vm->def->id could be set to -1 during any of this, right?
Perhaps it doesn't hurt to check virDomainObjIsActive too. Paranoia,
you know...
I'm afraid this can become dead code that we will resist touching
because we don't understand why it exists. I see no possibility of
missing monEOF but hitting the virDomainObjIsActive check now. Anyway,
before this patch we would get a deadlock in that case, so it won't get
any worse if we don't add the check you suggest. If we really find
situations where it is useful then we can add it immediately.
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+
+ return 0;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 2f8a1bf..36ab294 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -281,6 +281,7 @@ struct _qemuDomainObjPrivate {
virDomainChrSourceDefPtr monConfig;
bool monJSON;
bool monError;
+ bool monEOF;
unsigned long long monStart;
qemuAgentPtr agent;
@@ -1085,4 +1086,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv);
virDomainEventResumedDetailType
qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
+
#endif /* __QEMU_DOMAIN_H__ */
All of the above can be it's own patch to "qemu: Introduce
qemuDomainObjWait" with a commit message to describe why this would be
used instead of the virDomainObjWait especially considering the QEMU
events and the inability to get the exit criteria set by
virDomainObjIsActive of vm->def->id != -1.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by Nikolay Shirokovskiy
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b238309..f4250da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17142,7 +17142,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
qemuBlockJobUpdate(vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
while (diskPriv->blockjob) {
- if (virDomainObjWait(vm) < 0) {
+ if (qemuDomainObjWait(vm, 0) < 0) {
ret = -1;
goto endjob;
}
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4558a3c..8189629 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -165,7 +165,7 @@ qemuHotplugWaitForTrayEject(virDomainObjPtr vm,
return -1;
while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
- if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
+ if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
return -1;
if (rc > 0) {
@@ -5002,7 +5002,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
until += qemuDomainRemoveDeviceWaitTime;
while (priv->unplug.alias) {
- if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
+ if ((rc = qemuDomainObjWait(vm, until)) == 1)
return 0;
if (rc < 0) {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 29b0ba1..dd03269 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -290,9 +290,12 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
virObjectLock(vm);
+ priv = vm->privateData;
We could just initialize this at the top since we don't even check !vm
first anyway.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ priv->monEOF = true;
+ virDomainObjBroadcast(vm);
So why Broadcast here? Why not just let processMonitorEOFEvent and the
call to qemuProcessBeginStopJob handle this?
Because qemuProcessBeginStopJob cannot begin the job - it is occupied by the waiter.
Post by John Ferlan
Also, let's place this after the following DEBUG message - nice to keep
those up as high as possible.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
- priv = vm->privateData;
if (priv->beingDestroyed) {
VIR_DEBUG("Domain is being destroyed, EOF is expected");
goto cleanup;
@@ -5996,6 +5999,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,
priv->monJSON = true;
priv->monError = false;
+ priv->monEOF = false;
priv->monStart = 0;
priv->gotShutdown = false;
priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN;
@@ -6965,9 +6969,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
if (qemuProcessKill(vm, killFlags) < 0)
goto cleanup;
- /* Wake up anything waiting on domain condition */
- virDomainObjBroadcast(vm);
-
Since this is called by more than just processMonitorEOFEvent, I would
think removing it could cause issues for qemuProcessAutoDestroy and
qemuDomainDestroyFlags. What would cause them to have their possibly
blocked blockjob or external memory dump to be notified of this event?
Removing it won't hurt. Autodestroy and destroy both first kill qemu and
then try to acquire the job. If an async job is active, they acquire the
destroy job immediately, because async jobs let destroy run concurrently.
If a non-async job is active, we get EOF first, which after this patch
lets the non-async job finish, and then destroy/autodestroy can proceed
with stopping the domain. The removed broadcast just causes a spurious
wakeup of the waiter, I think.
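
In other words, with a non-async job active when qemu dies, the ordering
after this patch would be (a sketch):

    /*
     * qemuProcessHandleMonitorEOF:  priv->monEOF = true; broadcast
     * non-async job:                qemuDomainObjWait sees monEOF,
     *                               fails, job ends
     * destroy/autodestroy:          qemuDomainObjBeginJob now succeeds
     * qemuProcessStop:              vm->def->id = -1
     */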

I would also like to discuss the alternative approach described just
before the patch diff stats. It looks more appropriate to me, but I
guess we need a comment from Jiri before following it.

Nikolay
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
if (qemuDomainObjBeginJob(driver, vm, job) < 0)
goto cleanup;
Nikolay Shirokovskiy
2018-12-05 12:19:49 UTC
ping
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
Block job abort operation can not handle properly qemu crashes when waiting for
- qemuDomainBlockJobAbort waits for pivot/abort completion
- qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
then waits for job condition (taken by qemuDomainBlockJobAbort)
- qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
active (vm->def->id != -1) so thread starts waiting for completion again.
Now two threads are in deadlock.
First let's remove broadcast in qemuProcessBeginStopJob. It is simply wrong
because it is not set any condition before broadcast so that awaked threads can
not detect any changes. Crashing domain during async job will continue to be
handled properly because destroy job can run concurrently with async job and
destroy job calls qemuProcessStop which sets vm->def->id to -1 and broadcasts.
Hmm... Although blockjobs are not my area of expertise, I do seem to
have a knack for reading and commenting on patches with these edge
conditions.
At first, taken alone this made it seem like separate patches are
required, but maybe not depending on the relationship described above.
As an aside, for this paragraph hunk you could call out commit 4d0c535a3
where this is/was introduced. Beyond the refactor, the broadcast was
added; however, it seems it was done so on purpose since the broadcast
would seemingly allowing something to be awoken.
Beyond that - take away the scenario you describing where QEMU crashes.
In the normal path, if you remove the broadcast, then do things work
properly?
As far as I can see. In all jobs where we we wait on vm condition we
check misc state variables after that so if state is not changed than
broadcasting will not help. (The only exception is migration and derivatives
with qemu not supporting events but in this case we use sleeps and
do not wait).
To be clear, you are referencing virDomainObjWait[Until] callers that
aren't found within qemu_migration.c.
Not exactly. I mean look at any code that waits with virDomainObjWait[Until].
That removed broadcast won't help them to finish waiting because the only
action they take after awake is checking state variable. They don't for
example send monitor commands in which case they would get "monitor closed" error
and finishi waiting.
Post by John Ferlan
The two qemu_hotplug.c examples are waiting for QEMU events related to
tray eject or device removal, but are limited in their duration. If they
don't get the event in the specified time they have their means to
signify the timeout.
The two qemu_driver.c examples are possibly waiting forever. One waits
for an external event to signify a memory dump is complete via
qemuProcessHandleDumpCompleted or the job aborted. The other waits for
the blockjob to be completed when qemuBlockJobEventProcess clear the
flag. Both should also theoretically fail when the domain or qemu dies;
however, since both use virDomainObjWait which when properly tickled by
broadcast will call virDomainObjIsActive to compare vm->def->id != -1.
So the contention (for both I think) is that because the = -1 is not
done when QEMU is killed we're stuck. So rather than wait on something
that won't happen - use the EOF event as a way to force exit once a
broadcast happens via a qemu_domain specific qemuDomainObjWait.
Is that a "fair summary"?
Yes. But this approach is taken only for non async jobs. Async jobs are
ok because they can run concurrently with destroy job.
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
is cleared during qemuBlockJobEventProcess processing when status is
VIR_DOMAIN_BLOCK_JOB_COMPLETED, VIR_DOMAIN_BLOCK_JOB_FAILED, or
VIR_DOMAIN_BLOCK_JOB_CANCELED.
perhaps processMonitorEOFEvent or qemuProcessHandleMonitorEOF can handle
that properly so that we don't deadlock.
But how we can handle it the other way? I see no other option now besides
setting some state variable and signalling after that to help non async
job to finish and then let EOF handler proceed. (However check suggestions after --- )
example, but based on what I typed above would seemingly not be enough,
so let's stick with monEOF...
Post by Nikolay Shirokovskiy
Nikolay
Post by John Ferlan
Perhaps or hopefully, Jirka or Peter will comment too with this bump.
John
Post by Nikolay Shirokovskiy
Second let's introduce flag that EOF is received and broadcast after that.
Now non async jobs can check this flag in wait loop.
---
https://www.redhat.com/archives/libvir-list/2018-April/msg01752.html
https://www.redhat.com/archives/libvir-list/2018-May/msg00126.html
https://www.redhat.com/archives/libvir-list/2018-April/msg01713.html
that you SNACK'd.
An awful long time to remember context! When you reference earlier
patches, please try to remember to place a link in the cover - it makes
it easier to find. Even if it's only a couple of days. Weeks and months
from now someone may reference the series and want to peruse the history
of the previous review comments.
I will.
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
- patches 1 and 2 are already merged
- don't bother with reporting monitor EOF reason to user as most of
time it is simply "unexpected eof" (this implies dropping patch 3)
- drop patch 5 as we now always report "domain is being stopped"
in qemuDomainObjWait
- don't signal on monitor error for simplicity (otherwise we need to report
something more elaborate that "domain is being stopped" as we don't
kill domain on monitor errors. On the other hand I guess monitor
error is rare case to handle it right now)
- keep virDomainObjWait for async jobs
It's a bit uneven that for async jobs the domain is destroyed concurrently,
while for non async jobs it will actually be destroyed after the job gets
completed. Also, if a non async job needs to issue commands to qemu on
cleanup, then we will send these commands in vain, polluting logs etc.,
because the qemu process is not running at that moment but the typical
check (virDomainObjIsActive) will think it is still running.
The domain is destroyed (qemuProcessStop) in a job due to patches [1] and
[2]. However, AFAIU it is not necessary. If qemuProcessStop does not drop
the VM lock, then we don't need an extra job to keep qemuProcessStop and
the main job from interleaving. And the only place qemuProcessStop drops
the lock now is in qemuDomainObjBeginNestedJob, which was introduced in
[2]. AFAIU we can fix the issues mentioned in [2] another way, for example
like it is done for the qemu agent - we save the agent monitor reference
on the stack for entering/exiting the agent monitor.
So I wonder whether, instead of this fix, we can remove the job from
qemuProcessStop and run domain destruction concurrently for non async
jobs too.
[1]
commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
Date: Thu Feb 11 15:32:48 2016 +0100
qemu: Process monitor EOF in a job
[2]
commit 81f50cb92d16643bcd749e3ab5b404b8b7cec643
Date: Thu Feb 11 11:20:28 2016 +0100
qemu: Avoid calling qemuProcessStop without a job
src/qemu/qemu_domain.c | 39 +++++++++++++++++++++++++++++++++++++++
src/qemu/qemu_domain.h | 4 ++++
src/qemu/qemu_driver.c | 2 +-
src/qemu/qemu_hotplug.c | 4 ++--
src/qemu/qemu_process.c | 9 +++++----
5 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 939b2a3..aead72b 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13534,3 +13534,42 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
}
+
+
+/**
+ * Waits for domain condition to be triggered for a specific period of time.
*indefinitely
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ *
+ * -1 on error
+ * 0 on success
+ * 1 on timeout
+ */
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
Each argument on its own line.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+{
+ qemuDomainObjPrivatePtr priv = vm->privateData;
+ int rc;
+
+ if (until)
+ rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
+ else
+ rc = virCondWait(&vm->cond, &vm->parent.lock);
+
+ if (rc < 0) {
+ if (until && errno == ETIMEDOUT)
+ return 1;
+
+ virReportSystemError(errno, "%s",
+ _("failed to wait for domain condition"));
+ return -1;
+ }
+
+ if (priv->monEOF) {
+ virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+ _("domain is being stopped"));
"monitor has been closed"
Well, I thought the user is unaware of the concept of a monitor. We do
have commands like qemu-monitor-command, but they are targeted at
devs/devops I guess. Nevertheless, the original message is a bit
confusing too (being stopped???) :)
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ return -1;
+ }
No chance that vm->def->id could be set to -1 during any of this, right?
Perhaps it doesn't hurt to check virDomainObjIsActive too. Paranoia,
you know...
I'm afraid this can become dead code that we will be reluctant to touch
because we don't understand why it exists. I see no possibility right now
of missing monEOF while failing the virDomainObjIsActive check. Anyway,
before this patch we would get a deadlock, so it won't get any worse if
we don't add the check you suggest. If we really find situations where
it would be useful, then we will add it immediately.
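If we ever do want the belt-and-braces variant, it would just be (a
hypothetical extension, not part of this patch):

    /* Hypothetical: also treat an inactive domain as a reason to stop
     * waiting, in addition to the monEOF flag. */
    if (priv->monEOF || !virDomainObjIsActive(vm)) {
        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                       _("domain is being stopped"));
        return -1;
    }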
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+
+ return 0;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 2f8a1bf..36ab294 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -281,6 +281,7 @@ struct _qemuDomainObjPrivate {
virDomainChrSourceDefPtr monConfig;
bool monJSON;
bool monError;
+ bool monEOF;
unsigned long long monStart;
qemuAgentPtr agent;
@@ -1085,4 +1086,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv);
virDomainEventResumedDetailType
qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
+
#endif /* __QEMU_DOMAIN_H__ */
All of the above can be its own patch, "qemu: Introduce
qemuDomainObjWait", with a commit message describing why this would be
used instead of virDomainObjWait, especially considering the QEMU
events and the inability to meet the exit criterion checked by
virDomainObjIsActive, vm->def->id != -1.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by Nikolay Shirokovskiy
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b238309..f4250da 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17142,7 +17142,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
qemuBlockJobUpdate(vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
while (diskPriv->blockjob) {
- if (virDomainObjWait(vm) < 0) {
+ if (qemuDomainObjWait(vm, 0) < 0) {
ret = -1;
goto endjob;
}
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4558a3c..8189629 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -165,7 +165,7 @@ qemuHotplugWaitForTrayEject(virDomainObjPtr vm,
return -1;
while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
- if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
+ if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
return -1;
if (rc > 0) {
@@ -5002,7 +5002,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
until += qemuDomainRemoveDeviceWaitTime;
while (priv->unplug.alias) {
- if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
+ if ((rc = qemuDomainObjWait(vm, until)) == 1)
return 0;
if (rc < 0) {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 29b0ba1..dd03269 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -290,9 +290,12 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
virObjectLock(vm);
+ priv = vm->privateData;
We could just initialize this at the top since we don't even check !vm
first anyway.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+ priv->monEOF = true;
+ virDomainObjBroadcast(vm);
So why Broadcast here? Why not just let processMonitorEOFEvent and the
call to qemuProcessBeginStopJob handle this?
Because qemuProcessBeginStopJob cannot begin the job - it is held by the waiter.
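The ordering this relies on, as I understand it (a sketch of the intended
interleaving, not code from the patch):

    non-async job thread              EOF handler thread
    --------------------              ------------------
    holds the job, sleeps on          qemuProcessHandleMonitorEOF():
      vm->cond in its wait loop         priv->monEOF = true;
                                        virDomainObjBroadcast(vm);
    wakes in qemuDomainObjWait,
      sees priv->monEOF, fails,
      ends the job
                                      processMonitorEOFEvent() can now
                                        begin the stop job and call
                                        qemuProcessStop()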
Post by John Ferlan
Also, let's place this after the following DEBUG message - nice to keep
those up as high as possible.
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
+
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
- priv = vm->privateData;
if (priv->beingDestroyed) {
VIR_DEBUG("Domain is being destroyed, EOF is expected");
goto cleanup;
@@ -5996,6 +5999,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,
priv->monJSON = true;
priv->monError = false;
+ priv->monEOF = false;
priv->monStart = 0;
priv->gotShutdown = false;
priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN;
@@ -6965,9 +6969,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
if (qemuProcessKill(vm, killFlags) < 0)
goto cleanup;
- /* Wake up anything waiting on domain condition */
- virDomainObjBroadcast(vm);
-
Since this is called by more than just processMonitorEOFEvent, I would
think removing it could cause issues for qemuProcessAutoDestroy and
qemuDomainDestroyFlags. What would cause their possibly blocked blockjob
or external memory dump waiters to be notified of this event?
Removing it won't hurt. Autodestroy and destroy both first kill qemu and
then try to acquire the job. If an async job is active, then they acquire
the destroy job immediately, because async jobs let destroy run
concurrently. If a non async job is active, then we get EOF first, which
after this patch lets the non async job finish, and then
destroy/autodestroy can proceed with stopping the domain. The removed
broadcast just causes a spurious wakeup of the waiter, I think.
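To spell out my reading of the two destroy cases after this patch (a
sketch, not verbatim code; QEMU_JOB_DESTROY is the job I assume both
paths use):

    /* Case 1: an async job is active.  Destroy may run concurrently,
     * so BeginJob succeeds immediately after the kill. */
    qemuProcessKill(vm, killFlags);
    qemuDomainObjBeginJob(driver, vm, QEMU_JOB_DESTROY);

    /* Case 2: a non async job is active.  The kill triggers monitor
     * EOF, monEOF + broadcast fail the waiter out of qemuDomainObjWait,
     * that job ends, and only then does BeginJob above acquire the job
     * and let qemuProcessStop() run. */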
I would also like to discuss the alternative approach described just
before the patch diff stats. It looks more appropriate to me, but I guess
we need a comment from Jiri in order to follow it.
Nikolay
Post by John Ferlan
Post by Nikolay Shirokovskiy
Post by John Ferlan
Post by Nikolay Shirokovskiy
if (qemuDomainObjBeginJob(driver, vm, job) < 0)
goto cleanup;
--
libvir-list mailing list
https://www.redhat.com/mailman/listinfo/libvir-list