Discussion:
[libvirt] [Qemu-devel] live snapshot wiki updated
(too old to reply)
Eric Blake
2011-07-19 14:24:48 UTC
Permalink
[adding the libvir-list]
Eric, what happens if libvirt in an selinux environment tells QEMU to
launch using an image file that is backed by backing file(s)?
Before starting qemu, libvirt first parses all the image files, to see
if any of them have backing images. For every qcow2 or qed image with a
backing file, libvirt sets the SELinux context of both the qcow2 image
and its backing file so that qemu will be able to successfully open()
them. But if any of those files reside on NFS, then it is not possible
to label individual files, so it requires setting the SELinux bool
virt_use_nfs, which thus gives qemu the power to open() arbitrary files
on NFS, and you've lost security.
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.

Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
It would be nice if libvirt had a way to pass fds for every disk and
backing file up front; then, SELinux can work around the lack of NFS
per-file labelling by blocking open() in qemu. In fact, this has
A cleaner solution seems to have libvirt provide a call-back allowing
QEMU to call out and have libvirt open a file descriptor instead. This
way libvirt can validate it and open it for QEMU and pass it back.
Yes, that could probably be made to work with libvirt.
If we cannot do something like this, I would prefer to have backing
files on NFS should simply not be supported when running in an selinux
setup.
As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior. The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff. Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.

Just because it is currently not as secure to mix NFS shared storage
with backing files doesn't stop some people from wanting to do it [in
fact, that's my current development setup - I use qcow2 images on NFS
shared storage, keep SELinux enabled, and enable the virt_use_nfs bool].
This discussion is about adding enhancements that make SELinux even
more powerful when using NFS shared storage, by adding fd passing
(whether libvirt parses in advance, or whether qemu raises an event and
requires feedback from libvirt), and not about crippling the existing
capability to use the virt_use_nfs selinux bool.
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Jes Sorensen
2011-07-19 14:30:19 UTC
Permalink
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
Post by Eric Blake
It would be nice if libvirt had a way to pass fds for every disk and
backing file up front; then, SELinux can work around the lack of NFS
per-file labelling by blocking open() in qemu. In fact, this has
A cleaner solution seems to have libvirt provide a call-back allowing
QEMU to call out and have libvirt open a file descriptor instead. This
way libvirt can validate it and open it for QEMU and pass it back.
Yes, that could probably be made to work with libvirt.
I am a little frustrated this approach wasn't taken up front instead of
the evil hack of having libvirt attempt to parse image files.
Post by Eric Blake
If we cannot do something like this, I would prefer to have backing
files on NFS should simply not be supported when running in an selinux
setup.
As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior. The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff. Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.
Well leaving things at status quo is not making it worse, it just leaves
an evil in place.
Post by Eric Blake
Just because it is currently not as secure to mix NFS shared storage
with backing files doesn't stop some people from wanting to do it [in
fact, that's my current development setup - I use qcow2 images on NFS
shared storage, keep SELinux enabled, and enable the virt_use_nfs bool].
This discussion is about adding enhancements that make SELinux even
more powerful when using NFS shared storage, by adding fd passing
(whether libvirt parses in advance, or whether qemu raises an event and
requires feedback from libvirt), and not about crippling the existing
capability to use the virt_use_nfs selinux bool.
I do not believe we should try and add extra interfaces to support
something which is inherently broken. This really boils down to whether
we should support fd passing for snapshots in the first place. If it is
to support the broken setup of libvirt parsing image files, then I am
totally against it, if we work on a proper solution that involves this
in some way, then we can discuss it.

Cheers,
Jes
Stefan Hajnoczi
2011-07-19 15:14:27 UTC
Permalink
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the
stack because it becomes are to add new features - they require
coordinated changes in multiple layers. Having both QEMU and libvirt
know the internals of image files is such a multi-dependency. If I
want to add a new format or change an existing format I have to touch
both layers.

For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.

Stefan
Daniel P. Berrange
2011-07-19 16:46:13 UTC
Permalink
Post by Stefan Hajnoczi
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the
stack because it becomes are to add new features - they require
coordinated changes in multiple layers. Having both QEMU and libvirt
know the internals of image files is such a multi-dependency. If I
want to add a new format or change an existing format I have to touch
both layers.
For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine
backing files is to maintain the trust boundary. Everything run from
the exec() of QEMU onwards is considered untrusted code. So having
QEMU parsing the file headers & passing back open() requests to libvirt
is breaking the trust boundary.

NB, i'm not happy about libvirt having to have knowledge of file format
headers, but we needed something more efficient & reliable than invoking
qemu-img info & parsing the output. Ideally QEMU (or something else)
would provide a library libblockformat.so with stable APIs for at least
reading metadata about image formats. If it had APIs for image creation,
etc too that would be a bonus, but we're more or less ok spawning qemu-img
for those cases currently.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Markus Armbruster
2011-07-20 07:30:21 UTC
Permalink
Post by Daniel P. Berrange
Post by Stefan Hajnoczi
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the
stack because it becomes are to add new features - they require
coordinated changes in multiple layers. Having both QEMU and libvirt
know the internals of image files is such a multi-dependency. If I
want to add a new format or change an existing format I have to touch
both layers.
For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine
backing files is to maintain the trust boundary. Everything run from
the exec() of QEMU onwards is considered untrusted code. So having
QEMU parsing the file headers & passing back open() requests to libvirt
is breaking the trust boundary.
Exactly.

The block drivers form a tree. Each driver opens its children. For
convenience, some of them have information on how to open their children
encoded in their image (e.g. COW backing images). Others receive it as
configuration, encoded in their "filename" string (e.g. blkdebug).

Thus, we have direct control only over the root block driver. We can
pass it fds easily.

We control the non-root block drivers only indirectly, through the block
drivers along the path from the root. We don't have a way to pass them
fds.

If we had a way to construct the tree bottom-up, we could directly
control all the block drivers.

Requires knowing the number of children, and extracting child
configuration from images where applicable.

[...]
Jes Sorensen
2011-07-20 08:23:12 UTC
Permalink
Post by Daniel P. Berrange
Post by Stefan Hajnoczi
For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine
backing files is to maintain the trust boundary. Everything run from
the exec() of QEMU onwards is considered untrusted code. So having
QEMU parsing the file headers & passing back open() requests to libvirt
is breaking the trust boundary.
Pardon, but I fail to see the issue here. If QEMU passes a filename back
to libvirt, libvirt still gets to make the decision whether or not it is
legitimate for QEMU to get that file descriptor or not. It doesn't
change anything wrt who actually opens the file, hence the 'trust' is
unchanged.
Post by Daniel P. Berrange
NB, i'm not happy about libvirt having to have knowledge of file format
headers, but we needed something more efficient & reliable than invoking
qemu-img info & parsing the output. Ideally QEMU (or something else)
would provide a library libblockformat.so with stable APIs for at least
reading metadata about image formats. If it had APIs for image creation,
etc too that would be a bonus, but we're more or less ok spawning qemu-img
for those cases currently.
Even having a library for libvirt to link against is suboptimal here.
Two processes shouldn't be fighting over the internals of metadata, the
ownership of the metadata belongs solely with QEMU. In addition you have
the constant issue of dependencies there, hence if QEMU is updated and
it provides a newer block format library, it may prevent libvirt from
running forcing an update of libvirt as well. That is not acceptable for
development.

Cheers,
Jes
Daniel P. Berrange
2011-07-20 09:36:09 UTC
Permalink
Post by Jes Sorensen
Post by Daniel P. Berrange
Post by Stefan Hajnoczi
For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine
backing files is to maintain the trust boundary. Everything run from
the exec() of QEMU onwards is considered untrusted code. So having
QEMU parsing the file headers & passing back open() requests to libvirt
is breaking the trust boundary.
Pardon, but I fail to see the issue here. If QEMU passes a filename back
to libvirt, libvirt still gets to make the decision whether or not it is
legitimate for QEMU to get that file descriptor or not. It doesn't
change anything wrt who actually opens the file, hence the 'trust' is
unchanged.
To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
Post by Jes Sorensen
Post by Daniel P. Berrange
NB, i'm not happy about libvirt having to have knowledge of file format
headers, but we needed something more efficient & reliable than invoking
qemu-img info & parsing the output. Ideally QEMU (or something else)
would provide a library libblockformat.so with stable APIs for at least
reading metadata about image formats. If it had APIs for image creation,
etc too that would be a bonus, but we're more or less ok spawning qemu-img
for those cases currently.
Even having a library for libvirt to link against is suboptimal here.
Two processes shouldn't be fighting over the internals of metadata, the
ownership of the metadata belongs solely with QEMU. In addition you have
the constant issue of dependencies there, hence if QEMU is updated and
it provides a newer block format library, it may prevent libvirt from
running forcing an update of libvirt as well. That is not acceptable for
development.
We're not fighting over the internals of metadata. We just need to know
ahead of launching QEMU, what backing files an image has & what format
they are in. We do that by reading at the metadata headers of the disk
images. We never attempt to write to the disk images. Either someone
provides a library todo that, or we write the probing code for each
file format in libvirt. Currently we have the latter.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Nicolas Sebrecht
2011-07-20 10:15:02 UTC
Permalink
Post by Daniel P. Berrange
To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
Actually, libvirt should not have to worry if the filename provided by
QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
information others can trust; it should be fixed at QEMU side.
Post by Daniel P. Berrange
We're not fighting over the internals of metadata. We just need to know
ahead of launching QEMU, what backing files an image has & what format
they are in. We do that by reading at the metadata headers of the disk
images. We never attempt to write to the disk images. Either someone
provides a library todo that, or we write the probing code for each
file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you
prevent from concurrency access and modifications? Ideally speacking
libvirt should be able to co-exist with foreign implementations, all
requesting QEMU.
--
Nicolas Sebrecht
Daniel P. Berrange
2011-07-20 10:28:46 UTC
Permalink
Post by Nicolas Sebrecht
Post by Daniel P. Berrange
To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
Actually, libvirt should not have to worry if the filename provided by
QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised
QEMU, therefore QEMU is, by definition, untrusted.
Post by Nicolas Sebrecht
Post by Daniel P. Berrange
We're not fighting over the internals of metadata. We just need to know
ahead of launching QEMU, what backing files an image has & what format
they are in. We do that by reading at the metadata headers of the disk
images. We never attempt to write to the disk images. Either someone
provides a library todo that, or we write the probing code for each
file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you
prevent from concurrency access and modifications? Ideally speacking
libvirt should be able to co-exist with foreign implementations, all
requesting QEMU.
QEMU is *not* yet running at the time libvirt reads the file metadata.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Stefan Hajnoczi
2011-07-20 11:40:51 UTC
Permalink
On Wed, Jul 20, 2011 at 11:28 AM, Daniel P. Berrange
Post by Daniel P. Berrange
Post by Nicolas Sebrecht
Post by Daniel P. Berrange
To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
Actually, libvirt should not have to worry if the filename provided by
QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised
QEMU, therefore QEMU is, by definition, untrusted.
This is a very reasonable goal. QEMU is constantly dealing with the
untrusted guest. The whole point of SELinux isolation of QEMU is to
contain any compromise to a single VM and reduce the capabilities of
that process to the minimum.

libvirt needs to help set the boundaries of what the QEMU process can do.

Stefan
Jes Sorensen
2011-07-21 08:40:48 UTC
Permalink
Post by Daniel P. Berrange
Post by Nicolas Sebrecht
Actually, libvirt should not have to worry if the filename provided by
QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised
QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to
make it a hard requirement for libvirt to be upgraded with QEMU even for
smaller updates.
Post by Daniel P. Berrange
Post by Nicolas Sebrecht
Post by Daniel P. Berrange
We're not fighting over the internals of metadata. We just need to know
ahead of launching QEMU, what backing files an image has & what format
they are in. We do that by reading at the metadata headers of the disk
images. We never attempt to write to the disk images. Either someone
provides a library todo that, or we write the probing code for each
file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you
prevent from concurrency access and modifications? Ideally speacking
libvirt should be able to co-exist with foreign implementations, all
requesting QEMU.
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug

Jes
Daniel P. Berrange
2011-07-21 09:34:01 UTC
Permalink
Post by Jes Sorensen
Post by Daniel P. Berrange
Post by Nicolas Sebrecht
Actually, libvirt should not have to worry if the filename provided by
QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide
information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised
QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to
make it a hard requirement for libvirt to be upgraded with QEMU even for
smaller updates.
Post by Daniel P. Berrange
Post by Nicolas Sebrecht
Post by Daniel P. Berrange
We're not fighting over the internals of metadata. We just need to know
ahead of launching QEMU, what backing files an image has & what format
they are in. We do that by reading at the metadata headers of the disk
images. We never attempt to write to the disk images. Either someone
provides a library todo that, or we write the probing code for each
file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you
prevent from concurrency access and modifications? Ideally speacking
libvirt should be able to co-exist with foreign implementations, all
requesting QEMU.
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug
In the case of hotplug, the hotplug monitor commands have not yet been
invoked when libvirt access the file metadata, or the drive has already
been unplugged, so there is still no concurrent access to the file by
QEMU and libvirt.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Jes Sorensen
2011-07-21 09:34:11 UTC
Permalink
Post by Daniel P. Berrange
Post by Jes Sorensen
Post by Daniel P. Berrange
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug
In the case of hotplug, the hotplug monitor commands have not yet been
invoked when libvirt access the file metadata, or the drive has already
been unplugged, so there is still no concurrent access to the file by
QEMU and libvirt.
Unless someone tries to hotplug an image that relies on a backing file
already used by another image. While unlikely, it is possible.

Jes
Eric Blake
2011-07-21 13:52:41 UTC
Permalink
Post by Jes Sorensen
Post by Daniel P. Berrange
Post by Jes Sorensen
Post by Daniel P. Berrange
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug
In the case of hotplug, the hotplug monitor commands have not yet been
invoked when libvirt access the file metadata, or the drive has already
been unplugged, so there is still no concurrent access to the file by
QEMU and libvirt.
Unless someone tries to hotplug an image that relies on a backing file
already used by another image. While unlikely, it is possible.
Backing images should be treated as read-only by qemu, right? That is,
if I have file c.img which uses ca.img as its backing file, then qemu
only needs O_RDONLY access to ca.img, right? My understanding is that
you never want to have a qcow2 image whose backing file is being
simultaneously edited by another external process, otherwise you risk
data corruption from the point of view of the qemu process that is
trying to refer to that backing file.

Given that restriction, if I want to hotplug file d.img that _also_ uses
ca.img as its backing file, then that's not an issue. Libvirt _still_
reads the metadata of d.img, and learns of the backing file of ca.img,
all before calling the hotplug command, even if ca.img is already in use
by qemu, with no complications.

You still haven't managed to convince me that there is ever any
situation where qemu needs to open a backing file that is not already
determined /a priori/ by reading just the metadata of the files being
handed to qemu in the first place, or by being aware of the metadata
relationship being requested of qemu in the first place (such as the
relationship implied by calling snapshot_blkdev to create a qcow2 file
with an existing file as backing).
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Eric Blake
2011-07-21 13:47:18 UTC
Permalink
Post by Jes Sorensen
Post by Daniel P. Berrange
The security goal of libvirt is to protect the host from a compromised
QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to
make it a hard requirement for libvirt to be upgraded with QEMU even for
smaller updates.
Right now, libvirt only needs the name of the backing file and type.
How often has the qcow2 file format changed in such a way that that
particular information has been incompatibly moved around, so that
clients have to learn a new file format in order to parse the
information in its new location? The set of information that libvirt
needs out of a qcow2 image is relatively small and stable, compared to
any other changes being made in the rest of the file format, and the
libvirt folks are more than welcome to help review any qcow2 format
changes for stability impacts.

Furthermore, you are forgetting that libvirt already ends up upgrading
to pick up new qemu features all the time, not just for qcow2 layout
changes. If you are okay with the feature set supported by an older
qemu, then you are also okay using an older libvirt that targets just
that feature set - you only have to upgrade libvirt when talking to a
newer qemu that requires the use of a newer feature set. Older libvirt
can generally talk to newer qemu if qemu made backwards-compatible changes.
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Jes Sorensen
2011-07-21 14:19:00 UTC
Permalink
Post by Jes Sorensen
Post by Daniel P. Berrange
The security goal of libvirt is to protect the host from a compromised
QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to
make it a hard requirement for libvirt to be upgraded with QEMU even for
smaller updates.
Right now, libvirt only needs the name of the backing file and type. How
often has the qcow2 file format changed in such a way that that
particular information has been incompatibly moved around, so that
clients have to learn a new file format in order to parse the
information in its new location? The set of information that libvirt
needs out of a qcow2 image is relatively small and stable, compared to
any other changes being made in the rest of the file format, and the
libvirt folks are more than welcome to help review any qcow2 format
changes for stability impacts.
QED, QCOW3

either way, libvirt should be able to boot a guest with an upgraded
QEMU, even if it doesn't support all the features. In this case you are
going to provide a hard requirement on libvirt being upgraded in sync.
Furthermore, you are forgetting that libvirt already ends up upgrading
to pick up new qemu features all the time, not just for qcow2 layout
changes. If you are okay with the feature set supported by an older
qemu, then you are also okay using an older libvirt that targets just
that feature set - you only have to upgrade libvirt when talking to a
newer qemu that requires the use of a newer feature set. Older libvirt
can generally talk to newer qemu if qemu made backwards-compatible changes.
There is a difference between upgrading to pick up additional features
and forced upgrade.

It is not just a question of libvirt possibly reviewing a file format,
it is also having two codebases that needs to be implemented and maintained.

Jes
Eric Blake
2011-07-21 14:51:48 UTC
Permalink
Post by Jes Sorensen
There is a difference between upgrading to pick up additional features
and forced upgrade.
It is not just a question of libvirt possibly reviewing a file format,
it is also having two codebases that needs to be implemented and maintained.
Then give us a libblockid.so that provides a stable interface, and which
is also used by qemu to manage QED, qcow3, ..., so that libvirt can rely
on the stable library interface rather than having to learn new file
formats all the time.

But yes, we already had to upgrade libvirt code to parse qed formats,
and anyone wanting to use qed via libvirt has to do a lockstep upgrade
of both qemu and libvirt, because no one has yet written a nice shared
library that can do it on our behalf through a stable API.
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Jes Sorensen
2011-07-21 08:38:58 UTC
Permalink
Post by Daniel P. Berrange
Post by Jes Sorensen
Pardon, but I fail to see the issue here. If QEMU passes a filename back
to libvirt, libvirt still gets to make the decision whether or not it is
legitimate for QEMU to get that file descriptor or not. It doesn't
change anything wrt who actually opens the file, hence the 'trust' is
unchanged.
To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
Sorry but that doesn't make any sense. In other words, if someone hacks
an image and makes it point to a different file, you are going to allow
the backing file to be opened just because it is listed in the image?

If this is really the approach you are suggesting, it seems to me the
whole 'do not allow random opens on NFS' security thing has gone out the
window.

To the best of my understanding, the whole idea with selinux attributes
was to be able to specify which files are allowed to be opened by a
given process. Mapping this to the libvirt model, it should mean libvirt
ought to carry a positive list of files that are allowed to be opened,
rather than relying on what might be written to an image file.

Jes
Eric Blake
2011-07-21 14:02:37 UTC
Permalink
Post by Jes Sorensen
Post by Daniel P. Berrange
Post by Jes Sorensen
Pardon, but I fail to see the issue here. If QEMU passes a filename back
to libvirt, libvirt still gets to make the decision whether or not it is
legitimate for QEMU to get that file descriptor or not. It doesn't
change anything wrt who actually opens the file, hence the 'trust' is
unchanged.
To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
Sorry but that doesn't make any sense. In other words, if someone hacks
an image and makes it point to a different file, you are going to allow
the backing file to be opened just because it is listed in the image?
Yes, because the only way someone could hack that image is if they have
rights to that file in the first place. And if they have rights to that
file in the first place, then they also can call qemu-img to modify that
file, or any other number of modifications - but that's not an
escalation in privilege, so it is not a security hole.
Jes Sorensen
2011-07-21 14:08:25 UTC
Permalink
Post by Eric Blake
Post by Jes Sorensen
Post by Daniel P. Berrange
Post by Jes Sorensen
Pardon, but I fail to see the issue here. If QEMU passes a filename back
to libvirt, libvirt still gets to make the decision whether or not it is
legitimate for QEMU to get that file descriptor or not. It doesn't
change anything wrt who actually opens the file, hence the 'trust' is
unchanged.
To make the decision whether the filename from QEMU is valid, we have
to parse the master image header data to see if the filename actually
matches the backing file required by the image assigned to the guest.
Sorry but that doesn't make any sense. In other words, if someone hacks
an image and makes it point to a different file, you are going to allow
the backing file to be opened just because it is listed in the image?
Yes, because the only way someone could hack that image is if they have
rights to that file in the first place. And if they have rights to that
file in the first place, then they also can call qemu-img to modify that
file, or any other number of modifications - but that's not an
escalation in privilege, so it is not a security hole.
Ehm, we're talking about the case where a guest is able to compromise
the QEMU process controlling it. This is what libvirt / selinux is
trying to prevent. Now if the guest is able to break out of it's shell,
it can modify the disk image headers. Crash the QEMU process and wait
for the guest to be rebooted by the admin..... Voila we now have access
to random files on the NFS store we couldn't reach before.

So back to the drawing board, we do have a security problem here....
Post by Eric Blake
Post by Jes Sorensen
To the best of my understanding, the whole idea with selinux attributes
was to be able to specify which files are allowed to be opened by a
given process. Mapping this to the libvirt model, it should mean libvirt
ought to carry a positive list of files that are allowed to be opened,
Which it does, by reading the metadata of those files prior to the point
of ever handing those files over to an untrusted process - except in one
case: right now, libvirt is not validating that a qcow2 file is still in
a sane state after a qemu process ends.
You have previously explained that anything written to the QEMU header
is trusted by libvirt in the first place.
Post by Eric Blake
Post by Jes Sorensen
rather than relying on what might be written to an image file.
Thank you for persisting - you've found another hole that needs to be
plugged. It sounds like you are proposing that after a qemu process
dies, that libvirt re-reads the qcow2 metadata headers, and validates
that the backing file information has not changed in a manner unexpected
by libvirt. If it has, then the qemu process that just died was
compromised to the point that restarting a new qemu process from the old
image is now a security risk. So this is _yet another_ security aspect
that needs to be coded into libvirt as part of hardening sVirt.
But it is an independent issue of the need for fd passing.
No it is not independent, we're back to the point where we started. The
issue at hand is still that libvirt has no business messing about in
metadata headers of guest images. libvirt has the 'right' to validate
file names, but that is not justification in messing about in metadata.
Allowing that is an endless cat and mouse hunt of keeping up, which is
guaranteed to get out of sync.

If this problem is to be fixed, lets fix it correctly.

Jes
Stefan Hajnoczi
2011-07-21 15:01:52 UTC
Permalink
Post by Eric Blake
Thank you for persisting - you've found another hole that needs to be
plugged.  It sounds like you are proposing that after a qemu process dies,
that libvirt re-reads the qcow2 metadata headers, and validates that the
backing file information has not changed in a manner unexpected by libvirt.
 If it has, then the qemu process that just died was compromised to the
point that restarting a new qemu process from the old image is now a
security risk.  So this is _yet another_ security aspect that needs to be
coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.

Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)

The image streaming operation copies data out of fedora.img and
populates my_vm.qed. When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.

I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what. I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details. QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
Today we seem to be sprinkling a little bit of storage management into
QEMU and a little bit into libvirt :(.

In that spirit it is much nicer to think of storage like a SAN
appliance where you have LUNs that you access as block devices. It
also provides an API for snapshotting, cloning LUNs, etc.

Let's move to that model instead of worrying about how to spread
storage logic across QEMU and libvirt.

Stefan
Blue Swirl
2011-07-21 19:42:45 UTC
Permalink
Post by Stefan Hajnoczi
Post by Eric Blake
Thank you for persisting - you've found another hole that needs to be
plugged.  It sounds like you are proposing that after a qemu process dies,
that libvirt re-reads the qcow2 metadata headers, and validates that the
backing file information has not changed in a manner unexpected by libvirt.
 If it has, then the qemu process that just died was compromised to the
point that restarting a new qemu process from the old image is now a
security risk.  So this is _yet another_ security aspect that needs to be
coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and
populates my_vm.qed.  When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what.  I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details.  QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
Today we seem to be sprinkling a little bit of storage management into
QEMU and a little bit into libvirt :(.
In that spirit it is much nicer to think of storage like a SAN
appliance where you have LUNs that you access as block devices.  It
also provides an API for snapshotting, cloning LUNs, etc.
Let's move to that model instead of worrying about how to spread
storage logic across QEMU and libvirt.
Would NBD protocol fit to this purpose, or is it too simple? Then
libvirt would handle the storage format completely and present an NBD
interface to QEMU (or give an fd to an external service) and QEMU
would not care about the storage format in this mode at all.
Stefan Hajnoczi
2011-07-22 05:06:54 UTC
Permalink
Post by Blue Swirl
Post by Stefan Hajnoczi
Post by Eric Blake
Thank you for persisting - you've found another hole that needs to be
plugged.  It sounds like you are proposing that after a qemu process dies,
that libvirt re-reads the qcow2 metadata headers, and validates that the
backing file information has not changed in a manner unexpected by libvirt.
 If it has, then the qemu process that just died was compromised to the
point that restarting a new qemu process from the old image is now a
security risk.  So this is _yet another_ security aspect that needs to be
coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and
populates my_vm.qed.  When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what.  I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details.  QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
Today we seem to be sprinkling a little bit of storage management into
QEMU and a little bit into libvirt :(.
In that spirit it is much nicer to think of storage like a SAN
appliance where you have LUNs that you access as block devices.  It
also provides an API for snapshotting, cloning LUNs, etc.
Let's move to that model instead of worrying about how to spread
storage logic across QEMU and libvirt.
Would NBD protocol fit to this purpose, or is it too simple? Then
libvirt would handle the storage format completely and present an NBD
interface to QEMU (or give an fd to an external service) and QEMU
would not care about the storage format in this mode at all.
NBD does not support flush (fdatasync). Therefore it only supports
the slow cache=writethrough mode in a safe manner.

It would be neat to use virtio-blk as the interface because it can be
passed through to the guest. The guest talks directly to the storage
management service without going through QEMU. The trick is to do
something like vhost:
1. An ioeventfd for virtqueue (guest->host) kicks
2. An irqfd for host->guest kicks
3. Shared memory for vring and zero-copy data access

The storage management service provides a UNIX domain socket over
which fds can be passed to set up the vhost-like virtio-blk interface.

Moving the image format code into a separate program makes it possible
to safely write to a backing file while VMs are using it because the
storage service can be host-wide, not per-VM. For example, streaming
a shared backing file over NFS while running VMs using copy-on-write
images. If we ever want to do deduplication or other global
operations, then this approach is nice too.

To summarize:
The storage service manages image files including creation, deletion,
snapshotting, and actual I/O. QEMU uses a vhost-like virtio-blk
interface and can pass it directly into the guest. libvirt uses the
storage service API without needing to parse image files or keep track
of backing file relationships.

Stefan
Blue Swirl
2011-07-22 15:49:46 UTC
Permalink
Post by Blue Swirl
Post by Stefan Hajnoczi
Post by Eric Blake
Thank you for persisting - you've found another hole that needs to be
plugged.  It sounds like you are proposing that after a qemu process dies,
that libvirt re-reads the qcow2 metadata headers, and validates that the
backing file information has not changed in a manner unexpected by libvirt.
 If it has, then the qemu process that just died was compromised to the
point that restarting a new qemu process from the old image is now a
security risk.  So this is _yet another_ security aspect that needs to be
coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and
populates my_vm.qed.  When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what.  I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details.  QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
Today we seem to be sprinkling a little bit of storage management into
QEMU and a little bit into libvirt :(.
In that spirit it is much nicer to think of storage like a SAN
appliance where you have LUNs that you access as block devices.  It
also provides an API for snapshotting, cloning LUNs, etc.
Let's move to that model instead of worrying about how to spread
storage logic across QEMU and libvirt.
Would NBD protocol fit to this purpose, or is it too simple? Then
libvirt would handle the storage format completely and present an NBD
interface to QEMU (or give an fd to an external service) and QEMU
would not care about the storage format in this mode at all.
NBD does not support flush (fdatasync).  Therefore it only supports
the slow cache=writethrough mode in a safe manner.
Maybe NBD could still be used in networked setups as a secondary alternative.
It would be neat to use virtio-blk as the interface because it can be
passed through to the guest.  The guest talks directly to the storage
management service without going through QEMU.  The trick is to do
1. An ioeventfd for virtqueue (guest->host) kicks
2. An irqfd for host->guest kicks
3. Shared memory for vring and zero-copy data access
The storage management service provides a UNIX domain socket over
which fds can be passed to set up the vhost-like virtio-blk interface.
Moving the image format code into a separate program makes it possible
to safely write to a backing file while VMs are using it because the
storage service can be host-wide, not per-VM.  For example, streaming
a shared backing file over NFS while running VMs using copy-on-write
images.  If we ever want to do deduplication or other global
operations, then this approach is nice too.
The storage service manages image files including creation, deletion,
snapshotting, and actual I/O.  QEMU uses a vhost-like virtio-blk
interface and can pass it directly into the guest.  libvirt uses the
storage service API without needing to parse image files or keep track
of backing file relationships.
Excellent plan. If one day kernel provides builtin virtio-blk services
which can be passed via libvirt and QEMU to the guest, we'll even have
zero copy all the way.
Kevin Wolf
2011-07-22 07:22:43 UTC
Permalink
Post by Stefan Hajnoczi
Post by Eric Blake
Thank you for persisting - you've found another hole that needs to be
plugged. It sounds like you are proposing that after a qemu process dies,
that libvirt re-reads the qcow2 metadata headers, and validates that the
backing file information has not changed in a manner unexpected by libvirt.
If it has, then the qemu process that just died was compromised to the
point that restarting a new qemu process from the old image is now a
security risk. So this is _yet another_ security aspect that needs to be
coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and
populates my_vm.qed. When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what. I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details. QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
And how do you implement that in a way that works on all platforms, and
without root privileges? I can't see this happen unless it stays
completely optional.

Kevin
Stefan Hajnoczi
2011-07-22 09:11:17 UTC
Permalink
Post by Kevin Wolf
Post by Stefan Hajnoczi
Post by Eric Blake
Thank you for persisting - you've found another hole that needs to be
plugged.  It sounds like you are proposing that after a qemu process dies,
that libvirt re-reads the qcow2 metadata headers, and validates that the
backing file information has not changed in a manner unexpected by libvirt.
 If it has, then the qemu process that just died was compromised to the
point that restarting a new qemu process from the old image is now a
security risk.  So this is _yet another_ security aspect that needs to be
coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and
populates my_vm.qed.  When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what.  I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details.  QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
And how do you implement that in a way that works on all platforms, and
without root privileges? I can't see this happen unless it stays
completely optional.
The cross-platform way would be an iSCSI target that understands image
formats. But iSCSI requires copying when doing I/O and we can't pass
through virtio-blk.

I'm not sure I see the root privilege issue.

Stefan
Blue Swirl
2011-07-22 16:05:18 UTC
Permalink
Post by Stefan Hajnoczi
Post by Kevin Wolf
Post by Stefan Hajnoczi
Post by Eric Blake
Thank you for persisting - you've found another hole that needs to be
plugged.  It sounds like you are proposing that after a qemu process dies,
that libvirt re-reads the qcow2 metadata headers, and validates that the
backing file information has not changed in a manner unexpected by libvirt.
 If it has, then the qemu process that just died was compromised to the
point that restarting a new qemu process from the old image is now a
security risk.  So this is _yet another_ security aspect that needs to be
coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed
After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and
populates my_vm.qed.  When image streaming completes, the backing file
is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making
incorrect assumptions about who does what.  I really wish that all
this image file business was outside QEMU and libvirt - that we had a
separate storage management service which handled the details.  QEMU
would only do block device operations (no image format manipulation),
and libvirt would only delegate to the storage management service.
And how do you implement that in a way that works on all platforms, and
without root privileges? I can't see this happen unless it stays
completely optional.
The cross-platform way would be an iSCSI target that understands image
formats.  But iSCSI requires copying when doing I/O and we can't pass
through virtio-blk.
The guest could use iSCSI directly using the network interface without
virtio-blk. This setup wouldn't give max performance in local use but
it could also be useful in some networked setups and probably more
useful than NBD.
Kevin Wolf
2011-07-20 09:50:53 UTC
Permalink
Post by Daniel P. Berrange
Post by Stefan Hajnoczi
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the
stack because it becomes are to add new features - they require
coordinated changes in multiple layers. Having both QEMU and libvirt
know the internals of image files is such a multi-dependency. If I
want to add a new format or change an existing format I have to touch
both layers.
For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine
backing files is to maintain the trust boundary. Everything run from
the exec() of QEMU onwards is considered untrusted code. So having
QEMU parsing the file headers & passing back open() requests to libvirt
is breaking the trust boundary.
NB, i'm not happy about libvirt having to have knowledge of file format
headers, but we needed something more efficient & reliable than invoking
qemu-img info & parsing the output.
What's the real problem with this approach? Parsing the data meant for
humans, from an interface that is potentially unstable? If this is it,
it should be easy enough to add a JSON output mode to qemu-img info.
Post by Daniel P. Berrange
Ideally QEMU (or something else)
would provide a library libblockformat.so with stable APIs for at least
reading metadata about image formats. If it had APIs for image creation,
etc too that would be a bonus, but we're more or less ok spawning qemu-img
for those cases currently.
I'm afraid the block drivers have too many dependencies on the qemu core
for this to be an option without investing a lot of effort.

Kevin
Daniel P. Berrange
2011-07-20 10:18:46 UTC
Permalink
Post by Kevin Wolf
Post by Daniel P. Berrange
Post by Stefan Hajnoczi
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the
stack because it becomes are to add new features - they require
coordinated changes in multiple layers. Having both QEMU and libvirt
know the internals of image files is such a multi-dependency. If I
want to add a new format or change an existing format I have to touch
both layers.
For fd-passing perhaps we have an opportunity to use a callback
mechanism (QEMU request: filename -> libvirt response: fd) and do all
the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine
backing files is to maintain the trust boundary. Everything run from
the exec() of QEMU onwards is considered untrusted code. So having
QEMU parsing the file headers & passing back open() requests to libvirt
is breaking the trust boundary.
NB, i'm not happy about libvirt having to have knowledge of file format
headers, but we needed something more efficient & reliable than invoking
qemu-img info & parsing the output.
What's the real problem with this approach? Parsing the data meant for
humans, from an interface that is potentially unstable? If this is it,
it should be easy enough to add a JSON output mode to qemu-img info.
It is a really heavyweight solution to have to spawn qemu-img every
time we need to access this data, when it can be done with a trivial
open+read+close sequence. In addition the output data format is not
entirely pleasant for machine reading (some fields only have data
rounded up to MB, not the raw byte count). Finally, we also wanted
to be able to extract some basic metdata about disk image formats on
non-QEMU hosts, for our storage management APIs which are used on Xen
or VMWare hosts where many of these same disk image formats are also
used. A JSON output mode would be helpful, but unfortunately can't
really address the other issues.
Post by Kevin Wolf
Post by Daniel P. Berrange
Ideally QEMU (or something else)
would provide a library libblockformat.so with stable APIs for at least
reading metadata about image formats. If it had APIs for image creation,
etc too that would be a bonus, but we're more or less ok spawning qemu-img
for those cases currently.
I'm afraid the block drivers have too many dependencies on the qemu core
for this to be an option without investing a lot of effort.
That's why I sort of think there is value in having someone provide a
standalone library API for querying some core set of block format
metadata. QEMU is but one project with virtual disk formats, there are
plenty of others out there in existance, so while reusing QEMU block
code would be nice, it isn't leading to any significant reduction in
copies of block format parsing code amongst all the virt projects in
existance.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Anthony Liguori
2011-07-19 16:14:09 UTC
Permalink
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
Post by Eric Blake
It would be nice if libvirt had a way to pass fds for every disk and
backing file up front; then, SELinux can work around the lack of NFS
per-file labelling by blocking open() in qemu. In fact, this has
A cleaner solution seems to have libvirt provide a call-back allowing
QEMU to call out and have libvirt open a file descriptor instead. This
way libvirt can validate it and open it for QEMU and pass it back.
Yes, that could probably be made to work with libvirt.
I am a little frustrated this approach wasn't taken up front instead of
the evil hack of having libvirt attempt to parse image files.
Post by Eric Blake
If we cannot do something like this, I would prefer to have backing
files on NFS should simply not be supported when running in an selinux
setup.
As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior. The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff. Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.
Well leaving things at status quo is not making it worse, it just leaves
an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can
piss and moan as much as we want about it but it's reality. SELinux
fundamentally requires extended attributes. By the time NFS adds
extended attribute support, we'll all be flying around in hover cars.

As terrible as NFS is, people use it all of the time.

It would be nice if libvirt had the ability to make better use of DAC to
support isolation. The fact that MAC is the only way you can do
isolation between guests is pretty unfortunate. If I could assign
specific UIDs to a guest and use that to enforce isolation, it would go
a long ways to solving this problem.

Regards,

Anthony Liguori
Jes Sorensen
2011-07-20 08:25:37 UTC
Permalink
Post by Anthony Liguori
Post by Jes Sorensen
Post by Eric Blake
As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior. The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff. Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.
Well leaving things at status quo is not making it worse, it just leaves
an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can
piss and moan as much as we want about it but it's reality. SELinux
fundamentally requires extended attributes. By the time NFS adds
extended attribute support, we'll all be flying around in hover cars.
As terrible as NFS is, people use it all of the time.
It would be nice if libvirt had the ability to make better use of DAC to
support isolation. The fact that MAC is the only way you can do
isolation between guests is pretty unfortunate. If I could assign
specific UIDs to a guest and use that to enforce isolation, it would go
a long ways to solving this problem.
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.

My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.

Jes
Kevin Wolf
2011-07-20 10:01:11 UTC
Permalink
Post by Jes Sorensen
Post by Anthony Liguori
Post by Jes Sorensen
Post by Eric Blake
As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior. The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff. Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.
Well leaving things at status quo is not making it worse, it just leaves
an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can
piss and moan as much as we want about it but it's reality. SELinux
fundamentally requires extended attributes. By the time NFS adds
extended attribute support, we'll all be flying around in hover cars.
As terrible as NFS is, people use it all of the time.
It would be nice if libvirt had the ability to make better use of DAC to
support isolation. The fact that MAC is the only way you can do
isolation between guests is pretty unfortunate. If I could assign
specific UIDs to a guest and use that to enforce isolation, it would go
a long ways to solving this problem.
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least "interesting"... Also you have to add some magic to all
places opening something.

I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.

Kevin
Jes Sorensen
2011-07-20 13:25:09 UTC
Permalink
Post by Kevin Wolf
Post by Jes Sorensen
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least "interesting"... Also you have to add some magic to all
places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?

Jes
Eric Blake
2011-07-20 13:46:58 UTC
Permalink
Post by Jes Sorensen
Post by Kevin Wolf
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
We've already told you - qemu must have a way to be passed fds which are
associated with names, and when a file refers to another backing file by
name, then qemu falls back on its fd/name mapping to use the
already-passed fd instead. Which implies that someone else, either
libvirt or a qemu-maintained libblockformat.so, needs to have a stable
interface for parsing the backing file name out of an arbitrary qcow2
file, and that this interface must work no matter how many other
extensions are added to qcow2.
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Blue Swirl
2011-07-20 17:27:15 UTC
Permalink
Post by Eric Blake
Post by Jes Sorensen
Post by Kevin Wolf
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
We've already told you - qemu must have a way to be passed fds which are
associated with names, and when a file refers to another backing file by
name, then qemu falls back on its fd/name mapping to use the already-passed
fd instead.  Which implies that someone else, either libvirt or a
qemu-maintained libblockformat.so, needs to have a stable interface for
parsing the backing file name out of an arbitrary qcow2 file, and that this
interface must work no matter how many other extensions are added to qcow2.
I'd avoid any name based access in this case. If QEMU has write access
to main file, it could forge the backing file name in main file to
point to for example /etc/shadow and then request libvirt to perform
the opening.
Eric Blake
2011-07-20 17:47:41 UTC
Permalink
Post by Blue Swirl
Post by Eric Blake
We've already told you - qemu must have a way to be passed fds which are
associated with names, and when a file refers to another backing file by
name, then qemu falls back on its fd/name mapping to use the already-passed
fd instead. Which implies that someone else, either libvirt or a
qemu-maintained libblockformat.so, needs to have a stable interface for
parsing the backing file name out of an arbitrary qcow2 file, and that this
interface must work no matter how many other extensions are added to qcow2.
I'd avoid any name based access in this case. If QEMU has write access
to main file, it could forge the backing file name in main file to
point to for example /etc/shadow and then request libvirt to perform
the opening.
Won't work. Well, it might work within the context of a single qemu
process. But when that process ends, then libvirt would have to touch
up the qcow2 headers of that file to replace the /etc/shadow name with
the real backing file name, otherwise, the next time you restart
qemu-img or a new qemu guest using the same image, the information has
been lost, since the fd has been closed in the meantime.

We really _do_ need a way to give qemu both an fd and the name of the
file that the fd is tied to. On Linux, qemu could use /proc/self/fd to
reconstruct the name from fd, but that's not portable to other OS. And
we've already discussed how in the libvirt model, that libvirt is deemed
more secure than qemu. Therefore, I think it is reasonable for qemu to
make the assumptions that if it exposes a monitor command where the
supervisor (libvirt or otherwise) can pass in both an fd and a file
name, that either the supervisor is passing in correct information, or
that the bug is in the supervisor and not in qemu if the supervisor
passes in wrong information and things blow up.

And the snapshot_blkdev monitor command is a case where qemu needs to
create a new qcow2 image on the fly, while referencing the name of an
existing file. What backing name do you put in the new qcow2 file
unless you already have a name association for all fds already open for
the existing backing file?
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Blue Swirl
2011-07-20 19:51:19 UTC
Permalink
Post by Blue Swirl
Post by Eric Blake
We've already told you - qemu must have a way to be passed fds which are
associated with names, and when a file refers to another backing file by
name, then qemu falls back on its fd/name mapping to use the
already-passed
fd instead.  Which implies that someone else, either libvirt or a
qemu-maintained libblockformat.so, needs to have a stable interface for
parsing the backing file name out of an arbitrary qcow2 file, and that this
interface must work no matter how many other extensions are added to qcow2.
I'd avoid any name based access in this case. If QEMU has write access
to main file, it could forge the backing file name in main file to
point to for example /etc/shadow and then request libvirt to perform
the opening.
Won't work.  Well, it might work within the context of a single qemu
process.  But when that process ends, then libvirt would have to touch up
the qcow2 headers of that file to replace the /etc/shadow name with the real
backing file name, otherwise, the next time you restart qemu-img or a new
qemu guest using the same image, the information has been lost, since the fd
has been closed in the meantime.
How would libvirt know to do this touch up?
We really _do_ need a way to give qemu both an fd and the name of the file
that the fd is tied to.  On Linux, qemu could use /proc/self/fd to
reconstruct the name from fd, but that's not portable to other OS.  And
we've already discussed how in the libvirt model, that libvirt is deemed
more secure than qemu.  Therefore, I think it is reasonable for qemu to make
the assumptions that if it exposes a monitor command where the supervisor
(libvirt or otherwise) can pass in both an fd and a file name, that either
the supervisor is passing in correct information, or that the bug is in the
supervisor and not in qemu if the supervisor passes in wrong information and
things blow up.
Yes, I'm not worried about QEMU getting confused by libvirt.
And the snapshot_blkdev monitor command is a case where qemu needs to create
a new qcow2 image on the fly, while referencing the name of an existing
file.  What backing name do you put in the new qcow2 file unless you already
have a name association for all fds already open for the existing backing
file?
For backing file with original name of "/path/in/storage", QEMU could
present the fd and the backin path by requesting something like
"fd:12,/path/in/storage". The next file in chain "/path2/file" would
be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing
/path/in/storage with spaces and funny characters escaped etc.
Jes Sorensen
2011-07-21 08:07:57 UTC
Permalink
Post by Blue Swirl
And the snapshot_blkdev monitor command is a case where qemu needs to create
a new qcow2 image on the fly, while referencing the name of an existing
file. What backing name do you put in the new qcow2 file unless you already
have a name association for all fds already open for the existing backing
file?
For backing file with original name of "/path/in/storage", QEMU could
present the fd and the backin path by requesting something like
"fd:12,/path/in/storage". The next file in chain "/path2/file" would
be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing
/path/in/storage with spaces and funny characters escaped etc.
Rather than trying to do this by mangling files on the disk, I would
suggest we allow registering a call-back open function, which calls back
into libvirt and requests it to open a given file. It can then do all
it's security foo to decide whether or not to allow the file to be open.

This is relatively clean and avoids the mess of relying on outside
processes messing about in the images.

Cheers,
Jes
Kevin Wolf
2011-07-21 08:36:14 UTC
Permalink
Post by Jes Sorensen
Post by Blue Swirl
And the snapshot_blkdev monitor command is a case where qemu needs to create
a new qcow2 image on the fly, while referencing the name of an existing
file. What backing name do you put in the new qcow2 file unless you already
have a name association for all fds already open for the existing backing
file?
For backing file with original name of "/path/in/storage", QEMU could
present the fd and the backin path by requesting something like
"fd:12,/path/in/storage". The next file in chain "/path2/file" would
be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing
/path/in/storage with spaces and funny characters escaped etc.
Rather than trying to do this by mangling files on the disk, I would
suggest we allow registering a call-back open function, which calls back
into libvirt and requests it to open a given file. It can then do all
it's security foo to decide whether or not to allow the file to be open.
This is relatively clean and avoids the mess of relying on outside
processes messing about in the images.
You forget that libvirt parses images exactly to decide whether to allow
accesses or not, so they would still do it. Which shouldn't be a big
problem anyway as long as the libvirt implementation is compliant to the
image format specification (even though it's not nice, of course).

I just wonder why libvirt doesn't trust qemu enough that it can open()
what it wants, but at the same time relies for this check on information
in image files which this very same qemu can modify.

Kevin
Jes Sorensen
2011-07-21 08:45:51 UTC
Permalink
Post by Kevin Wolf
Post by Jes Sorensen
Rather than trying to do this by mangling files on the disk, I would
suggest we allow registering a call-back open function, which calls back
into libvirt and requests it to open a given file. It can then do all
it's security foo to decide whether or not to allow the file to be open.
This is relatively clean and avoids the mess of relying on outside
processes messing about in the images.
You forget that libvirt parses images exactly to decide whether to allow
accesses or not, so they would still do it. Which shouldn't be a big
problem anyway as long as the libvirt implementation is compliant to the
image format specification (even though it's not nice, of course).
As I just responded to Daniel, this doesn't make sense. If libvirt wants
to guard against a compromised QEMU, it cannot rely on things written
into image files headers by QEMU.

If we trust contents of image files, there is no reason not to grant
QEMU full access to files on NFS either, as we have effectively already
granted that by trusting the image file content.

In other words, if the goal is to make this secure, lets make it secure,
otherwise, lets stop pretending.
Post by Kevin Wolf
I just wonder why libvirt doesn't trust qemu enough that it can open()
what it wants, but at the same time relies for this check on information
in image files which this very same qemu can modify.
Probably for the same reason few QEMU developers trust libvirt....

Jes
Blue Swirl
2011-07-21 19:34:29 UTC
Permalink
Post by Jes Sorensen
Post by Blue Swirl
And the snapshot_blkdev monitor command is a case where qemu needs to create
a new qcow2 image on the fly, while referencing the name of an existing
file.  What backing name do you put in the new qcow2 file unless you already
have a name association for all fds already open for the existing backing
file?
For backing file with original name of "/path/in/storage", QEMU could
present the fd and the backin path by requesting something like
"fd:12,/path/in/storage". The next file in chain "/path2/file" would
be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing
/path/in/storage with spaces and funny characters escaped etc.
Rather than trying to do this by mangling files on the disk, I would
suggest we allow registering a call-back open function, which calls back
into libvirt and requests it to open a given file. It can then do all
it's security foo to decide whether or not to allow the file to be open.
Just to clarify: I was not proposing any mangling of the files.
Post by Jes Sorensen
This is relatively clean and avoids the mess of relying on outside
processes messing about in the images.
Cheers,
Jes
Jes Sorensen
2011-07-21 08:06:02 UTC
Permalink
Post by Eric Blake
Post by Blue Swirl
I'd avoid any name based access in this case. If QEMU has write access
to main file, it could forge the backing file name in main file to
point to for example /etc/shadow and then request libvirt to perform
the opening.
Won't work. Well, it might work within the context of a single qemu
process. But when that process ends, then libvirt would have to touch
up the qcow2 headers of that file to replace the /etc/shadow name with
the real backing file name, otherwise, the next time you restart
qemu-img or a new qemu guest using the same image, the information has
been lost, since the fd has been closed in the meantime.
We really _do_ need a way to give qemu both an fd and the name of the
file that the fd is tied to. On Linux, qemu could use /proc/self/fd to
reconstruct the name from fd, but that's not portable to other OS. And
we've already discussed how in the libvirt model, that libvirt is deemed
more secure than qemu.
I am sorry, but this is where your assertion fails completely. QEMU
cannot trust libvirt being able to parse it's image files correctly, so
this is a total no-go.
Post by Eric Blake
Therefore, I think it is reasonable for qemu to
make the assumptions that if it exposes a monitor command where the
supervisor (libvirt or otherwise) can pass in both an fd and a file
name, that either the supervisor is passing in correct information, or
that the bug is in the supervisor and not in qemu if the supervisor
passes in wrong information and things blow up.
Sorry but this is not a reasonable situation given how you want to abuse it.
Post by Eric Blake
And the snapshot_blkdev monitor command is a case where qemu needs to
create a new qcow2 image on the fly, while referencing the name of an
existing file. What backing name do you put in the new qcow2 file
unless you already have a name association for all fds already open for
the existing backing file?
It is more than that, the problem is also there when you try to open an
image that has a backing file in an selinux environment where qemu isn't
allowed to open the backing file.

This problem needs a proper solution which does not involve libvirt
messing about in binary files where it has no business of being.

Jes
Kevin Wolf
2011-07-21 08:18:37 UTC
Permalink
Post by Eric Blake
We really _do_ need a way to give qemu both an fd and the name of the
file that the fd is tied to. On Linux, qemu could use /proc/self/fd to
reconstruct the name from fd, but that's not portable to other OS.
Is there any reason for qemu to know the file name if libvirt wants it
to use only pre-opened fds anyway? I don't think it should even know the
file name.
Post by Eric Blake
And
we've already discussed how in the libvirt model, that libvirt is deemed
more secure than qemu. Therefore, I think it is reasonable for qemu to
make the assumptions that if it exposes a monitor command where the
supervisor (libvirt or otherwise) can pass in both an fd and a file
name, that either the supervisor is passing in correct information, or
that the bug is in the supervisor and not in qemu if the supervisor
passes in wrong information and things blow up.
And the snapshot_blkdev monitor command is a case where qemu needs to
create a new qcow2 image on the fly, while referencing the name of an
existing file. What backing name do you put in the new qcow2 file
unless you already have a name association for all fds already open for
the existing backing file?
Without adding anything, it would put in fd:42. Absolutely useless, but
it's a managed VM anyway and libvirt is overriding the backing file with
a new fd each time it runs the VM, so it doesn't really matter.

It starts to matter if you use the image outside libvirt, then it would
be nice to have the right information there. That may be a point for
adding the "real" file name to snapshot_blkdev.

Kevin
Jes Sorensen
2011-07-21 08:48:26 UTC
Permalink
Post by Kevin Wolf
Post by Eric Blake
We really _do_ need a way to give qemu both an fd and the name of the
file that the fd is tied to. On Linux, qemu could use /proc/self/fd to
reconstruct the name from fd, but that's not portable to other OS.
Is there any reason for qemu to know the file name if libvirt wants it
to use only pre-opened fds anyway? I don't think it should even know the
file name.
In this case we shouldn't write backing file filenames into image files,
but some sort of libvirt token...... then again that will be a lot of
fun if you want to run things on the command line afterwards.

Jes
Jes Sorensen
2011-07-21 08:02:42 UTC
Permalink
Post by Eric Blake
Post by Jes Sorensen
Post by Kevin Wolf
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
We've already told you - qemu must have a way to be passed fds which are
associated with names, and when a file refers to another backing file by
name, then qemu falls back on its fd/name mapping to use the
already-passed fd instead. Which implies that someone else, either
libvirt or a qemu-maintained libblockformat.so, needs to have a stable
interface for parsing the backing file name out of an arbitrary qcow2
file, and that this interface must work no matter how many other
extensions are added to qcow2.
As I replied to you earlier, libvirt is *not* allowed to be messing with
internals of image files! Passing in backing file fds as a result of
libvirt messing around in internals of the image headers is utterly
broken and is not going to happen!

Jes
Kevin Wolf
2011-07-20 13:51:23 UTC
Permalink
Post by Jes Sorensen
Post by Kevin Wolf
Post by Jes Sorensen
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least "interesting"... Also you have to add some magic to all
places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.

Kevin
Blue Swirl
2011-07-20 17:20:03 UTC
Permalink
Post by Kevin Wolf
Post by Jes Sorensen
Post by Kevin Wolf
Post by Jes Sorensen
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least "interesting"... Also you have to add some magic to all
places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.
There could still be some issues:
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?

This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C, but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.
Eric Blake
2011-07-20 17:41:55 UTC
Permalink
Post by Blue Swirl
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C
refers to backing file CA of raw format, but does not state what file
format CA contains, then a malicious guest can modify the contents of CA
to appear to be yet another qcow2 image. At which point, if libvirt
follows the backing file specified in CA, then yes, the malicious guest
really can cause libvirt to expose arbitrary file CB for manipulation by
the guest. But that security hole was already plugged - by default,
libvirt refuses to probe backing files parsed from qcow2 headers for
file format, but instead requires the outer qcow2 header to also include
the a file format designation for the backing file. At which point, you
then have a safe chain: if C refers to CA, then libvirt knows that both
C and CA are essential to the storage presented by giving qemu the file
name C, and the guest will already be modifying CA, but there is no
storage corruption involved.

That is, as long as libvirt can already accurately read the chain of
backing files from any starting point, then it can hand that entire
chain of backing files (whether by the topmost file name as it does now,
or whether by a series of fds as is being proposed) to qemu.
Post by Blue Swirl
This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C, but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.
The monitor interface was not designed to throw the
read()/write()/seek() burden back on libvirt, and indeed that would kill
performance so it is a non-starter idea. All we need for security is
the open() burden to be shifted out of qemu and into libvirt.
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Blue Swirl
2011-07-20 18:00:04 UTC
Permalink
Post by Blue Swirl
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C refers
to backing file CA of raw format, but does not state what file format CA
contains, then a malicious guest can modify the contents of CA to appear to
be yet another qcow2 image.  At which point, if libvirt follows the backing
file specified in CA, then yes, the malicious guest really can cause libvirt
to expose arbitrary file CB for manipulation by the guest.  But that
security hole was already plugged - by default, libvirt refuses to probe
backing files parsed from qcow2 headers for file format, but instead
requires the outer qcow2 header to also include the a file format
designation for the backing file.  At which point, you then have a safe
chain: if C refers to CA, then libvirt knows that both C and CA are
essential to the storage presented by giving qemu the file name C, and the
guest will already be modifying CA, but there is no storage corruption
involved.
But what if CA is accessed even if C is not? For example, QEMU opens C
(to determine CA and write new information about the path), closes it
and then requests CA?
That is, as long as libvirt can already accurately read the chain of backing
files from any starting point, then it can hand that entire chain of backing
files (whether by the topmost file name as it does now, or whether by a
series of fds as is being proposed) to qemu.
Post by Blue Swirl
This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C, but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.
The monitor interface was not designed to throw the read()/write()/seek()
burden back on libvirt, and indeed that would kill performance so it is a
non-starter idea.  All we need for security is the open() burden to be
shifted out of qemu and into libvirt.
Obviously the interface should be faster than monitor, for example a
pair of sockets with some efficient protocol. Monitor could still be
used to set up these.
Eric Blake
2011-07-20 18:17:54 UTC
Permalink
Post by Blue Swirl
Post by Blue Swirl
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C refers
to backing file CA of raw format, but does not state what file format CA
contains, then a malicious guest can modify the contents of CA to appear to
be yet another qcow2 image. At which point, if libvirt follows the backing
file specified in CA, then yes, the malicious guest really can cause libvirt
to expose arbitrary file CB for manipulation by the guest. But that
security hole was already plugged - by default, libvirt refuses to probe
backing files parsed from qcow2 headers for file format, but instead
requires the outer qcow2 header to also include the a file format
designation for the backing file. At which point, you then have a safe
chain: if C refers to CA, then libvirt knows that both C and CA are
essential to the storage presented by giving qemu the file name C, and the
guest will already be modifying CA, but there is no storage corruption
involved.
But what if CA is accessed even if C is not? For example, QEMU opens C
(to determine CA and write new information about the path), closes it
and then requests CA?
Why is qemu trying to access CA?

Either because CA was mentioned as a backing file for C (in which case
libvirt already knows about it, because either libvirt handed C to qemu
at startup time after already parsing C's headers to learn that CA is a
backing file, or because libvirt called the snapshot_blkdev command with
the intent of having qemu populate CA with C as its backing file), or
because qemu has a bug (in which case, libvirt should refuse the access
to CA).

Libvirt is already perfectly capable of tracking all files that qemu
might need to access, and whether it is qemu or libvirt that does the
open() of those files, we can still have libvirt validate whether each
request for a file makes sense given the context of all previous files
in use from the time the qemu command line was invoked and across all
monitor commands in the meantime.

On non-NFS solutions, where every file can have a SELinux label, then
the security is then present by merely having libvirt relabel all such
files to a unique label for that particular qemu process, and SELinux
merely enforces that qemu cannot open() anything but what libvirt has
already labeled. And since libvirt already knows which files to label
in the non-NFS scenario, it already knows which fds to pass in the NFS
scenario, at which point the ability to prevent qemu from open()ing an
NFS file is a security enhancement.

Your question about qemu wanting to use CA is thus answered
independently of whether the fd management solution is solved by libvirt
handing an fd for CA to qemu prior to any monitor command where qemu
will then need to use CA, or whether qemu is taught to asynchronously
ask libvirt to open an fd for CA on qemu's behalf. The answer is that
libvirt already tracks whether qemu should access CA, and just needs a
way to enforce that knowledge. The enforcement already exists for
non-NFS via SELinux labels, and the proposal to add fd handling will
expand that enforcement to also cover NFS.
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Blue Swirl
2011-07-20 20:01:37 UTC
Permalink
Post by Eric Blake
Post by Blue Swirl
Post by Blue Swirl
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C refers
to backing file CA of raw format, but does not state what file format CA
contains, then a malicious guest can modify the contents of CA to appear to
be yet another qcow2 image.  At which point, if libvirt follows the backing
file specified in CA, then yes, the malicious guest really can cause libvirt
to expose arbitrary file CB for manipulation by the guest.  But that
security hole was already plugged - by default, libvirt refuses to probe
backing files parsed from qcow2 headers for file format, but instead
requires the outer qcow2 header to also include the a file format
designation for the backing file.  At which point, you then have a safe
chain: if C refers to CA, then libvirt knows that both C and CA are
essential to the storage presented by giving qemu the file name C, and the
guest will already be modifying CA, but there is no storage corruption
involved.
But what if CA is accessed even if C is not? For example, QEMU opens C
(to determine CA and write new information about the path), closes it
and then requests CA?
Why is qemu trying to access CA?
Either because CA was mentioned as a backing file for C (in which case
libvirt already knows about it, because either libvirt handed C to qemu at
startup time after already parsing C's headers to learn that CA is a backing
file, or because libvirt called the snapshot_blkdev command with the intent
of having qemu populate CA with C as its backing file), or because qemu has
a bug (in which case, libvirt should refuse the access to CA).
So no new backing files can be introduced by QEMU after it has started
without libvirt knowing it?
Post by Eric Blake
Libvirt is already perfectly capable of tracking all files that qemu might
need to access, and whether it is qemu or libvirt that does the open() of
those files, we can still have libvirt validate whether each request for a
file makes sense given the context of all previous files in use from the
time the qemu command line was invoked and across all monitor commands in
the meantime.
On non-NFS solutions, where every file can have a SELinux label, then the
security is then present by merely having libvirt relabel all such files to
a unique label for that particular qemu process, and SELinux merely enforces
that qemu cannot open() anything but what libvirt has already labeled.  And
since libvirt already knows which files to label in the non-NFS scenario, it
already knows which fds to pass in the NFS scenario, at which point the
ability to prevent qemu from open()ing an NFS file is a security
enhancement.
Your question about qemu wanting to use CA is thus answered independently of
whether the fd management solution is solved by libvirt handing an fd for CA
to qemu prior to any monitor command where qemu will then need to use CA, or
whether qemu is taught to asynchronously ask libvirt to open an fd for CA on
qemu's behalf.  The answer is that libvirt already tracks whether qemu
should access CA, and just needs a way to enforce that knowledge.  The
enforcement already exists for non-NFS via SELinux labels, and the proposal
to add fd handling will expand that enforcement to also cover NFS.
OK. I think fds would be useful internally in a privilege separation
mode in plain QEMU too.
Eric Blake
2011-07-20 20:10:57 UTC
Permalink
Post by Blue Swirl
Post by Eric Blake
Either because CA was mentioned as a backing file for C (in which case
libvirt already knows about it, because either libvirt handed C to qemu at
startup time after already parsing C's headers to learn that CA is a backing
file, or because libvirt called the snapshot_blkdev command with the intent
of having qemu populate CA with C as its backing file), or because qemu has
a bug (in which case, libvirt should refuse the access to CA).
So no new backing files can be introduced by QEMU after it has started
without libvirt knowing it?
No, you missed my point. A new backing file can only be introduced by
qemu after it has started by libvirt using a finite set of monitor
commands. These include disk hotplug (libvirt adds to the list of files
known to be accessed by qemu, by reading the image headers of the new
disk to be hot-plugged prior to issuing the monitor command), by disk
hot-unplug (libvirt revokes the access to the files making up that disk,
which it remembers from before the disk was added), and snapshot_blkdev
(libvirt is explicitly requesting a new qcow2 file with the old file as
the backing image, so it knows the new relationship of files to be
accessed by that block device). Since libvirt issued the monitor
commands, libvirt always knows which files qemu should be trying to
access when servicing those block devices to the guest.
Post by Blue Swirl
OK. I think fds would be useful internally in a privilege separation
mode in plain QEMU too.
Yes, there's more than one reason to add fd support to all possible
situations where qemu is currently resorting to open().
--
Eric Blake ***@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Kevin Wolf
2011-07-21 08:25:36 UTC
Permalink
Post by Blue Swirl
Post by Kevin Wolf
Post by Jes Sorensen
Post by Kevin Wolf
Post by Jes Sorensen
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least "interesting"... Also you have to add some magic to all
places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?
This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C,
qemu shouldn't ask for anything. libvirt shouldn't give it a filename in
the first place. It should do something like this:

{ "execute": "blockdev_add", "arguments"= {
"driver": "fd," "fd": "4", "backing-file": {
"driver": "fd," "fd": "5"
}
}}

And then qemu doesn't even have a reason to know that there is something
called CA.
Post by Blue Swirl
but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.
I like your humour. :-)

Kevin
Blue Swirl
2011-07-21 19:01:03 UTC
Permalink
Post by Kevin Wolf
Post by Blue Swirl
Post by Kevin Wolf
Post by Jes Sorensen
Post by Kevin Wolf
Post by Jes Sorensen
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
To me this sounds more like a problem than a solution. It basically
means that during an open (which may even be initiated by a monitor
command), you need monitor interaction. It basically means that open
becomes asynchronous, and requires clients to deal with that, which
sounds at least "interesting"... Also you have to add some magic to all
places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it
shouldn't pass a file name but an fd in the first place. Which means
that the two that we need are support for an fd: protocol (patches on
the list, need review), and a way for libvirt to override the backing
file of an image.
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.
Let's have files A, B, C etc. with backing files AA etc. How would
libvirt know that when QEMU wants to write to file CA, this is because
it's needed to access C, or is it just trickery by a devious guest to
corrupt storage?
This could be handled so that instead of naming the backing file, QEMU
asks for a descriptor for the backing file by presenting the
descriptor to main file C,
qemu shouldn't ask for anything. libvirt shouldn't give it a filename in
{ "execute": "blockdev_add", "arguments"= {
 "driver": "fd," "fd": "4", "backing-file": {
   "driver": "fd," "fd": "5"
 }
}}
And then qemu doesn't even have a reason to know that there is something
called CA.
Yes, that's better.
Post by Kevin Wolf
Post by Blue Swirl
but I think the real solution is that
libvirt should handle the storage formats completely and it should
present QEMU with only a raw file like interface (read/write/seek) for
the data. Then any backing files would be handled within libvirt.
Performance could suffer, though.
I like your humour. :-)
Well, for some applications, security is more important than
performance or convenience.
Avi Kivity
2011-07-22 07:36:51 UTC
Permalink
Post by Kevin Wolf
Post by Jes Sorensen
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.
It can be done without blockdev. Have a dictionary that translates
filenames, and populate it from the command line (for a bonus, translate
a filename to a file descriptor inherited from the caller or passed via
the monitor).
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Kevin Wolf
2011-07-22 08:11:10 UTC
Permalink
Post by Avi Kivity
Post by Kevin Wolf
Post by Jes Sorensen
The problem is that QEMU will find backing file file names inside the
images which it will be unable to open. How do you suggest we get around
that?
This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.
It can be done without blockdev. Have a dictionary that translates
filenames, and populate it from the command line (for a bonus, translate
a filename to a file descriptor inherited from the caller or passed via
the monitor).
Sure, you can always add ugly hacks, but it isn't the right solution
that we want to use for all times. However, once we use it, it will show
up in the external API and we'll never get rid of it again.

Kevin
Blue Swirl
2011-07-22 16:09:35 UTC
Permalink
Post by Kevin Wolf
Post by Kevin Wolf
 The problem is that QEMU will find backing file file names inside the
 images which it will be unable to open. How do you suggest we get around
 that?
This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.
It can be done without blockdev.  Have a dictionary that translates
filenames, and populate it from the command line (for a bonus, translate
a filename to a file descriptor inherited from the caller or passed via
the monitor).
Sure, you can always add ugly hacks, but it isn't the right solution
that we want to use for all times. However, once we use it, it will show
up in the external API and we'll never get rid of it again.
Fully agree. This would also be a highly specific API for QCOW2 and
similar formats.

Markus Armbruster
2011-07-21 13:07:09 UTC
Permalink
Post by Jes Sorensen
Post by Anthony Liguori
Post by Jes Sorensen
Post by Eric Blake
As nice as that sentiment is, it will never fly, because it would be a
regression in current behavior. The whole reason that the virt_use_nfs
SELinux bool exists is that some people are willing to make the partial
security tradeoff. Besides, the use of sVirt via SELinux is more than
just open() protection - while the current virt_use_nfs bool makes NFS
less secure than otherwise possible, it still gives some nice guarantees
to the rest of the qemu process such as passthrough accesses to local
pci devices.
Well leaving things at status quo is not making it worse, it just leaves
an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can
piss and moan as much as we want about it but it's reality. SELinux
fundamentally requires extended attributes. By the time NFS adds
extended attribute support, we'll all be flying around in hover cars.
As terrible as NFS is, people use it all of the time.
It would be nice if libvirt had the ability to make better use of DAC to
support isolation. The fact that MAC is the only way you can do
isolation between guests is pretty unfortunate. If I could assign
specific UIDs to a guest and use that to enforce isolation, it would go
a long ways to solving this problem.
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
No go, I'm afraid.

The problem at hand is how to confine the untrusted QEMU process so it
can access exactly the resources the guest configuration specifies, no
more, no less.

When guest configuration says "use image A", it really means "file A
plus certain other resources that are required to use it". For obvious
usability reasons, we don't require spelling out "certain other" if we
can safely get the information from A.

Example: file foo.qcow2 requires its backing file file foo.img.

Obviously, the code to get information from A must be trusted.
Therefore, it can't run in the untrusted QEMU process.

Example: the proposed callback mechanism.

Guest configuration says "use image foo.qcow2". Libvirt (or
whatever management layer) arranges that the QEMU process can use
file foo.qcow2.

The QEMU process then uses the callback to gain access to the
backing file. Nothing stops it to ask for whatever file it wants.
How can libvirt decide whether the file is one of the resources the
guest configuration specifies?

The only way I can see around having libvirt read resource information
from images (preferably via a suitable library provided by QEMU) is to
require the administrator to spell out the resouce information.
Administrators generally don't fancy spelling out things the computer
already knows.
Jes Sorensen
2011-07-21 14:15:36 UTC
Permalink
Post by Markus Armbruster
Post by Jes Sorensen
Right, we're stuck with the two horros of NFS and selinux, so we need
something that gets around the problem. In a sane world we would simply
say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the
callback with QEMU for open() calls, allowing libvirt to perform the
open and return the open file descriptor would get around this problem.
No go, I'm afraid.
The problem at hand is how to confine the untrusted QEMU process so it
can access exactly the resources the guest configuration specifies, no
more, no less.
When guest configuration says "use image A", it really means "file A
plus certain other resources that are required to use it". For obvious
usability reasons, we don't require spelling out "certain other" if we
can safely get the information from A.
Example: file foo.qcow2 requires its backing file file foo.img.
Obviously, the code to get information from A must be trusted.
Therefore, it can't run in the untrusted QEMU process.
Example: the proposed callback mechanism.
Guest configuration says "use image foo.qcow2". Libvirt (or
whatever management layer) arranges that the QEMU process can use
file foo.qcow2.
The QEMU process then uses the callback to gain access to the
backing file. Nothing stops it to ask for whatever file it wants.
How can libvirt decide whether the file is one of the resources the
guest configuration specifies?
The only way I can see around having libvirt read resource information
from images (preferably via a suitable library provided by QEMU) is to
require the administrator to spell out the resouce information.
Administrators generally don't fancy spelling out things the computer
already knows.
As I pointed out in other parts of the discussion, libvirt must carry a
complete list of authorized files/devices that a guest is allowed to
access. There is no way around this anyway, and since this is the case,
it's not a showstopper for using the callback mechanism.

Cheers,
Jes
Daniel P. Berrange
2011-07-19 16:47:55 UTC
Permalink
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Jes Sorensen
2011-07-20 08:26:49 UTC
Permalink
Post by Daniel P. Berrange
Post by Jes Sorensen
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.
There is no reason for libvirt or any external process to mess about
with the internals of image files. You have the same problem if the
image file is encrypted.

Jes
Daniel P. Berrange
2011-07-20 09:38:06 UTC
Permalink
Post by Jes Sorensen
Post by Daniel P. Berrange
Post by Jes Sorensen
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.
There is no reason for libvirt or any external process to mess about
with the internals of image files. You have the same problem if the
image file is encrypted.
Just repeating "libvirt doesn't need todo this" many times doesn't make
it true. I have described why we need to read the disk image to determine
its backing files ahead of QEMU being launched quite clearly.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Jes Sorensen
2011-07-21 08:09:04 UTC
Permalink
Post by Daniel P. Berrange
Post by Jes Sorensen
Post by Daniel P. Berrange
Post by Jes Sorensen
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.
There is no reason for libvirt or any external process to mess about
with the internals of image files. You have the same problem if the
image file is encrypted.
Just repeating "libvirt doesn't need todo this" many times doesn't make
it true. I have described why we need to read the disk image to determine
its backing files ahead of QEMU being launched quite clearly.
I have pointed out repeatedly why you do not need to do this. It is
horrendous that libvirt didn't seek a proper solution to this problem
before going on and implementing this current mess.

Jes
Anthony Liguori
2011-07-20 14:35:35 UTC
Permalink
Post by Daniel P. Berrange
This would be possible if QEMU to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.
I'm in 100% agreement that we need to provide the equivalent of a
libblockformat.so down the road.

But the block layer needs some work before we can support a stable
interface.

Regards,

Anthony Liguori
Post by Daniel P. Berrange
Daniel
Michael Roth
2011-07-21 18:56:08 UTC
Permalink
Post by Daniel P. Berrange
Post by Jes Sorensen
Post by Eric Blake
[adding the libvir-list]
Urgh, libvirt parsing image files is really unfortunate, it really
doesn't give me warm fuzzy feelings :( libvirt really should not know
about internals of image formats.
But even if you add new features to qemu to avoid needing this in the
future, it doesn't change the past - libvirt will always have to know
how to parse image files understood by older qemu, and so as long as
libvirt already knows how to do that parsing, we might as well take
advantage of it.
What has been done here in the past is plain wrong. Continuing to do it
isn't the right thing to do here.
Post by Eric Blake
Besides, I feel that having a well-documented file format, so that
independent applications can both parse the same file with the same
semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not
get added and libvirt will never be able to keep up. The driver for a
file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library
which allowed apps to extract metadata from file formats using a stable
API.
How wrong would it be to call out to qemu-img to handle this instead?
Seems like a more stable interface (assuming the output of `qemu-img
info` is treated as an API of sorts, or perhaps some other output mode
is added to qemu-img that is considered stable).
Post by Daniel P. Berrange
Daniel
Continue reading on narkive:
Loading...