Discussion:
[libvirt] RFC: put domain's interfaces into distinct namespaces
Nikolay Shirokovskiy
2018-11-07 08:48:16 UTC
Hi, all!

There is a performance issue with network filters and broadcast Ethernet traffic.
If an L2 segment is large enough (several thousand VMs), there is a lot of
broadcast ARP traffic (about 100 frames/s). As a result, on a host with several hundred
VMs (say 300) we have a kernel thread eating 100% of a CPU just for checking this traffic
against firewall rules. The problem is that if there are rules in the ebtables POSTROUTING
chain (clean-traffic is an example of such a filter), every single broadcast frame turns into
300, one for every distinct bridge port, and each of these 300 is then checked against
300 / 2 rules on average to find the chain for that port. As a result we have
100 * 300 * 300 / 2 = 4.5 * 10^6 rule checks per second. The kernel does not spread this
workload across different CPUs, and in any case this is a waste of CPU time!
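
The estimate above can be reproduced with a short sketch. The function name and parameters are illustrative only, and the model assumes exactly what the paragraph states: one clone per bridge port, and on average half of the per-port rules walked before the right chain is found.

```python
def rule_checks_per_second(frames_per_second: int, ports: int) -> float:
    """Estimate ebtables rule evaluations per second for broadcast traffic.

    Each broadcast frame is cloned once per bridge port, and each clone
    walks on average half of the per-port rules before it reaches the
    chain for its own port.
    """
    clones_per_second = frames_per_second * ports
    average_rules_walked = ports / 2
    return clones_per_second * average_rules_walked


if __name__ == "__main__":
    checks = rule_checks_per_second(frames_per_second=100, ports=300)
    # prints "4.5e+06 rule checks per second"
    print(f"{checks:.1e} rule checks per second")
```

Because the cost grows with the square of the port count, doubling the number of VMs on the host quadruples the work in this model.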

The simple solution is to put rules that ACCEPT ARP traffic into POSTROUTING
itself, before any port-specific chains. But this will affect non-VM ports and
the host itself too. So can we instead create a distinct network namespace for every
VM and put the tap device there, then add a bridge in the namespace as well, so we can
apply ebtables rules there, and insert the tap into that bridge? Finally, connect the
bridges in the root namespace and the VM namespace with a veth pair. As a result, in the
situation described above each cloned frame will be checked only against the rules for
that very VM. Regular TCP traffic will see the same benefit. On the other hand, we
need a bridge and a veth pair for every VM, and some CPU power to process this extra
traffic path.
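
To make the proposed topology concrete, here is a rough sketch that only builds the list of iproute2 commands for one VM, without running anything. All names (`namespace_plan`, the `ns-`, `vr-`, `vn-` prefixes, `br-vm`) are hypothetical and not something libvirt does today; the sketch also assumes the tap device already exists in the root namespace and can be moved.

```python
import shlex
from typing import List

IFNAMSIZ_MAX = 15  # Linux interface names hold at most 15 visible characters


def namespace_plan(vm: str, tap: str, root_bridge: str) -> List[List[str]]:
    """Return iproute2 commands for the per-VM namespace layout (not executed)."""
    ns = f"ns-{vm}"         # hypothetical namespace name
    inner_bridge = "br-vm"  # hypothetical bridge inside the namespace
    veth_root = f"vr-{vm}"  # veth end that stays in the root namespace
    veth_ns = f"vn-{vm}"    # veth end that moves into the VM namespace
    for name in (tap, inner_bridge, veth_root, veth_ns):
        if len(name) > IFNAMSIZ_MAX:
            raise ValueError(f"interface name too long: {name}")

    in_ns = ["ip", "netns", "exec", ns]
    return [
        # Namespace with its own bridge; ebtables rules would be applied here.
        ["ip", "netns", "add", ns],
        in_ns + ["ip", "link", "add", "name", inner_bridge, "type", "bridge"],
        # Move the tap into the namespace and enslave it to the inner bridge.
        ["ip", "link", "set", tap, "netns", ns],
        in_ns + ["ip", "link", "set", tap, "master", inner_bridge],
        # veth pair joining the root bridge and the inner bridge.
        ["ip", "link", "add", veth_root, "type", "veth", "peer", "name", veth_ns],
        ["ip", "link", "set", veth_ns, "netns", ns],
        in_ns + ["ip", "link", "set", veth_ns, "master", inner_bridge],
        ["ip", "link", "set", veth_root, "master", root_bridge],
        # Bring everything up.
        in_ns + ["ip", "link", "set", inner_bridge, "up"],
        in_ns + ["ip", "link", "set", tap, "up"],
        in_ns + ["ip", "link", "set", veth_ns, "up"],
        ["ip", "link", "set", veth_root, "up"],
    ]


if __name__ == "__main__":
    for command in namespace_plan("vm1", "vnet0", "br0"):
        print(shlex.join(command))
```

The plan makes the extra cost visible: one namespace, one bridge and one veth pair per VM, plus one more hop for every frame.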

The proposed approach also fixes the problem of slow libvirtd restarts with
network filters ([1], [2]), as it is rather difficult to mess up network rules in
a different network namespace. At the very least, restarting/reloading firewalld won't
hurt such rules, so we just don't need to reinstantiate the rules at all.



[1] [RFC] Faster libvirtd restart with nwfilter rules
https://www.redhat.com/archives/libvir-list/2018-September/msg01206.html
which continues in
https://www.redhat.com/archives/libvir-list/2018-October/msg00657.html

[2] [PATCH v2 0/2] nwfilter: don't reinstantiate rules if there is no need to
https://www.redhat.com/archives/libvir-list/2018-October/msg01317.html


Nikolay
Nikolay Shirokovskiy
2018-11-16 07:35:09 UTC
ping
Nikolay Shirokovskiy
2018-11-16 08:49:44 UTC
Post by Nikolay Shirokovskiy
The proposed approach also fixes the problem of slow libvirtd restarts with
network filters ([1], [2]), as it is rather difficult to mess up network rules in
a different network namespace. At the very least, restarting/reloading firewalld won't
hurt such rules, so we just don't need to reinstantiate the rules at all.
Not true. I forgot that libvirt updates can sometimes bring updated filter definitions
and/or daemon code that instantiates them, so in general we still need to reinstantiate
rules on restart. In this case we can use the approach of [2] or [3], whichever
is preferable.

[3] [PATCH] nwfilter: intantiate filters on loading driver
https://www.redhat.com/archives/libvir-list/2018-October/msg00787.html

Nikolay
Daniel P. Berrangé
2018-11-19 16:39:41 UTC
Post by Nikolay Shirokovskiy
Hi, all!
There is a performance issue with network filters and broadcast Ethernet traffic.
If an L2 segment is large enough (several thousand VMs), there is a lot of
broadcast ARP traffic (about 100 frames/s). As a result, on a host with several hundred
VMs (say 300) we have a kernel thread eating 100% of a CPU just for checking this traffic
against firewall rules. The problem is that if there are rules in the ebtables POSTROUTING
chain (clean-traffic is an example of such a filter), every single broadcast frame turns into
300, one for every distinct bridge port, and each of these 300 is then checked against
300 / 2 rules on average to find the chain for that port. As a result we have
100 * 300 * 300 / 2 = 4.5 * 10^6 rule checks per second. The kernel does not spread this
workload across different CPUs, and in any case this is a waste of CPU time!
Yes, this is a key limitation of the traditional ebtables/ip[6]tables commands.
There's no efficient way to associate rules with specific devices.

This is apparently solved with nftables if you set up your chains to match on
the 'netdev' family.
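
As a rough illustration of what per-device attachment looks like in the 'netdev' family, the sketch below only generates ruleset text and does not load it. The table name `vmfilter`, the chain naming and the helper `netdev_ruleset` are made up for this example. It uses the ingress hook, which sees frames arriving from the device; whether the broadcast-to-guest direction discussed in this thread can be handled the same way depends on kernel support for a matching egress hook, which is not verified here.

```python
from typing import Iterable


def netdev_ruleset(devices: Iterable[str]) -> str:
    """Build nftables ruleset text with one ingress chain per device."""
    lines = ["table netdev vmfilter {"]  # hypothetical table name
    for dev in devices:
        lines += [
            f"    chain ingress_{dev} {{",
            # The chain is bound to one device, so only that device's
            # frames ever traverse its rules.
            f'        type filter hook ingress device "{dev}" priority 0; policy accept;',
            "        ether type arp accept",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(netdev_ruleset(["vnet0", "vnet1"]), end="")
```

The point of the sketch is structural: each chain is tied to a single device, so there is no shared list of per-port jump rules for every frame to walk.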
Post by Nikolay Shirokovskiy
The simple solution is to put rules that ACCEPT ARP traffic into POSTROUTING
itself, before any port-specific chains. But this will affect non-VM ports and
the host itself too. So can we instead create a distinct network namespace for every
VM and put the tap device there, then add a bridge in the namespace as well, so we can
apply ebtables rules there, and insert the tap into that bridge? Finally, connect the
bridges in the root namespace and the VM namespace with a veth pair. As a result, in the
situation described above each cloned frame will be checked only against the rules for
that very VM. Regular TCP traffic will see the same benefit. On the other hand, we
need a bridge and a veth pair for every VM, and some CPU power to process this extra
traffic path.
Yeah, I don't really like the idea of introducing extra devices into
the I/O path for every NIC, as it will burn extra CPU and introduce
latency.


I don't really have a particular suggestion for fixing the perf problem
offhand, other than my note about nftables supposedly allowing us to fix
this problem. RHEL-8 & Fedora 30 will both be nftables based, so it is
imminently available as a solution for libvirt, assuming it does in fact
let us solve the perf problem.

The hard thing is that we'll need some significant work in the nwfilter
driver to port it to native nft commands - just using the legacy iptables
compat tools uses nft, but not in a way that would let us get the perf
benefit IIUC.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|