[Swan] XFRM pCPU Load distribution in KVM Multi-queue virtio-net
paul at nohats.ca
Mon Sep 21 21:55:10 UTC 2020
On Mon, 21 Sep 2020, Rav Ya wrote:
> I have been referring to this page (https://libreswan.org/wiki/XFRM_pCPU) and it doesn't say that XFRM is only supported for ikev2. I
> am setting up a shared VTI for 500 Remote Clients IPSec (xAUTH using PAM, IKEv1) tunnels. I have attached my ipsec.conf at the
> bottom of this email.
The goal of pCPU is to use more than 1 CPU for a single IPsec SA. If you
have 500 clients you have 500 IPsec SAs, which get roughly load
balanced over your CPUs already. It should not help your case.
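
To illustrate what "roughly" can mean here, below is a minimal sketch. It assumes, purely for illustration, that each SA ends up on a CPU uniformly at random; how flows are really steered to CPUs is not described in this thread. The function name assign_sas and the seed parameter are hypothetical.

```python
import random
from collections import Counter


def assign_sas(num_sas: int, num_cpus: int, seed: int = 0) -> list[int]:
    """Count how many SAs land on each CPU.

    ASSUMPTION: every SA is placed on a CPU uniformly at random.
    This is an illustrative model, not how the kernel is known to behave.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_cpus) for _ in range(num_sas))
    # Return counts in CPU order, including CPUs that received no SA.
    return [counts.get(cpu, 0) for cpu in range(num_cpus)]


if __name__ == "__main__":
    per_cpu = assign_sas(500, 6)
    print("SAs per CPU:", per_cpu)
    print("ideal share: %.1f, busiest CPU: %d" % (500 / 6, max(per_cpu)))
```

Even under this idealized model the per-CPU counts will usually differ from the ideal 500/6, and the traffic per SA is not equal either, so some imbalance is to be expected.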
> What I understand from your response: Please correct me
> 1. Libreswan experimental versions only support pCPU with IKEv2. (Load balancing one big IPSec flow over multiple vCPUs.)
> Question: For my use case (500 Clients, xAUTH using PAM, IKEv1 ) the SAs per client will be created per vCPU.
> * The vCPU will be picked randomly (How will the 500 SAs be distributed?) 500/6 = about 83 SAs per CPU.
> * There shall be no duplicate SAs for a single connection over multiple vCPU because there is no pCPU XFRM. Correct?
> * Is there a way for me to check how many SAs got allocated to a vCPU on my system?
I don't know the answers for these.
> My Observation: When I start pushing traffic across all the 500 SAs
> * Sometimes the load isn't distributed evenly and I see some vCPUs getting overutilized and slowing down the Libreswan packet
> processing rate.
Most CPU should be going into IPsec packets inside the kernel, not IKE
packets inside libreswan.
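
One way to see how kernel receive work is spread over the vCPUs is the NET_RX row of /proc/softirqs inside the guest. The following is a minimal sketch, not a libreswan tool; the function name net_rx_share is hypothetical and the sample text is made up for the demo.

```python
def net_rx_share(softirqs_text: str) -> dict[str, float]:
    """Return each CPU's fraction of NET_RX softirqs.

    Expects text in the /proc/softirqs layout: a header row of CPU
    names followed by one "NAME: count count ..." row per softirq type.
    """
    lines = softirqs_text.strip().splitlines()
    cpus = lines[0].split()
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == "NET_RX":
            counts = [int(x) for x in rest.split()]
            total = sum(counts) or 1  # avoid dividing by zero
            return {cpu: n / total for cpu, n in zip(cpus, counts)}
    raise ValueError("no NET_RX row found")


# Made-up sample in the /proc/softirqs layout (two CPUs only).
SAMPLE = """\
                    CPU0       CPU1
      NET_TX:         10         20
      NET_RX:       1000       3000
"""

if __name__ == "__main__":
    for cpu, share in net_rx_share(SAMPLE).items():
        print(f"{cpu}: {share:.0%}")
```

On a real system you would pass open("/proc/softirqs").read() instead of SAMPLE. The counters are cumulative since boot, so to judge the distribution under load, take two readings and compare the differences.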
> * The Libreswan server isn't able to process packets fast enough and the TAP interface (tx queue) on the KVM virtualization host
> starts dropping packets.
Clarify "dropping packets". If it is not IKE packets, then libreswan is
not involved. It is the kernel.
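
To pin down where packets are dropped, the per-interface drop counters in /proc/net/dev can be read on the host and in the guest. The following is a minimal sketch; the function name drop_counters, the interface name tap0 and the sample numbers are hypothetical.

```python
def drop_counters(net_dev_text: str) -> dict[str, tuple[int, int]]:
    """Return {interface: (rx_drop, tx_drop)} from /proc/net/dev text.

    Each interface row carries 8 receive fields followed by 8 transmit
    fields; "drop" is the 4th field of each group.
    """
    drops = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        iface, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # skip rows that do not have the expected layout
        drops[iface.strip()] = (int(fields[3]), int(fields[11]))
    return drops


# Made-up sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      10    0    0    0     0          0         0     1234      10    0    0    0     0       0          0
  tap0:     100       5    0    0    0     0          0         0      900       9    0   42    0     0       0          0
"""

if __name__ == "__main__":
    for iface, (rx, tx) in drop_counters(SAMPLE).items():
        print(f"{iface}: rx_drop={rx} tx_drop={tx}")
```

On a real system you would pass open("/proc/net/dev").read() instead of SAMPLE. These counters say nothing about IKE versus ESP traffic; they only show on which interface and in which direction the kernel counted drops.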
> Currently, my ipsec clients are using: ( Any advice?) vCPU is Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz passthrough Host VM
3DES is 8x more CPU intensive than AES. Use ike=aes-sha1-modp1536.
modp1024 is too weak and recent libreswan versions have removed support for it.
If your clients support it, use esp=aes_gcm256. It is much faster than
It seems your problem might be more related to the libreswan speed
optimizations with NSS in the last few versions. Are you at
least running 3.32?
You can benchmark the libreswan cpu usage using:
sudo ipsec whack --debug cpu-usage
Note that switching your clients to IKEv2 will also greatly improve your
- Fewer (encrypted) IKE packets to set up connections
- Fewer retransmits because the initiator is responsible in IKEv2
(in IKEv1, both ends retransmit)
Certificate handling has also greatly improved in 3.32, leading to 5 to
10 times better performance due to certificate caching and expiration,
making the NSS lookups faster. It also caches the (encrypted) private
key, which is useful for busy servers.
I don't think pCPU is a fix for what your problem really is. Upgrade
to the latest libreswan on the server, and if possible switch to IKEv2.