[Swan] /proc/sys/net/core/xfrm_larval_drop value
Paul Wouters
paul at nohats.ca
Fri Oct 11 21:06:12 EEST 2013
On Fri, 11 Oct 2013, Tuomo Soini wrote:
(adding swan list to discuss this issue)
>> echo 0 > /proc/sys/net/core/xfrm_larval_drop
>
> We should include this info to sysctl.conf sample we have...
>
> net.core.xfrm_larval_drop = 0
I'm not sure....
To recap for everyone, there is a problem when xfrm_larval_drop is set
to 1, which is the default on modern kernels. You will see this issue
when restarting libreswan:
ERROR: Module xfrm4_mode_transport is in use
ERROR: Module esp4 is in use
The problem is that it can take up to 10 seconds for packets to arrive.
This appears to affect auto=route connections, that is, those that are
triggered on demand.
The kernel XFRM maintainer/dev told us:
As far as I can see we still don't have any code for proper queueing on
larval SAs so auto=route will drop packets until the SAs are in place.
Now sometimes if the triggering event happened to be in a sleepable
context everything will appear to work, but this is certainly not always
going to be the case and may change from kernel version to kernel
version on a whim.
What's happening is that your first TCP SYN packet triggers the IPsec
lookup; however, the packet itself is dropped. TCP then retransmits but
it only gets through after the IPsec SAs are fully instated, resulting
in the delay.
What happens in some kernels is that the IPsec trigger occurs in a
sleepable context, which means that the sending process will wait for
the IPsec SAs to be installed before sending the first SYN. However,
this was never meant to be a complete solution to supporting auto=route
as it relies on the fact that there must be some sleepable context prior
to the SYN packet being sent.
Evidently this is no longer the case.
I will take this issue to the IPsec maintainer and the network
maintainer to see if we can make adjustments to allow at least the TCP
connection case to work with auto=route. However, there is no guarantee
that this will be done as we may not be able to insert the requisite
sleepable context into the general network stack just so that IPsec
auto=route can work.
Longer term for auto=route to be properly supported someone needs to
implement packet queueing on larval SAs.
Now here is the problem. There is a workaround:
echo 0 > /proc/sys/net/core/xfrm_larval_drop
This should let the first retransmit of the SYN packet (+1s) make it
through; however, it also causes connect() to return immediately with an
error on a non-blocking socket.
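For anyone who wants to try both settings, here is a minimal sketch for
checking and flipping the value. The function names and the
LARVAL_DROP_PATH variable are made up for illustration and are not part
of libreswan; the one-line descriptions only restate the trade-off
described in this thread.

```shell
# Sketch only: LARVAL_DROP_PATH and both function names are illustrative.
# The path defaults to the kernel knob discussed in this thread and can be
# overridden, e.g. to try the functions against a scratch file.
LARVAL_DROP_PATH="${LARVAL_DROP_PATH:-/proc/sys/net/core/xfrm_larval_drop}"

# Print the current value with a short reminder of the trade-off.
show_larval_drop() {
    if [ ! -r "$LARVAL_DROP_PATH" ]; then
        echo "cannot read $LARVAL_DROP_PATH" >&2
        return 1
    fi
    value=$(cat "$LARVAL_DROP_PATH")
    case "$value" in
        1) echo "xfrm_larval_drop=1 (first packets dropped until the SAs are installed)" ;;
        0) echo "xfrm_larval_drop=0 (non-blocking connect() may fail immediately)" ;;
        *) echo "xfrm_larval_drop=$value (unexpected value)" ;;
    esac
}

# Write 0 or 1 to the knob; needs root when pointed at the real /proc file.
set_larval_drop() {
    case "$1" in
        0|1) echo "$1" > "$LARVAL_DROP_PATH" ;;
        *) echo "usage: set_larval_drop 0|1" >&2; return 2 ;;
    esac
}
```

Running show_larval_drop works as any user; set_larval_drop 0 applies
the workaround until the next reboot, and set_larval_drop 1 restores the
default mentioned above.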
So either way, badness is happening. Just a different kind of badness.
Which one do we prefer?
The obvious solution is to have first+last packet caching with
XFRM/NETKEY as we do in KLIPS. But we've been waiting on that for
almost 10 years already.....
Paul