[Swan] /proc/sys/net/core/xfrm_larval_drop value

Paul Wouters paul at nohats.ca
Fri Oct 11 21:06:12 EEST 2013


On Fri, 11 Oct 2013, Tuomo Soini wrote:

(adding swan list to discuss this issue)

>> echo 0 > /proc/sys/net/core/xfrm_larval_drop
>
> We should include this info to sysctl.conf sample we have...
>
> net.core.xfrm_larval_drop = 0

I'm not sure....

To recap for everyone: there is a problem when xfrm_larval_drop is set
to 1, which is the default on modern kernels. You will see this issue
when restarting libreswan:

ERROR: Module xfrm4_mode_transport is in use
ERROR: Module esp4 is in use
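
To see which of these modules are still pinned before a restart, something
like the following may help. It is only a rough sketch: the modules_in_use
name and the MODULES_FILE override are made up for illustration, and it
assumes the usual /proc/modules layout where the third field is the
reference count.

```shell
#!/bin/sh
# Sketch only: modules_in_use and MODULES_FILE are hypothetical names,
# not part of libreswan. MODULES_FILE can point at a saved copy for testing.
MODULES_FILE=${MODULES_FILE:-/proc/modules}

# Print "name refcount" for each named module whose use count is non-zero.
# Usage: modules_in_use module...
modules_in_use() {
    for mod in "$@"; do
        # Field 1 is the module name, field 3 the reference count.
        # "$3 + 0" forces a numeric compare, so a "-" count reads as 0.
        awk -v m="$mod" '$1 == m && $3 + 0 > 0 { print $1, $3 }' "$MODULES_FILE"
    done
}
```

For example, "modules_in_use esp4 xfrm4_mode_transport" prints one line per
module that still has users and nothing if both are free to unload.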

The problem is that it can take up to 10 seconds for packets to arrive.
This appears to affect auto=route connections, i.e. those that are
triggered on demand.

The kernel XFRM maintainer/dev told us:

 	As far as I can see we still don't have any code for proper queueing on
 	larval SAs so auto=route will drop packets until the SAs are in place.
 	Now sometimes if the triggering event happened to be in a sleepable
 	context everything will appear to work, but this is certainly not always
 	going to be the case and may change from kernel version to kernel
 	version on a whim.

 	What's happening is that your first TCP SYN packet triggers the IPsec
 	lookup, however, the packet itself is dropped.  TCP then retransmits but
 	it only gets through after the IPsec SAs are fully instated, resulting
 	in the delay.

 	What happens in some kernels is that the IPsec trigger occurs in a
 	sleepable context, which means that the sending process will wait for
 	the IPsec SAs to be installed before sending the first SYN.  However,
 	this was never meant to be a complete solution to supporting auto=route
 	as it relies on the fact that there must be some sleepable context prior
 	to the SYN packet being sent.

 	Evidently this is no longer the case.

 	I will take this issue to the IPsec maintainer and the network
 	maintainer to see if we can make adjustments to allow at least the TCP
 	connection case to work with auto=route.  However, there is no guarantee
 	that this will be done as we may not be able to insert the requisite
 	sleepable context into the general network stack just so that IPsec
 	auto=route can work.

 	Longer term for auto=route to be properly supported someone needs to
 	implement packet queueing on larval SAs.

Now here is the problem. There is a workaround:

         echo 0 > /proc/sys/net/core/xfrm_larval_drop

With this setting, the first retransmit of the SYN packet (+1s) should make
it through. However, it also causes connect() on a non-blocking socket to
return immediately with an error.
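
For anyone who wants to compare both behaviours, here is a minimal shell
sketch for reading and flipping the value. The helper names (larval_get,
larval_set) and the LARVAL_FILE override are invented for illustration; only
the /proc path itself comes from this thread. Writing the real /proc file
needs root.

```shell
#!/bin/sh
# Sketch only: larval_get, larval_set and LARVAL_FILE are hypothetical names.
# LARVAL_FILE lets you point the helpers at a scratch file for a dry run.
LARVAL_FILE=${LARVAL_FILE:-/proc/sys/net/core/xfrm_larval_drop}

# Print the current value (0 or 1).
larval_get() {
    cat "$LARVAL_FILE"
}

# Set the value; anything other than 0 or 1 is rejected.
larval_set() {
    case "$1" in
        0|1) printf '%s\n' "$1" > "$LARVAL_FILE" ;;
        *)   echo "usage: larval_set 0|1" >&2; return 1 ;;
    esac
}
```

Run larval_get before and after larval_set 0 to confirm the change took. The
setting does not survive a reboot unless it is also added to sysctl.conf as
the "net.core.xfrm_larval_drop = 0" line Tuomo quoted.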

So either way, badness is happening. Just a different kind of badness.
Which one do we prefer?

The obvious solution is to have first+last packet caching with
XFRM/NETKEY as we do in KLIPS. But we've been waiting on that for
almost 10 years already.....

Paul

