[Swan-dev] ERROR: netlink response for Add SA ... included errno 3: No such process
paul at nohats.ca
Sat Apr 11 20:07:24 EEST 2015
On Sat, 11 Apr 2015, Herbert Xu wrote:
>>> This is wrong because some updates do not
>>> contain keying material.
>> I don't understand this. Can you explain what the problem is for those
>> SA's ?
> Updates are used in two places in pluto. They're used for inbound
> SAs as part of the get_spi + update procedure, and they are used
> for NAT-T updates. In the latter case there is no keying material
> so you must not replace the update with an add.
> The kernel will never delete any live SAs installed by pluto since
> pluto does not set hard life times on them. So the NAT-T update
> should never fail anyway unless some third party is deleting SAs.
So the patch that switched it between add and update has quite a history
behind it, including a reverted revert commit.
Part of the problem is https://bugs.libreswan.org/show_bug.cgi?id=75
Some history can be seen in the git commits:
So first, the b5 commit changed from add to update:
errors on roadwarriors switching between internal IP's and reconnecting,
where NETKEY says a policy already exists (possibly because we do not
properly delete the policy when we delete the phase1, and the XP clients
delete their phase1 after 1 minute of idle time)
I reverted that, but sadly I didn't log why.
It was then reverted again by me with a comment:
* NEW will fail when an existing policy, UPD always works.
* This seems to happen in cases with NAT'ed XP clients, or
* quick recycling/resurfacing of roadwarriors on the same IP.
* req.n.nlmsg_type = XFRM_MSG_NEWPOLICY;
So that does relate to your NAT update comment.
But note that Tuomo also ran into a problem with connecting tunnels as
explained in bug 75:
On configuration where a talks with c via b eg. a == b == c where tunnels are
defined as a-c on both a = b and b = c we are missing tunnels.
This is bug introduced by commit:
With that patch applied I get this error when starting ipsec:
#31: ERROR: netlink XFRM_MSG_NEWPOLICY response for flow
tun.10000 at 18.104.22.168 included errno 17: File exists
With patch reverted there are no errors and tunnels work as they should.
> When you do a get_spi the kernel generates a temporary SA to keep
> hold of the SPI so that nobody else gets it. But this SA only
> lives until xfrm_acq_expires.
Oh, I did not realise that! That's good to know.
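For reference, a minimal sketch of how a daemon could look up that window instead of hardcoding it. It assumes the lifetime is exposed as the net.core.xfrm_acq_expires sysctl and that the default is the 30 seconds discussed in this thread. The helper name read_acq_expires() and the fallback macro are hypothetical and not part of pluto.

```c
#include <stdio.h>

/* Assumed kernel default in seconds (the 30 second window from this thread). */
#define ACQ_EXPIRES_DEFAULT 30

/*
 * Hypothetical helper: read the lifetime of the temporary SA created by
 * get_spi from a sysctl file. Returns `fallback` when the file cannot be
 * opened or does not contain a positive integer.
 */
static int read_acq_expires(const char *path, int fallback)
{
	FILE *f = fopen(path, "r");
	int secs;

	if (f == NULL)
		return fallback;
	if (fscanf(f, "%d", &secs) != 1 || secs <= 0)
		secs = fallback;
	fclose(f);
	return secs;
}
```

A caller would use something like read_acq_expires("/proc/sys/net/core/xfrm_acq_expires", ACQ_EXPIRES_DEFAULT) once at startup.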
> Therefore redoing the add after update might work but is simply
> wrong. You might as well just pluck some random number out of
> thin air and use that as your SPI.
I understand now. I guess we need to look into the two tunnel problem
listed above and how to deal with the Win XP / NAT issue, and figure
out what is going wrong there.
>> Yes, current git has switched to libevent and subsecond retransmits
>> and timeouts, so we will fall within that 30 second time window as
> OK if you can guarantee that you will not call update 30 seconds
> after the get_spi, then you should be fine. In that case you can
> also revert the patch that retries the add after update because
> it is just papering over the xfrm_acq_expires problem and is no
> longer needed.
Right. I'll do that.
>>> For libreswan, I suggest that you increase this parameter to
>>> a more appropriate value. I haven't done the calculations but
>>> strongswan sets it to 165 which seems to be appropriate.
>> Almost 3 minutes? That seems very long.
> Well it just has to be longer than the maximum interval between
> pluto doing get_spi and calling update_sa.
Maybe pluto should explicitly track this timer and just fail when it
notices the time has expired.
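A rough sketch of what that tracking could look like, not actual pluto code. The struct and function names are made up for illustration, and acq_expires is assumed to hold whatever xfrm_acq_expires is set to on the running kernel. The idea is to record when get_spi returned and refuse to send the update once the temporary SA must already be gone, rather than waiting for the kernel to answer with errno 3.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping for one SPI reserved via get_spi. */
struct spi_reservation {
	time_t acquired_at;  /* time at which get_spi returned */
	int acq_expires;     /* assumed lifetime of the temporary SA, seconds */
};

/* Record the moment the SPI was reserved. */
static void spi_reservation_start(struct spi_reservation *r, time_t now,
				  int acq_expires)
{
	r->acquired_at = now;
	r->acq_expires = acq_expires;
}

/*
 * True while the temporary SA can still be expected to exist, i.e. an
 * update for this SPI is still worth sending. Once this returns false the
 * negotiation should fail instead of falling back to an add.
 */
static bool spi_reservation_valid(const struct spi_reservation *r, time_t now)
{
	return difftime(now, r->acquired_at) < (double)r->acq_expires;
}
```

The check would go right before the update is sent. Passing the current time in as a parameter keeps the choice of clock (wall clock or monotonic) with the caller.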