[Swan] Cisco IOS XE Interoperability with Libreswan

Reuben Farrelly reuben-libreswan at reub.net
Sun Aug 19 07:32:11 UTC 2018



On 19/08/2018 10:05 am, Paul Wouters wrote:
> On Sat, 18 Aug 2018, Reuben Farrelly wrote:
> 
>>>           mark=-1/0xffffffff
>>>           vti-interface=vti-1
>>>           leftvti=192.168.6.1/30
> 
>> Ok I've worked out the cause of this.  The problem is the 'mark' value 
>> that I have configured.
> 
>> Up to and including version 3.23 this worked (or at least it didn't 
>> break anything).
>>
>>>           mark=-1/0xffffffff
>>
>> In version 3.25 the use of -1 here seems to have broken things.
>>
>> After setting the mark (statically) to '1' instead of '-1' I have 
>> connectivity again across the IPSec tunnel.
>> I think that's a bug either in the man page, or in the code ;-)
> 
> Odd, because the last change to mark was in 3.23:
> 
> * XFRM: Fix unique marks accidentally setting -1 instead of random [Paul]
> 
> Before 3.23, we did not properly set unique marks.
> 
> So my guess is, we are setting marks properly now, but in 3.25 we no
> longer try to delete the vti interface, and so when you re-start
> libreswan there might still be an old vti01 device left with the old
> mark, and the new mark is another unique mark. And by setting a manual
> mark, you caused the old and new values always to be the same.
> 
> We will have to add refcounting for vti interface usage, so we can
> properly delete the VTI interface when the last tunnel using it is
> brought down.
> 
> But also, unique marks was really meant for roadwarriors, not single
> static conns :)
> 
> Thanks for the investigations and feedback! And I'm still a little
> confused about some of the (improper?) cisco behaviour.
> 
> Paul

Ok - that all makes sense now.  I haven't been restarting the system 
each time I make a change or rebuild/upgrade so it's entirely likely 
that that is what is going on.
Maybe you could add a note to the Route Based VPN configuration page 
that indicates that dynamic VTIs are more for roadwarriors, as I wasn't 
aware of what value to choose when I set mine up according to the howto :-)
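
For anyone following the howto for a single static connection, the 
relevant lines would look roughly like the sketch below.  The conn name 
"cisco-vti" is a placeholder, everything else in the conn is omitted, 
and I'm assuming the mask stays the same as before:

```
# Sketch only: "cisco-vti" is a placeholder name, other options omitted.
conn cisco-vti
    # Static mark; -1 asks for a unique mark, which per Paul's note
    # above is meant for roadwarriors rather than a single static conn.
    mark=1/0xffffffff
    vti-interface=vti-1
    leftvti=192.168.6.1/30
```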

I have however gotten to the bottom of the connectivity problem, and had 
an IPSec session up and running for 16 hours so far - so I think I've 
got it sorted.

The issue isn't Libreswan, in fact it's not even an IPsec related 
problem.  It appears to be dynamic NAT on the router that is originating 
the connection which is breaking the initial IPSec negotiation process.

A bit of background: I have had two dynamic interface NATs configured:

- One on the outbound Cellular interface for normal outbound Internet access
  ("ip nat inside source route-map 4G-nat-access interface Cellular0/2/0 overload")

- Another on the Tunnel VTI so that traffic across the VPN is NATted to 
  the VTI interface locally.  This saves me adding routes on the remote 
  end and makes the router configuration 100x less complicated.
  ("ip nat inside source route-map lightning-nat-access interface Tunnel1 overload")

The 'match' in each route map was just a check to determine if the 
traffic was going out of an interface (match interface Cel0/2/0 etc)
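
To illustrate, route maps of that shape would look something like the 
following.  This is a sketch rather than a paste from the router: the 
sequence number 10 is assumed, and the interface names are taken from 
the NAT statements above.

```
! Sketch: sequence number 10 is an assumption.
route-map 4G-nat-access permit 10
 match interface Cellular0/2/0
!
route-map lightning-nat-access permit 10
 match interface Tunnel1
```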

This has worked perfectly on classic IOS.  Traffic is NATted to either 
interface depending on where it is routed: Cel0/2/0 for direct Internet 
traffic, or VTI Tunnel1 for VPN traffic.

But on IOS XE, it appears that the NAT implementation is very different 
under the hood to classic IOS, and despite the commands being identical 
the behaviour is not.  What appears to be happening is that even though 
the NAT I have is only dynamic outbound to the interface, the router is 
capturing the return IPSec traffic from the head end that is destined 
-to- the router and eating it (probably determining that there is no NAT 
state for it, so dropping it rather than passing it to the OS to process).

I suspect this because if I turn off all NAT then the IPSec negotiation 
completes successfully, every time.  Turn it back on and the negotiation 
gets stuck and we see the retransmit behaviour happening all over again.

As matching the outbound interface as a NAT selector is not an option 
any more, I'm currently trying to come up with some NAT match ACLs to 
work around this behaviour.  Some quick and dirty ACLs with very tight 
src/dst entries have done the trick for now - but it's ugly.
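
As a rough sketch of the idea (not the actual ACLs in use): the ACL 
names and both subnets below are made up for illustration, with 
10.0.0.0/24 standing in for the inside LAN and 10.1.0.0/24 for a network 
behind the VPN head end.  Because only traffic sourced from the inside 
LAN matches, IKE/ESP traffic sourced from the router's own address 
should never be selected for NAT.

```
! Sketch: ACL names and all addresses are hypothetical.
ip access-list extended 4G-nat-acl
 remark inside LAN to the Internet, excluding the VPN destination
 deny   ip 10.0.0.0 0.0.0.255 10.1.0.0 0.0.0.255
 permit ip 10.0.0.0 0.0.0.255 any
!
ip access-list extended lightning-nat-acl
 remark inside LAN to the network behind the VPN head end
 permit ip 10.0.0.0 0.0.0.255 10.1.0.0 0.0.0.255
!
route-map 4G-nat-access permit 10
 match ip address 4G-nat-acl
!
route-map lightning-nat-access permit 10
 match ip address lightning-nat-acl
```

The "ip nat inside source route-map ... overload" statements themselves 
would stay as they are; only the match inside each route map changes.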

Either way I am satisfied that Libreswan is all good and it's a router 
related problem.

I will chase up that problem with ECP_256 breaking the IKEv2 
negotiation, but that is the only issue and I have a workaround for it 
anyway.

Thanks for your help debugging this, Paul.  I learnt a lot, and the final 
solution is pretty good even if it is just for home.

Thanks,
Reuben


