[Swan] Frequent Crashing with libreswan 3.10/pluto

Paul paul at nohats.ca
Sun Oct 12 17:58:00 EEST 2014


Is this using psk or certs? With certificates you should not run into this as the uniqueids should cause the old connection to get replaced. 

Hopefully these kind of switching bugs will vanish with ios8 that sets up two vpn tunnels, one for wifi and one for 4g

Sent from my iPhone

> On Oct 12, 2014, at 01:42, Reuben Farrelly <reuben-libreswan at reub.net> wrote:
> 
> Just following up on this thread that I created from earlier in the week...
> 
>> On 7/10/2014 12:07 PM, Paul Wouters wrote:
>>> On Tue, 7 Oct 2014, Reuben Farrelly wrote:
>>> Frequently the libreswan end seems to just die.  Pluto crashes out
>>> entirely and the VPN goes down.
>>> 
>>> At the time of this the following is logged in the kernel log:
>>> 
>>> Oct  6 14:52:06 lightning kernel: pluto[23223]: segfault at 58 ip
>>> 00007f8f85f0c8d0 sp 00007fffb30275b8 error 4 in
>>> libnss3.so[7f8f85ebc000+11f000]
>>> 
>>> And in the auth.log the lines preceding this are:
>>> 
>>> Oct  6 14:50:12 lightning pluto[23223]: | V2 microcode entry (R2:
>>> process INFORMATIONAL) has unspecified timeout_event
>> 
>> It would be useful to have the full debug log for that with more
>> history.
> 
> I ended up enabling lots more logging and upgrading to -git soon after, and didn't have this problem reoccur after that, so it's probable that this problem will be fixed in the next release.
> 
> I have however had one other problem show up.  Yesterday I also changed over the connection from a static public IP address to another telco provider with a dynamic (private) IP which is NATted to the Internet. Gotta love Carrier Grade NAT :-)
> 
> Periodically if the 4G link goes down and back up again and/or the connection is seen from another public IP and/or the spoke router is restarted, the IPsec refuses to complete re-negotiation.  This is logged in the pluto log:
> 
> | hmac_update data value:
> |
> | hmac_update data value:
> |   04
> | hmac_update: inside if
> | hmac_update: after digest
> | hmac_update: after assert
> | prf+[4]:  dc ad f9 1e  6f 6c cf 47  b6 82 a9 07  ea 52 c5 fd
> | prf+[4]:  2a 64 19 dc
> | our  keymat  c3 e3 1d 09  97 fa 5c b7  cc d1 06 ea  77 f9 68 fa
> | our  keymat  51 62 91 34  2e 89 6e c1  0b 2c dd 55  26 07 67 7f
> | our  keymat  d2 36 6a 07
> | peer keymat  7c 69 f8 6b  22 18 43 a7  93 e8 af 2e  26 6b df d2
> | peer keymat  d7 c7 04 f2  50 96 79 f3  dc ad f9 1e  6f 6c cf 47
> | peer keymat  b6 82 a9 07
> | install_ipsec_sa() for #541: inbound and outbound
> | route owner of "reub.net"[32] 120.158.138.197 unrouted: "reub.net"[30] 110.151.236.171 prospective erouted; eroute owner: "reub.net"[30] 110.151.236.171 prospective erouted
> | could_route called for reub.net (kind=CK_INSTANCE)
> "reub.net"[32] 120.158.138.197 #540: cannot route -- route already in use for "reub.net"
> | ikev2_child_sa_respond returned STF_FATAL
> | ikev2_parent_inI2outR2_tail returned STF_FATAL
> | complete v2 state transition with STF_FATAL
> | deleting event for #541
> | deleting state #541
> | deleting event for #541
> | ICOOKIE:  b4 66 69 60  d5 a6 39 d4
> | RCOOKIE:  53 4e 42 cc  3a c2 17 cc
> | state hash entry 15
> | * processed 1 messages from cryptographic helpers
> | next event EVENT_v2_RETRANSMIT in 21 seconds for #510
> | next event EVENT_v2_RETRANSMIT in 21 seconds for #510
> 
> The system will stay in this sort of stuck state for some time, continually trying to re-establish but failing to complete.  If I restart the ipsec tools it then successfully connects.
> 
> The routing table on the remote libreswan box shows a route for 192.168.6.2 still present even after the external IP has changed (both before and after the connection re-establishes).  192.168.6.1 is the libreswan/hub end and 192.168.6.2 is the Cisco IOS client end.
> 
> The ipsec status output on the libreswan box also shows lots of log entries:
> 
> 000 #667: "reub.net"[32] 120.158.138.197:4500 STATE_PARENT_R2 (received v2I2, PARENT SA established); EVENT_SA_REPLACE in 58s; isakmp#0; idle; import:respond to stranger
> 000 #667: "reub.net"[32] 120.158.138.197 ref=0 refhim=0 Traffic:
> 000 #659: "reub.net"[32] 120.158.138.197:4500 STATE_PARENT_R2 (received v2I2, PARENT SA established); EVENT_SA_EXPIRE in 207s; isakmp#0; idle; import:respond to stranger
> 000 #659: "reub.net"[32] 120.158.138.197 ref=0 refhim=0 Traffic:
> 000 #647: "reub.net"[32] 120.158.138.197:4500 STATE_PARENT_R2 (received v2I2, PARENT SA established); EVENT_SA_EXPIRE in 56s; isakmp#0; idle; import:respond to stranger
> 000 #647: "reub.net"[32] 120.158.138.197 ref=0 refhim=0 Traffic:
> 
> I would have thought DPD would have taken care of this by aging out the old (now defunct) connection which has presumably not seen or heard any more matching packets after the event, but it doesn't seem to do that.
> 
> I can reproduce this on demand.  Currently my pluto.log is 93M so I can post more output on either side of this event if required...
> 
> Reuben
> 
> 


More information about the Swan mailing list