[Swan] Frequent Crashing with libreswan 3.10/pluto

Reuben Farrelly reuben-libreswan at reub.net
Sun Oct 12 08:42:44 EEST 2014


Just following up on this thread that I created from earlier in the week...

On 7/10/2014 12:07 PM, Paul Wouters wrote:
> On Tue, 7 Oct 2014, Reuben Farrelly wrote:
>> Frequently the libreswan end seems to just die.  Pluto crashes out
>> entirely and the VPN goes down.
>>
>> At the time of this the following is logged in the kernel log:
>>
>> Oct  6 14:52:06 lightning kernel: pluto[23223]: segfault at 58 ip
>> 00007f8f85f0c8d0 sp 00007fffb30275b8 error 4 in
>> libnss3.so[7f8f85ebc000+11f000]
>>
>> And in the auth.log the lines preceding this are:
>>
>> Oct  6 14:50:12 lightning pluto[23223]: | V2 microcode entry (R2:
>> process INFORMATIONAL) has unspecified timeout_event
>
> It would be useful to have the full debug log for that with more
> history.

I ended up enabling lots more logging and upgrading to -git soon after, 
and didn't have this problem reoccur after that, so it's probable that 
this problem will be fixed in the next release.

I have however had one other problem show up.  Yesterday I also changed 
over the connection from a static public IP address to another telco 
provider with a dynamic (private) IP which is NATted to the Internet. 
Gotta love Carrier Grade NAT :-)

Periodically if the 4G link goes down and back up again and/or the 
connection is seen from another public IP and/or the spoke router is 
restarted, the IPsec refuses to complete re-negotiation.  This is logged 
in the pluto log:

| hmac_update data value:
|
| hmac_update data value:
|   04
| hmac_update: inside if
| hmac_update: after digest
| hmac_update: after assert
| prf+[4]:  dc ad f9 1e  6f 6c cf 47  b6 82 a9 07  ea 52 c5 fd
| prf+[4]:  2a 64 19 dc
| our  keymat  c3 e3 1d 09  97 fa 5c b7  cc d1 06 ea  77 f9 68 fa
| our  keymat  51 62 91 34  2e 89 6e c1  0b 2c dd 55  26 07 67 7f
| our  keymat  d2 36 6a 07
| peer keymat  7c 69 f8 6b  22 18 43 a7  93 e8 af 2e  26 6b df d2
| peer keymat  d7 c7 04 f2  50 96 79 f3  dc ad f9 1e  6f 6c cf 47
| peer keymat  b6 82 a9 07
| install_ipsec_sa() for #541: inbound and outbound
| route owner of "reub.net"[32] 120.158.138.197 unrouted: "reub.net"[30] 
110.151.236.171 prospective erouted; eroute owner: "reub.net"[30] 
110.151.236.171 prospective erouted
| could_route called for reub.net (kind=CK_INSTANCE)
"reub.net"[32] 120.158.138.197 #540: cannot route -- route already in 
use for "reub.net"
| ikev2_child_sa_respond returned STF_FATAL
| ikev2_parent_inI2outR2_tail returned STF_FATAL
| complete v2 state transition with STF_FATAL
| deleting event for #541
| deleting state #541
| deleting event for #541
| ICOOKIE:  b4 66 69 60  d5 a6 39 d4
| RCOOKIE:  53 4e 42 cc  3a c2 17 cc
| state hash entry 15
| * processed 1 messages from cryptographic helpers
| next event EVENT_v2_RETRANSMIT in 21 seconds for #510
| next event EVENT_v2_RETRANSMIT in 21 seconds for #510

The system will stay in this sort of stuck state for some time, 
continually trying to re-establish but failing to complete.  If I 
restart the ipsec tools it then successfully connects.

The routing table on the remote libreswan box shows a route for 
192.168.6.2 still present even after the external IP has changed (both 
before and after the connection re-establishes).  192.168.6.1 is the 
libreswan/hub end and 192.168.6.2 is the Cisco IOS client end.

The ipsec status output on the libreswan box also shows lots of log entries:

000 #667: "reub.net"[32] 120.158.138.197:4500 STATE_PARENT_R2 (received 
v2I2, PARENT SA established); EVENT_SA_REPLACE in 58s; isakmp#0; idle; 
import:respond to stranger
000 #667: "reub.net"[32] 120.158.138.197 ref=0 refhim=0 Traffic:
000 #659: "reub.net"[32] 120.158.138.197:4500 STATE_PARENT_R2 (received 
v2I2, PARENT SA established); EVENT_SA_EXPIRE in 207s; isakmp#0; idle; 
import:respond to stranger
000 #659: "reub.net"[32] 120.158.138.197 ref=0 refhim=0 Traffic:
000 #647: "reub.net"[32] 120.158.138.197:4500 STATE_PARENT_R2 (received 
v2I2, PARENT SA established); EVENT_SA_EXPIRE in 56s; isakmp#0; idle; 
import:respond to stranger
000 #647: "reub.net"[32] 120.158.138.197 ref=0 refhim=0 Traffic:

I would have thought DPD would have taken care of this by aging out the 
old (now defunct) connection which has presumably not seen or heard any 
more matching packets after the event, but it doesn't seem to do that.

I can reproduce this on demand.  Currently my pluto.log is 93M so I can 
post more output on either side of this event if required...

Reuben





More information about the Swan mailing list