[Swan] Problem with random rekey failures

Miguel Ponce Antolin mponce at paradigmadigital.com
Fri Jun 11 12:46:12 UTC 2021


Hi everyone,

I have been hitting an intermittent problem with Libreswan v3.25 on a tunnel
between an AWS EC2 instance running Libreswan and a Cisco ASA on the other end.

Phase 1 (ISAKMP) is renegotiated and successfully re-established; it is
associated with one specific rightsubnet, namely the last one, vpn/1x18.
We have 18 rightsubnets configured.

The problem comes when phase 2 is renewed. Sometimes, completely at random,
the AWS EC2 Libreswan side cannot bring a rightsubnet back up in response to
a connection event (ping, netcat, telnet). In more detail:
- First, my ipsec.d config sets ikelifetime=28800s and salifetime=28800s,
but phase 2 of every connection goes down after 30 minutes without traffic.
- While the problem is active, if any phase 2 is down I can ALWAYS bring it
back up from the Cisco ASA side by sending some traffic towards the AWS EC2
side, but *it is not possible to reconnect any subnet from AWS to the
Cisco ASA side*.
- The only fix we have found is to stop and restart the IPsec service. After
the restart, when a connection is down we can re-establish it by sending
traffic from either side.
- This state appears at random. We have been testing for a long time and
have gone as long as 10 days without the issue, but since last Wednesday it
has been happening at least once a day.


*Troubleshooting done:*

- Checked firewall on both sides

- iptables is disabled and masked in systemd on the AWS EC2 Libreswan side

- SELinux is disabled on the AWS EC2 Libreswan side

- The subnet configuration is the same, in the same order, on both sides

- Routes on both sides have been checked; they work fine when the problem
is not active.

- Sysctl.conf:

net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
net.ipv4.conf.ip_vti0.rp_filter = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.eth0.accept_redirects = 0
net.ipv4.tcp_app_win=1380
net.ipv4.ip_forward=1

- These are the *IPsec configuration files* (the "subnet.n" string
replaces the real subnet numbers, which are set correctly):

*/etc/ipsec.conf*

# /etc/ipsec.conf - Libreswan IPsec configuration file

config setup
    plutodebug="all crypt"
    plutostderrlog=/var/log/libreswan.log
    #
    virtual_private=%v4:10.subnet.1.0/22,%v4:10.subnet.2.0/20,%v4:10.subnet.3.128/25,%v4:10.subnet.4.74/32,%v4:10.subnet.5.75/32,%v4:10..subnet.6.224/27,%v4:10.subnet.7.0/24,%v4:10.subnet.8.200/31,%v4:10.subnet.9.166/32,%v4:10.subnet.10.0/16,%v4:11.subnet.11.0/24,%v4:10.subnet.12.0/24,%v4:10.subnet.13.16/28,%v4:10.subnet.14.16/28,%v4:10.subnet.15.128/26,%v4:10.subnet.16.17/32,%v4:10.subnet.17.0/24,%v4:10.subnet.18.9/32

*/etc/ipsec.d/vpn.conf*

conn vpn
    type=tunnel
    authby=secret
    auto=start
    left=%defaultroute
    leftid=xxx.xxx.xxx.120
    leftsubnets=xxx.xxx.xxx.80/28
    right=xxx.xxx.xxx.45
    rightid=xxx.xxx.xxx.45
    rightsubnets={10.subnet.1.0/22 10.subnet.2.0/20 10.subnet.3.128/25 10.subnet.4.74/32 10.subnet.5.75/32 10.subnet.6.224/27 10.subnet.7.0/24 10.subnet.8.200/31 10.subnet.9.166/32 10.subnet.10.0/16 10.subnet.11.0/24 10.subnet.12.16/28 10.subnet.13.16/28 10.subnet.14.128/26 10.subnet.15.17/32 10.subnet.16.0/24 10.subnet.17.0/24 10.subnet.18.9/32}
    leftsourceip=xxx.xxx.xxx.92
    ikev2=insist
    ike=aes256-sha2;dh14
    esp=aes256-sha256
    keyexchange=ike
    ikelifetime=28800s
    salifetime=28800s
    dpddelay=30
    dpdtimeout=120
    dpdaction=restart
    encapsulation=no
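
For the check that the subnet lists are the same, in the same order, on both
sides, a comparison along these lines could be scripted. This is only a
sketch: parse_subnets and compare_in_order are hypothetical helper names, and
it assumes the ASA side has been transcribed by hand into CIDR notation. It
accepts both the rightsubnets={...} form and the %v4:-prefixed
virtual_private form.

```python
import ipaddress


def parse_subnets(text):
    """Turn a subnet list into ipaddress networks, keeping the order.

    Braces (rightsubnets={...}), commas and %v4: prefixes
    (virtual_private=) are stripped first. strict=True makes a subnet
    with host bits set raise ValueError instead of being silently fixed.
    """
    cleaned = text.replace("{", " ").replace("}", " ").replace(",", " ")
    networks = []
    for token in cleaned.split():
        if token.startswith("%v4:"):
            token = token[len("%v4:"):]
        networks.append(ipaddress.ip_network(token, strict=True))
    return networks


def compare_in_order(local, remote):
    """Return (position, local_net, remote_net) for each mismatch.

    Positions are 1-based so they line up with the N in vpn/1xN.
    None marks an entry that is missing on one side.
    """
    mismatches = []
    for i in range(max(len(local), len(remote))):
        a = local[i] if i < len(local) else None
        b = remote[i] if i < len(remote) else None
        if a != b:
            mismatches.append((i + 1, a, b))
    return mismatches
```

With made-up placeholder addresses, compare_in_order(parse_subnets("{10.0.1.0/24 10.0.2.16/28}"), parse_subnets("10.0.1.0/24 10.0.2.0/24")) reports a single mismatch at position 2; an empty list means both sides agree.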


We are testing libreswan in the staging environment but we want to promote
it to the production environment.

We tried to "force" a reconnection by pinging an IP in several of the
rightsubnets; *you can find attached the log of 2 minutes of reconnection
attempts* for vpn/1x11 (ping after 11:16:29), vpn/1x12 (ping after
11:16:53), vpn/1x13 (ping after 11:17:07), vpn/1x15 (ping after 11:17:19),
vpn/1x16 (ping after 11:17:31).
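
The ping attempts above could be repeated in a scripted way, so that each
attempt gets a timestamp that can be matched against the pluto log. This is
a rough sketch, not what was used for the attached log: instance_names,
ping_command and probe are hypothetical helpers, the target hosts have to be
filled in by hand, and it assumes the Linux iputils ping (-c count, -W
timeout in seconds, -I source address).

```python
import subprocess
from datetime import datetime, timezone


def instance_names(conn, left_count, right_count):
    """Names in the conn/LxR form seen in the log, e.g. vpn/1x11."""
    return [f"{conn}/{left}x{right}"
            for left in range(1, left_count + 1)
            for right in range(1, right_count + 1)]


def ping_command(host, source=None):
    """Build a one-packet ping with a 2 second timeout.

    source, if given, is passed with -I so the packet leaves with an
    address inside the left subnet (for example the leftsourceip).
    """
    cmd = ["ping", "-c", "1", "-W", "2"]
    if source:
        cmd += ["-I", source]
    return cmd + [host]


def probe(targets, source=None):
    """Ping one host per connection instance.

    targets maps an instance name to a host inside that rightsubnet.
    Returns (utc_timestamp, instance, host, reachable) tuples.
    """
    results = []
    for name, host in targets.items():
        started = datetime.now(timezone.utc).isoformat(timespec="seconds")
        completed = subprocess.run(ping_command(host, source),
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
        results.append((started, name, host, completed.returncode == 0))
    return results
```

instance_names("vpn", 1, 18) gives vpn/1x1 through vpn/1x18, which can be used as the keys of the targets mapping passed to probe.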

Could you please help us identify a possible cause we are missing?
Is there any troubleshooting we could do to find out where the rekey
request is lost, or why Libreswan is not trying to rekey at all while this
problem is active?

Thanks in advance,

Best regards!

-- 

*Miguel Ponce Antolín.*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reconnect_attempts-20210611.log
Type: text/x-log
Size: 2388093 bytes
Desc: not available
URL: <https://lists.libreswan.org/pipermail/swan/attachments/20210611/40f2cb90/attachment-0001.bin>

