[Swan] dead peer deduction not working

Dave Houser davehouser1 at gmail.com
Wed Oct 20 16:57:01 UTC 2021


@Paul Wouters <paul at nohats.ca> Thanks for the details as always. I think I
should share more of my story of what we are trying to accomplish as it may
help with finding the correct solution.
Our topology is the following:

| Juniper router 01 | ----to-vsrx-01---- | Host01 | ----to-vsrx-02---- | Juniper router 02 |

- Anything within "||" is a host
- Anything between "--" is a connection name

"Host01" is where Libreswan is running.
The host needs to connect to two separate Juniper routers on two
separate ipsec tunnels.
"to-vsrx-01" is connection 01.
"to-vsrx-02" is connection 02.

Here is the current state of the to-vsrx-01 config

conn to-vsrx-01b
    auto=start
    keyexchange=ike
    authby=secret
    ike=aes256-sha2_256;dh20
    esp=aes256-sha2_256
    left=2.2.0.2
    leftid=2.2.0.2
    leftupdown=/opt/_updown_vti01
    right=3.3.0.2
    leftsubnet=172.21.0.0/29
    rightsubnet=0.0.0.0/0
    dpddelay=5s
    dpdtimeout=30s
    dpdaction=hold
    ikelifetime=24h
    salifetime=24h

The purpose of the two connections is redundancy / failover. If one
router goes down, Host01 must still have an active IPsec connection.
What we want is for the to-vsrx-01 and to-vsrx-02 connections to have the
same "rightsubnet" and be active at the same time. However, as we found,
this is not possible in the latest release of Libreswan. This came up in
another email thread I started with you; I believe you are still working
on a fix, as I have not heard back.
I am not sure whether Libreswan has this capability, but it would be
great if, when a peer is down, Libreswan could recognize that the peer is
dead and shift traffic to the other connection. I don't believe this is
possible, so as a workaround I scripted a way of doing this outside of
Libreswan. However, I would rather not use the script if the
functionality is built into Libreswan.
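To give an idea of what I mean by shifting traffic outside of Libreswan, the core of the check looks roughly like this (an illustrative sketch only, not the real script; it assumes `ipsec whack --trafficstatus` prints one line per established IPsec SA with the connection name in quotes, and the sample status line below is made up):

```shell
#!/bin/bash
# Illustrative sketch only -- the real script is more involved.
# Assumes `ipsec whack --trafficstatus` quotes the connection name in
# each line; the sample line below is made up for demonstration.
sample_status='006 #36: "to-vsrx-02", type=ESP, inBytes=756, outBytes=1024'

tunnel_up() {  # usage: tunnel_up <conn-name> <trafficstatus-output>
    printf '%s\n' "$2" | grep -q "\"$1\""
}

if tunnel_up "to-vsrx-01" "$sample_status"; then
    echo "primary up, nothing to do"
else
    # primary has no established SA: shift traffic to the backup tunnel,
    # e.g. by re-pointing routes at the other vti interface
    echo "primary down: shift routes to to-vsrx-02"
fi
```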

The problem at hand, and the reason this email thread was opened, is that
the Juniper router admins need to be able to run these commands at times.
These commands perform a manual rekey of each IKE phase:

Phase1 = clear security ike security-associations
Phase2 = clear security ipsec security-associations

First of all, automatic rekeying works just fine when the Juniper ike /
ipsec lifetimes and the Libreswan ikelifetime / salifetime expire.
Rekeying happens and the tunnel stays up, so as far as I can tell
rekeying itself is not the problem.
When testing manually, running "clear security ike security-associations"
on the Juniper works fine as well: IKE P1 rekeys on demand and the tunnel
stays up.
When testing manually, running "clear security ipsec security-associations"
on the Juniper immediately stops the tunnel, and it is torn down on both
the Juniper and Libreswan sides.
- Juniper shows no connection when checking ipsec SA status.
- Libreswan `ipsec whack --trafficstatus` shows no connection.
- Libreswan pluto.log shows two entries that appear after running the above
command:

Oct 20 17:49:50.846500: "to-vsrx-01" #35: ESP traffic information: in=0B
out=0B
Oct 20 17:49:50.938362: "to-vsrx-0b" #33: established IKE SA

Since the rekey timers are not synced in any way, I will sometimes see
these messages in pluto.log:

Oct 20 18:36:22.399410: "to-vsrx-01" #75: initiating rekey to replace IKE
SA #73
Oct 20 18:36:22.405764: "to-vsrx-01" #75: sent CREATE_CHILD_SA request to
rekey IKE SA
Oct 20 18:36:22.425346: "to-vsrx-01" #75: rekeyed #73 STATE_V2_REKEY_IKE_I1
and expire it remaining life 343.067679s
Oct 20 18:36:22.425640: "to-vsrx-01" #75: established IKE SA
{cipher=AES_CBC_256 integ=HMAC_SHA2_256_128 prf=HMAC_SHA2_256 group=DH20}
Oct 20 18:36:23.425747: "to-vsrx-01" #73: deleting state
(STATE_V2_ESTABLISHED_IKE_SA) aged 58.066437s and sending notification
Oct 20 18:36:23.428217: packet from 3.3.0.2:500: INFORMATIONAL response has
no corresponding IKE SA; message dropped

Then nothing happens for a period of time, after which the connection
re-establishes. I am not sure what mechanism re-establishes it, but I
think it is rekeying on the Juniper end, since the rekey from Libreswan
appears to have failed above.

So I tried changing the following settings to narrow down which system is
performing the rekey:
- Libreswan, I set the following
ikelifetime=400s
salifetime=500s
- Juniper I set the following:
set security ike proposal ike-prop-01 lifetime-seconds 180
set security ipsec proposal ike-prop-01 lifetime-seconds 300
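If I read the defaults shown later in the status output correctly (rekey_margin: 540s, rekey_fuzz: 100%), Libreswan tries to start its own rekey well before expiry, which may explain why the Juniper side always rekeys first with these short lifetimes. A rough back-of-the-envelope sketch, assuming the local rekey starts between lifetime - margin*(1+fuzz) and lifetime - margin (that formula is my assumption, not confirmed):

```shell
#!/bin/bash
# Back-of-the-envelope rekey timing. Assumes (unconfirmed) that Libreswan
# initiates its rekey between (lifetime - margin*(1+fuzz)) and
# (lifetime - margin); margin/fuzz values taken from `ipsec auto status`.
margin=540; fuzz=100

earliest_rekey() {  # usage: earliest_rekey <lifetime-seconds>
    local life=$1
    local lead=$(( margin * (100 + fuzz) / 100 ))   # worst case: 1080s early
    if (( life <= lead )); then
        echo "lifetime ${life}s: rekey margin (up to ${lead}s) exceeds the lifetime"
    else
        echo "lifetime ${life}s: local rekey between $(( life - lead ))s and $(( life - margin ))s after establishment"
    fi
}

earliest_rekey 400   # my ikelifetime
earliest_rekey 500   # my salifetime
```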

I performed the test again. After 300s the connection was rebuilt, and
the output from above appeared in pluto.log. I also noticed that this
line in pluto.log changed to "180s":

Oct 20 18:46:48.704224: "to-vsrx-01" #81: deleting state
(STATE_V2_ESTABLISHED_IKE_SA) aged 180.038216s and NOT sending notification

I believe this reflects the change I made to the Juniper IKE P1 rekey
time. However, that number does not necessarily match how much time
actually elapsed, as the connection was re-established after ~120s. After
testing again and again with the same timer settings, the time at which
the connection tries to re-establish just seems random: sometimes it is
quick, and sometimes it is longer, ~5 min. Be aware that these two
systems are in an enclosed environment and have no access to NTP; I am
not sure whether that has anything to do with it.

So from what I understand, the connection will only be rebuilt when a
rekeying happens. What I don't get is why the tunnel goes down in the
first place; this is just a manual rekey of the SA from the Juniper, and
the automatic rekey of the SA works fine. Is there some way to make
Libreswan try to re-establish the connection, or to recognize that the
connection went down outside of rekeying? I would rather not have to set
rekey lifetimes to such low values on both systems just to keep the
connection up.
The only workaround I have thought of so far is to add logic to my
tunnel-keepalive script to try to reconnect if the tunnel goes down
outside of Libreswan. But I want to make sure there isn't some other way
to get this functionality within Libreswan.
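Concretely, the reconnect logic I have in mind is something like this (a sketch; it assumes `ipsec auto --up` is the right way to re-initiate a connection and that a connection missing from `ipsec whack --trafficstatus` means its tunnel is down):

```shell
#!/bin/bash
# Sketch of the reconnect fallback (illustrative; the real check would
# call `ipsec whack --trafficstatus` inside the loop).
CONN="to-vsrx-01"

needs_restart() {  # usage: needs_restart <conn-name> <trafficstatus-output>
    ! printf '%s\n' "$2" | grep -q "\"$1\""
}

# Production loop (not run here):
#   while sleep 10; do
#       if needs_restart "$CONN" "$(ipsec whack --trafficstatus)"; then
#           ipsec auto --up "$CONN"
#       fi
#   done

# Demonstration: no SA listed, so a restart would be attempted
if needs_restart "$CONN" ""; then
    echo "would run: ipsec auto --up $CONN"
fi
```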

I apologize if you and Andrew feel you have been beating a dead horse
with me; I do not mean to make you all repeat yourselves. I just want to
be clear in defining my problem, in hopes that we can find a solution
that fits.

Lastly, here is the status of the tunnel after the connection fails. Let me
know what you all think.

# ipsec auto status
ipsec auto: warning: obsolete command syntax used
000 using kernel interface: xfrm
000
000 interface lo UDP [::1]:500
000 interface lo UDP 127.0.0.1:4500
000 interface lo UDP 127.0.0.1:500
000 interface ens3 UDP 192.168.0.14:4500
000 interface ens3 UDP 192.168.0.14:500
000 interface ens4 UDP 2.2.0.2:4500
000 interface ens4 UDP 2.2.0.2:500
000 interface virbr0 UDP 192.168.122.1:4500
000 interface virbr0 UDP 192.168.122.1:500
000
000 fips mode=disabled;
000 SElinux=disabled
000 seccomp=disabled
000
000 config setup options:
000
000 configdir=/etc, configfile=/etc/ipsec.conf, secrets=/etc/ipsec.secrets,
ipsecdir=/etc/ipsec.d
000 nssdir=/etc/ipsec.d, dumpdir=/run/pluto, statsbin=unset
000 dnssec-rootkey-file=/var/lib/unbound/root.key, dnssec-trusted=<unset>
000 sbindir=/usr/sbin, libexecdir=/usr/libexec/ipsec
000 pluto_version=v4.5-216-g66c256610c-main,
pluto_vendorid=OE-Libreswan-v4.5-216, audit-log=yes
000 nhelpers=-1, uniqueids=yes, dnssec-enable=yes, logappend=yes,
logip=yes, shuntlifetime=900s, xfrmlifetime=30s
000 ddos-cookies-threshold=25000, ddos-max-halfopen=50000, ddos-mode=auto,
ikev1-policy=accept
000 ikebuf=0, msg_errqueue=yes, crl-strict=no, crlcheckinterval=0,
listen=<any>, nflog-all=0
000 ocsp-enable=no, ocsp-strict=no, ocsp-timeout=2, ocsp-uri=<unset>
000 ocsp-trust-name=<unset>
000 ocsp-cache-size=1000, ocsp-cache-min-age=3600,
ocsp-cache-max-age=86400, ocsp-method=get
000 global-redirect=no, global-redirect-to=<unset>
000 secctx-attr-type=32001
000 debug:
000
000 nat-traversal=yes, keep-alive=20, nat-ikeport=4500
000 virtual-private (%priv):
000 - allowed subnets: 192.168.0.0/16, 172.16.0.0/12, 25.0.0.0/8,
100.64.0.0/10, fd00::/8, fe80::/10, <unset-subnet>
000
000 Kernel algorithms supported:
000
000 algorithm ESP encrypt: name=3DES_CBC, keysizemin=192, keysizemax=192
000 algorithm ESP encrypt: name=AES_CBC, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=AES_CCM_12, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=AES_CCM_16, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=AES_CCM_8, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=AES_CTR, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=AES_GCM_12, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=AES_GCM_16, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=AES_GCM_8, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=CAMELLIA_CBC, keysizemin=128, keysizemax=256
000 algorithm ESP encrypt: name=CHACHA20_POLY1305, keysizemin=256,
keysizemax=256
000 algorithm ESP encrypt: name=NULL, keysizemin=0, keysizemax=0
000 algorithm ESP encrypt: name=NULL_AUTH_AES_GMAC, keysizemin=128,
keysizemax=256
000 algorithm AH/ESP auth: name=AES_CMAC_96, key-length=128
000 algorithm AH/ESP auth: name=AES_XCBC_96, key-length=128
000 algorithm AH/ESP auth: name=HMAC_MD5_96, key-length=128
000 algorithm AH/ESP auth: name=HMAC_SHA1_96, key-length=160
000 algorithm AH/ESP auth: name=HMAC_SHA2_256_128, key-length=256
000 algorithm AH/ESP auth: name=HMAC_SHA2_256_TRUNCBUG, key-length=256
000 algorithm AH/ESP auth: name=HMAC_SHA2_384_192, key-length=384
000 algorithm AH/ESP auth: name=HMAC_SHA2_512_256, key-length=512
000 algorithm AH/ESP auth: name=NONE, key-length=0
000
000 IKE algorithms supported:
000
000 algorithm IKE encrypt: v1id=5, v1name=OAKLEY_3DES_CBC, v2id=3,
v2name=3DES, blocksize=8, keydeflen=192
000 algorithm IKE encrypt: v1id=8, v1name=OAKLEY_CAMELLIA_CBC, v2id=23,
v2name=CAMELLIA_CBC, blocksize=16, keydeflen=128
000 algorithm IKE encrypt: v1id=-1, v1name=n/a, v2id=20, v2name=AES_GCM_C,
blocksize=16, keydeflen=128
000 algorithm IKE encrypt: v1id=-1, v1name=n/a, v2id=19, v2name=AES_GCM_B,
blocksize=16, keydeflen=128
000 algorithm IKE encrypt: v1id=-1, v1name=n/a, v2id=18, v2name=AES_GCM_A,
blocksize=16, keydeflen=128
000 algorithm IKE encrypt: v1id=13, v1name=OAKLEY_AES_CTR, v2id=13,
v2name=AES_CTR, blocksize=16, keydeflen=128
000 algorithm IKE encrypt: v1id=7, v1name=OAKLEY_AES_CBC, v2id=12,
v2name=AES_CBC, blocksize=16, keydeflen=128
000 algorithm IKE encrypt: v1id=-1, v1name=n/a, v2id=28,
v2name=CHACHA20_POLY1305, blocksize=16, keydeflen=256
000 algorithm IKE PRF: name=HMAC_MD5, hashlen=16
000 algorithm IKE PRF: name=HMAC_SHA1, hashlen=20
000 algorithm IKE PRF: name=HMAC_SHA2_256, hashlen=32
000 algorithm IKE PRF: name=HMAC_SHA2_384, hashlen=48
000 algorithm IKE PRF: name=HMAC_SHA2_512, hashlen=64
000 algorithm IKE PRF: name=AES_XCBC, hashlen=16
000 algorithm IKE DH Key Exchange: name=MODP1536, bits=1536
000 algorithm IKE DH Key Exchange: name=MODP2048, bits=2048
000 algorithm IKE DH Key Exchange: name=MODP3072, bits=3072
000 algorithm IKE DH Key Exchange: name=MODP4096, bits=4096
000 algorithm IKE DH Key Exchange: name=MODP6144, bits=6144
000 algorithm IKE DH Key Exchange: name=MODP8192, bits=8192
000 algorithm IKE DH Key Exchange: name=DH19, bits=512
000 algorithm IKE DH Key Exchange: name=DH20, bits=768
000 algorithm IKE DH Key Exchange: name=DH21, bits=1056
000 algorithm IKE DH Key Exchange: name=DH31, bits=256
000
000 stats db_ops: {curr_cnt, total_cnt, maxsz} :context={0,0,0}
trans={0,0,0} attrs={0,0,0}
000
000 Connection list:
000
000 "to-vsrx-01b": 172.21.0.0/29===2.2.0.2<2.2.0.2>...3.3.0.2<3.3.0.2>===
0.0.0.0/0; prospective erouted; eroute owner: #0
000 "to-vsrx-01b":     oriented; my_ip=unset; their_ip=unset;
my_updown=/opt/_updown_vti01;
000 "to-vsrx-01b":   xauth us:none, xauth them:none,  my_username=[any];
their_username=[any]
000 "to-vsrx-01b":   our auth:secret, their auth:secret, our autheap:none,
their autheap:none;
000 "to-vsrx-01b":   modecfg info: us:none, them:none, modecfg policy:push,
dns:unset, domains:unset, cat:unset;
000 "to-vsrx-01b":   sec_label:unset;
000 "to-vsrx-01b":   ike_life: 86400s; ipsec_life: 86400s; replay_window:
32; rekey_margin: 540s; rekey_fuzz: 100%; keyingtries: 0;
000 "to-vsrx-01b":   retransmit-interval: 500ms; retransmit-timeout: 60s;
iketcp:no; iketcp-port:4500;
000 "to-vsrx-01b":   initial-contact:no; cisco-unity:no;
fake-strongswan:no; send-vendorid:no; send-no-esp-tfc:no;
000 "to-vsrx-01b":   policy:
IKEv2+PSK+ENCRYPT+TUNNEL+PFS+UP+IKE_FRAG_ALLOW+ESN_NO;
000 "to-vsrx-01b":   v2-auth-hash-policy: none;
000 "to-vsrx-01b":   conn_prio: 29,0; interface: ens4; metric: 0; mtu:
unset; sa_prio:auto; sa_tfc:none;
000 "to-vsrx-01b":   nflog-group: unset; mark: unset; vti-iface:unset;
vti-routing:no; vti-shared:no; nic-offload:auto;
000 "to-vsrx-01b":   our idtype: ID_IPV4_ADDR; our id=2.2.0.2; their
idtype: ID_IPV4_ADDR; their id=3.3.0.2
000 "to-vsrx-01b":   dpd: action:hold; delay:5; timeout:30; nat-t:
encaps:auto; nat_keepalive:yes; ikev1_natt:both
000 "to-vsrx-01b":   newest ISAKMP SA: #33; newest IPsec SA: #0; conn
serial: $4;
000 "to-vsrx-01b":   IKE algorithms: AES_CBC_256-HMAC_SHA2_256-DH20
000 "to-vsrx-01b":   IKEv2 algorithm newest: AES_CBC_256-HMAC_SHA2_256-DH20
000 "to-vsrx-01b":   ESP algorithms: AES_CBC_256-HMAC_SHA2_256_128
000
000 Total IPsec connections: loaded 1, active 0
000
000 State Information: DDoS cookies not required, Accepting new IKE
connections
000 IKE SAs: total(1), half-open(0), open(0), authenticated(1), anonymous(0)
000 IPsec SAs: total(0), authenticated(0), anonymous(0)
000
000 #33: "to-vsrx-01b":500 STATE_V2_ESTABLISHED_IKE_SA (established IKE
SA); REKEY in 85040s; newest ISAKMP; idle;
000
000 Bare Shunt list:
000



On Mon, Oct 18, 2021 at 10:36 PM Paul Wouters <paul at nohats.ca> wrote:

> On Sat, 16 Oct 2021, Dave Houser wrote:
>
> > Therefore I configured the following will need to wait until Monday to
> test again:
> >
> >     dpddelay=5
> >     dpdtimeout=30
> >     dpdaction=restart
>
> I recommend dpdaction=hold and let the revive/restart mechanism handle
> the restart.
>
> > >in IKEv2 it is a retransmit timeout that will trigger a restart.
> >
> > By retransmit timeout, do you mean rekeying? or do you mean the
> dpdtimeout? If neither, can you clarify what you mean by retransmit
> > timeout?
>
> In IKEv1, IKE messages are sent with a random MSGID. If you sent one IKE
> request and you don't get a reply, you can still send another request.
> For example, you could send a Quick Mode request, hear no reply, and
> decide to send a DPD request. So the way DPD is implemented is that if
> it does not get a reply within "retransmit-interval", it resends the
> existing packet and doubles the interval. (eg 500ms, 1s, 2s). When it
> hits the "retransmit-timeout", the exchange is deemed lost and no more
> packet retransmits happen. But the daemon has a DPD timer, and it will
> send a _new_ DPD request after "dpddelay" time. If that packet also gets
> no answer, at the "dpdtimeout" time, the entire IKE SA and IPsec
> SAs are torn down as the peer is deemed lost.
>
> With IKEv2, IKE messages are sent with a sequential MSGID. You cannot
> send #3 if you did not receive a reply for #2. So here, the method of
> sending new IKE requests for DPD does not work. If packet #7 is a DPD
> packet, and it gets no reply, there can be no further communication with
> the peer. You cannot send another DPD as #8. So, after the
> retransmit-timeout is reached for packet #7, the whole thing is torn
> down.
>
> Basically, dpdtimeout is a no-op for IKEv2, as the exchange is
> dead when retransmit-timeout is reached.
>
> We have been wanting to remove dpdaction (and only do "hold") and let
> other mechanisms do delete or restart. And we have been wanting to
> remove dpdtimeout by only using the retransmit-timeout. Then dpddelay=
> is the only option left to determine how often one wants to test if
> the connection is still there if the IPsec and IKE is idle and not
> sending any packets.
>
> And to complete the story, separate from all of this are NAT keepalives,
> which any IKE peer behind NAT will send every 20 seconds just to ensure
> the NAT gateway doesn't close the port mapping used for ESPinUDP and
> IKE. These are 1 byte packets that the receiver's kernel eats up and
> the userland never sees these at all.
>
> Paul
>
> > On Fri, Oct 15, 2021 at 8:07 PM Andrew Cagney <andrew.cagney at gmail.com>
> wrote:
> >       Are you running libreswan 4.5?  If not can you try that or
> mainline?
> >
> >       This is what 4.5 looks like when it revives a connection:
> >
> >       "westnet-eastnet-ipv4-psk-ikev2" #1: STATE_V2_ESTABLISHED_IKE_SA:
> >       retransmission; will wait 1 seconds for response
> >       "westnet-eastnet-ipv4-psk-ikev2" #1: STATE_V2_ESTABLISHED_IKE_SA:
> 60
> >       second timeout exceeded after 7 retransmits.  No response (or no
> >       acceptable response) to our IKEv2 message
> >       "westnet-eastnet-ipv4-psk-ikev2" #1: liveness action - restarting
> all
> >       connections that share this peer
> >       "westnet-eastnet-ipv4-psk-ikev2": terminating SAs using this
> connection
> >       "westnet-eastnet-ipv4-psk-ikev2" #2: ESP traffic information:
> in=84B out=84B
> >       "westnet-eastnet-ipv4-psk-ikev2" #3: initiating IKEv2 connection
> >       "westnet-eastnet-ipv4-psk-ikev2" #3: established IKE SA;
> authenticated
> >       using authby=secret and peer ID_FQDN '@west'
> >       "westnet-eastnet-ipv4-psk-ikev2" #4: established Child SA; IPsec
> >       tunnel [192.0.2.0-192.0.2.255:0-65535 0] ->
> >       [192.0.1.0-192.0.1.255:0-65535 0] {ESP=>0x5ef243d3 <0xdb669f85
> >       xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=active}
> >
> >
> https://testing.libreswan.org/v4.5-0-gf36ab1b1df-main/ikev2-liveness-02/OUTPUT/east.pluto.log.gz
> >
> >       For IKEv2 the only settings that matter are (values are what the
> above
> >       test uses):
> >
> >       > dpdaction=restart
> >       > dpddelay=5
> >
> >       I'm pretty sure:
> >
> >       > dpdtimeout=30
> >
> >       is ignored - in IKEv2 it is a retransmit timeout that will trigger
> a restart.
> >
> >       On Fri, 15 Oct 2021 at 17:34, Dave Houser <davehouser1 at gmail.com>
> wrote:
> >       >
> >       > Hello,
> >       >
> >       > I am trying to implement dead peer detection. However when the
> far end SA kills the connection, the tunnel is never rebuilt.
> >       The tunnel will just stay down until a new rekey is initialized by
> the far end SA, in which case the connection will rebuild.
> >       BTW the far end is a Juniper SRX.
> >       >
> >       > Here is the output of /var/log/pluto.log right after I kill the
> connection on the far end, nothing else:
> >       >
> >       > Oct 15 23:33:10.518021: "to-vsrx-01" #6: ESP traffic
> information: in=756B out=1KB
> >       > Oct 15 23:33:10.584609: "to-vsrx-01" #3: established IKE SA
> >       >
> >       >
> >       > Here is my config:
> >       >
> >       > conn to-vsrx-01
> >       >     auto=start
> >       >     keyexchange=ike
> >       >     authby=secret
> >       >     ike=aes256-sha2_256;dh20
> >       >     esp=aes256-sha2_256
> >       >     left=2.2.1.2
> >       >     leftid=2.2.1.2
> >       >     leftsubnet=172.21.0.0/29
> >       >     leftupdown=/opt/_updown_vti01
> >       >     right=3.3.0.2
> >       >     rightsubnet=0.0.0.0/0
> >       >     dpddelay=1s
> >       >     dpdtimeout=1s
> >       >     dpdaction=restart
> >       >
> >       > Here is my leftupdown script I use
> >       >
> >       > #!/bin/bash
> >       >
> >       > set -o nounset
> >       > set -o errexit
> >       >
> >       > VTI_IF="vti01"
> >       >
> >       > case "${PLUTO_VERB}" in
> >       >     up-client)
> >       >         ip tunnel add $VTI_IF local 2.2.0.2 remote 3.3.0.2 mode
> vti key 42
> >       >         ip link set $VTI_IF up
> >       >         ip addr add  172.21.0.3 dev $VTI_IF
> >       >         ip route add 172.21.0.0/29 dev $VTI_IF
> >       >         ip route add 10.0.26.0/24 dev $VTI_IF
> >       >         sysctl -w "net.ipv4.conf.$VTI_IF.disable_policy=1"
> >       >         sysctl -w "net.ipv4.conf.$VTI_IF.rp_filter=0"
> >       >         sysctl -w "net.ipv4.conf.$VTI_IF.forwarding=1"
> >       >         ;;
> >       >     down-client)
> >       >         ip tunnel del $VTI_IF
> >       >         ;;
> >       > esac
> >       >
> >       > Am I misunderstanding what the dpd settings do? I need this
> tunnel to try to re-establish if it ever goes down. How can I
> >       accomplish this?
> >       >
> >       > - Dave
> >       >
> >       > _______________________________________________
> >       > Swan mailing list
> >       > Swan at lists.libreswan.org
> >       > https://lists.libreswan.org/mailman/listinfo/swan
> >
> >
> >
>