[Swan] DPD settings in config do not trigger updown script on disconnect

Mike Brown mikebrown at pws.bz
Tue Jul 13 23:25:35 UTC 2021

Hello, this is my first time writing to the list.  Let me know if this is going to the wrong place.

My overall goal is for a peer running a pair of HA tunnels to terminate them at my Libreswan node (so my node has 2 tunnels using the same left/right subnets in their .conf files, but different marks).  My local "switching" to implement the HA behavior is an updown script: on up/route it writes iptables connmark rules that send packets to the mark of the tunnel referenced in the updown invocation.  (I have no preferred "primary".)  On down/unroute it removes the rules for the tunnel referenced in the invocation, leaving only the rules for the remaining tunnel's mark.
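The switching rule described above boils down to "every tunnel that is up carries traffic, and a down/unroute removes only that tunnel." A minimal Python sketch of that bookkeeping follows. It is a model only: the real updown script is a shell script that edits iptables connmark rules. The connection names `n_to_p1`/`n_to_p2`, the marks 1 and 2, and the helpers `apply_verb`/`live_marks` are hypothetical. The verb strings (`up-client`, `route-client`, `down-client`, `unroute-client`) are values Libreswan passes to updown scripts in `PLUTO_VERB` for subnet connections.

```python
from __future__ import annotations

# Hypothetical connection names on N and the mark each .conf assigns.
MARKS = {"n_to_p1": 1, "n_to_p2": 2}

# PLUTO_VERB values that should install or remove steering rules.
ADD_VERBS = {"route-client", "up-client"}
REMOVE_VERBS = {"down-client", "unroute-client"}


def apply_verb(active: set[str], verb: str, conn: str) -> set[str]:
    """Return the tunnels still carrying traffic after one updown call."""
    new = set(active)
    if verb in ADD_VERBS:
        new.add(conn)  # real script: add connmark rules for MARKS[conn]
    elif verb in REMOVE_VERBS:
        new.discard(conn)  # real script: delete only this tunnel's rules
    return new  # any other verb (e.g. prepare-client) changes nothing


def live_marks(active: set[str]) -> list[int]:
    """Marks traffic can still be steered to; no tunnel is preferred."""
    return sorted(MARKS[conn] for conn in active)


if __name__ == "__main__":
    tunnels: set[str] = set()
    tunnels = apply_verb(tunnels, "up-client", "n_to_p1")
    tunnels = apply_verb(tunnels, "up-client", "n_to_p2")
    # Simulate the down/unroute calls N makes when P1's tunnel goes away.
    tunnels = apply_verb(tunnels, "down-client", "n_to_p1")
    tunnels = apply_verb(tunnels, "unroute-client", "n_to_p1")
    print(live_marks(tunnels))  # prints [2]
```

Returning a new set instead of mutating keeps each invocation independent, which mirrors the fact that every updown call is a separate process that only knows about its own tunnel.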

I have a test setup in a cloud space with end users A and B, who talk between their separate VPCs through the tunnels.  I have a pair of endpoints (call them P1 and P2) in A's VPC to mock up my peers.  And I have a node (N) in B's VPC where one tunnel each from P1 and P2 terminates.  (To keep costs down, P1 and P2 in this test setup are also Libreswan.)  All router boxes run Libreswan 3.23 on Ubuntu 18.04.5 LTS in the AWS free tier.

The setup passes traffic, and the HA switching works as intended if I issue "ipsec auto --delete p1_to_n" on P1.
P1 sends a delete notification to N.  N calls my updown script while downing and unrouting the tunnel.  The script removes the iptables rules directing traffic into P1's xfrm tunnel, but the rules for P2 are still in place, so traffic immediately flows through P2.  (The routing switch for A -> P2 on the other side is handled and not relevant here.)  I can bring P1 and P2 up and down at will without interrupting traffic between A and B (as long as at least one tunnel is up).

Where I am having trouble is making this more realistic by suddenly blocking traffic instead of issuing a --delete.  My expectation for this scenario was that DPD would detect the disconnect, down the tunnel (as suggested by the Libreswan DPD tests) and call my updown script, but that has not happened.  I see NAT-T packets go out, but no DPD packets, and lastdpd=-1 never changes.  If I disable NAT-T (which may cause me other problems with AWS public addressing), I do see an R_U_THERE and R_U_THERE_ACK, but only once.  After that first DPD exchange with NAT-T disabled, I see "DPD: no need to send or schedule DPD for replaced IPsec SA" repeatedly (every 30 seconds, matching my dpdtimeout), but never another DPD exchange.  I used iptables DROP rules on inbound and outbound traffic to model a disconnect, and also tried AWS ACLs in case there was some difference.  (Netlink seems to recognize the inability to send when the DROP rules are in place.)
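The expected DPD behavior can be stated as a small timing model: once nothing is heard from the peer for `dpddelay` seconds an R_U_THERE probe goes out, and once nothing is heard for `dpdtimeout` seconds the peer is declared dead and `dpdaction` applies. The Python sketch below is a simplified RFC 3706-style model under that assumption, not Libreswan's implementation (real code also tracks when the last probe was sent). `dpddelay`, `dpdtimeout` and `dpdaction` are real ipsec.conf option names; the class `DpdPeer`, its methods, and the 10/30-second values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DpdPeer:
    """Simplified RFC 3706-style liveness check (not Libreswan's code)."""

    dpddelay: float  # idle seconds before an R_U_THERE is sent
    dpdtimeout: float  # seconds with nothing heard before the peer is dead
    last_heard: float = 0.0  # last inbound traffic or R_U_THERE_ACK

    def on_inbound(self, now: float) -> None:
        """Any reply from the peer proves it is alive."""
        self.last_heard = now

    def tick(self, now: float) -> str:
        """Decide what the local side should do at time `now`."""
        idle = now - self.last_heard
        if idle >= self.dpdtimeout:
            return "dead"  # expected here: dpdaction fires, tunnel torn down
        if idle >= self.dpddelay:
            return "probe"  # send R_U_THERE and wait for R_U_THERE_ACK
        return "idle"


if __name__ == "__main__":
    peer = DpdPeer(dpddelay=10, dpdtimeout=30)  # hypothetical values
    peer.on_inbound(0)  # last ACK at t=0, then traffic is blocked
    print([peer.tick(t) for t in (5, 10, 20, 30)])
    # prints ['idle', 'probe', 'probe', 'dead']
```

In this model a "dead" result is the point where the updown script would be expected to run with down/unroute verbs; the symptom in this post is that the equivalent of `tick` never reaches that state because no probes are being scheduled.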

I've done quite a bit of digging into what's happening and am happy to post my summary and/or raw logs here, but as a new user I first wanted to check whether I'm just missing something entirely before I word-vomit all over the mailing list.

Thank you in advance,

