[Swan-dev] IKEv2 revival

Fri May 1 16:39:54 UTC 2020

On Fri, 1 May 2020, Andrew Cagney wrote:

> To make the problem more concrete, consider two connections, c-1 and c-2, that want to share the same IKE SA.
> 
> First, I believe liveness=hold and policy revive are in conflict.  For instance, say "c-1" triggers a liveness event which
> times out.  Should "c-1" follow liveness=hold or policy revival?  What about when "c-1"'s rekey fails? ...

The dpd/liveness action should be phased out. The hold action was meant
to keep a hold in place to prevent leaks while another mechanism
restarts the connection. hold was never valid for connections with
rekey=no, which are supposed to clean up all state when going down.

The fact that we can specify different dpdaction= for connections with
the same IKE SA is a limitation in our connection loading. I consider
it a misconfiguration that we do not need to support.

A liveness event that times out is a failure of the IKE SA, which means
it should affect ALL connections that share the IKE SA. They should all
go into failure mode. If revival is required, the connections should drop
into auto=ondemand to get a hold without a leak. Once the connection is
up again, the hold is replaced by the IPsec SA policy.
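
The rule above can be sketched as a small model. This is an illustration
in Python, not libreswan code: the Connection and IkeSa classes, the
needs_revival flag and the state names are all hypothetical, chosen only
to show a liveness timeout being handled per IKE SA rather than per
connection.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConnState(Enum):
    UP = "up"                # IPsec SA policy installed
    ON_DEMAND_HOLD = "hold"  # hold installed, so no traffic leaks in the clear
    DOWN = "down"            # all state cleaned up


@dataclass
class Connection:
    name: str
    needs_revival: bool      # hypothetical flag: should the connection come back?
    state: ConnState = ConnState.UP


@dataclass
class IkeSa:
    connections: list = field(default_factory=list)
    alive: bool = True


def on_liveness_timeout(ike: IkeSa) -> None:
    """A liveness timeout fails the IKE SA, so every connection sharing it fails."""
    ike.alive = False
    for conn in ike.connections:
        if conn.needs_revival:
            # Fall back to an on-demand hold until the connection is re-established.
            conn.state = ConnState.ON_DEMAND_HOLD
        else:
            # Connections that must not come back clean up all their state.
            conn.state = ConnState.DOWN


def on_child_established(conn: Connection) -> None:
    """Once the connection is up, the hold is replaced by the IPsec SA policy."""
    conn.state = ConnState.UP
```

In this model there is no per-connection liveness action at all: the
outcome for each connection depends only on whether it should be revived.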

> Here are some scenarios and what happens today (vs yesterday).  In each the question is what _should_ happen.
> 
> - c-1 initiates; c-2 put on pending queue
> -> retransmits call ipsecdoi_replace("c-1", try)
> -? but what happens to c-2? Does it remain on the pending queue?

What is the problem here?

> - c-1 established; c-2 initiates (I'm assuming this uses retransmits)
> -> retransmits call liveness_action("c-2") because the IKE SA is established and "c-2" initiated the exchange
> -> see above, what should happen to "c-1" and does it?

Liveness using c-1 on behalf of a c-2 that is not yet established should
not happen. c-2 should never do liveness before its IPsec SA is
established. If create_child_sa for c-2 times out, that might be a reason
to send a liveness probe, but it is not really needed, as the
create_child_sa is effectively a liveness probe too. And since we have a
window of 1, we cannot send one anyway. If we hit the timeout waiting for
the create_child_sa answer, the IKE SA should be considered dead, and
c-1 will be torn down as well when we delete the IKE SA.

> - c-1 established; c-2 established; c-1 or c-2 triggers liveness
> -> retransmits call liveness_action("c-1") because retransmits use the IKE SA and the IKE SA is established and liveness
> action is called with st->st_connection

The connection "used" does not really matter. If the IKE SA
sends/receives a liveness probe, then all connections (children) are
considered live.
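
One way to picture this is to keep the liveness evidence on the IKE SA
itself and let every child derive its status from it. The sketch below
is a hypothetical Python model, not the pluto data structures: IkeSa,
last_contact, record_exchange and live_children are made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class IkeSa:
    # Time of the last exchange completed with the peer (hypothetical field).
    last_contact: float = 0.0
    # Names of the connections (children) sharing this IKE SA.
    children: list = field(default_factory=list)

    def record_exchange(self, now: float) -> None:
        """Any completed exchange on the IKE SA counts, whichever connection drove it."""
        self.last_contact = now

    def live_children(self, now: float, timeout: float) -> list:
        """All children are live while the IKE SA is; none are once it has gone quiet."""
        if now - self.last_contact > timeout:
            return []
        return list(self.children)
```

For example, after record_exchange(100.0) on an IKE SA with children
"c-1" and "c-2", both are reported live at time 110.0 with a 30 second
timeout, and neither at time 200.0.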

> - c-1 established; c-2 established; c-1 or c-2 rekey
> either of
> -> retransmits call liveness_action("new c-[12]") because the new child initiated the exchange and the IKE SA is
> established
> -> replace calls v2_event_sa_replace(st) - you need to read its comments - I suspect it should queue up a delete exchange
> and then let retransmits kill the IKE SA

We should never try to send out a liveness probe if we are already
waiting on an IKE reply. If a liveness probe is needed, then whatever
request is in transit will act as proof of liveness. If the current
request times out, that is equivalent to a failed liveness probe, and
the IKE SA gets torn down, taking down its children.
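
Put as a decision table, and assuming a window of 1 so that at most one
request is outstanding at a time, the logic could look like the sketch
below. It is Python rather than pluto's C, and the Action enum, the
liveness_decision function and its three boolean inputs are hypothetical
names used only for illustration.

```python
from enum import Enum


class Action(Enum):
    SEND_PROBE = "send a liveness probe"
    WAIT_FOR_REPLY = "outstanding request doubles as the probe"
    DELETE_IKE_SA = "tear down the IKE SA and its children"
    NOTHING = "peer was heard from recently"


def liveness_decision(request_outstanding: bool,
                      request_timed_out: bool,
                      idle_too_long: bool) -> Action:
    """Decide what to do when the liveness timer fires on an established IKE SA."""
    if request_outstanding:
        # With a window of 1 nothing else can be sent, so the request
        # already in transit is the liveness check.
        if request_timed_out:
            # Equivalent to a failed liveness probe.
            return Action.DELETE_IKE_SA
        return Action.WAIT_FOR_REPLY
    if idle_too_long:
        return Action.SEND_PROBE
    return Action.NOTHING
```

The type of the outstanding request (create_child_sa, rekey, delete)
does not appear as an input: any request that times out takes the same
path as a failed probe.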

Paul

> tests would be nice
> 
> 
> On Tue, 28 Apr 2020 at 12:05, Andrew Cagney <andrew.cagney at gmail.com> wrote:
>       Adding to the list of functions that revive ...
> 
> On Mon, 27 Apr 2020 at 12:06, Andrew Cagney <andrew.cagney at gmail.com> wrote:
>       I just pushed code to implement liveness probes using the retransmit timer.  When retransmits time-out:
> 
> - if the IKE SA hasn't established, it does a 'retry' using ipsecdoi_replace(st, try)
> 
> - else, presumably the IKE SA is established, and it calls liveness_action(); I suspect this doesn't handle
> multiple children, and know it won't handle an IKE exchange timing out
> 
> (there's also add_revival(), but I'm not sure if that applies here?  And there's pending ...)
> 
> So my question is what should happen?
> 
> - are the established and not established paths really that different (for instance an established IKE SA may
> have an incomplete CHILD SA)
> 
> - do established CHILD SAs linger so that the IPsec connection is 'up' (even though evidence suggests it is
> dead)
> 
> - and I have to wonder what the difference between replace and pending is
> 
> 
> - a rekey (the obvious next candidate for doing proper retransmits) calls  v2_event_sa_replace()
> 
> 
>