[Swan-dev] IKEv2 revival

Fri May 1 17:22:09 UTC 2020

On Fri, 1 May 2020 12:39:54 -0400 (EDT)
Paul Wouters <paul at nohats.ca> wrote:

> On Fri, 1 May 2020, Andrew Cagney wrote:
> 
> > To make the problem more concrete, consider two connections, c-1
> > and c-2, that want to share the same IKE SA.
> > 
> > First, I believe liveness=hold and policy revive are in conflict.
> > For instance, say "c-1" triggers a liveness event which times out.
> > Should "c-1" follow liveness=hold or policy revival?  What about
> > when "c-1"'s rekey fails? ...  
> 
> The dpd/liveness action should be phased out. The hold action was to
> keep a hold in place to prevent leaks, while another mechanism
> restarts the connection. hold was never valid for connections with
> rekey=no that are supposed to clean up all state when going down.

It is not that simple. auto=ondemand with rekey=no is on-demand tunneling.
With this combination we absolutely want hold/trap.

auto=add, rekey=no is when we want to clear.
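
As a rough sketch of the two combinations above (Python is used only for
illustration; the function name and the returned strings are hypothetical
and are not libreswan code or config keywords):

```python
def action_when_down(auto: str, rekey: str) -> str:
    """Hypothetical helper: pick hold or clear from auto= and rekey=."""
    if rekey == "no" and auto == "ondemand":
        # On-demand tunneling: keep a hold/trap so traffic can
        # trigger the tunnel again without leaking.
        return "hold"
    if rekey == "no" and auto == "add":
        # Nothing on our side will bring the tunnel back: clear all state.
        return "clear"
    # Other combinations are not covered by the two cases above.
    return "unspecified"
```

The point is that the action follows from options we already have, so a
separate per-connection dpdaction is not needed to make this choice.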

> The fact that we can specify different dpdaction= for connections with
> the same IKE SA is a limitation in our connection loading. I consider
> it a misconfiguration that we do not need to support.

I agree, and that is one reason why I think we should phase out
dpdaction completely and add real logic that corresponds to the other
config options. We really do know when we want a tunnel to initiate
again and when we do not.

> A liveness event that times out is a failure of the IKE SA, which
> means it should affect ALL connections that share the IKE SA. They
> should all go into failure mode. If revival is required, the
> connections should go down into auto=ondemand to get a hold without a
> leak. Once the connection is up the hold will be replaced with the
> IPsec SA policy.

Actually I think this works for most cases but not all. If only our end
can initiate the connection, we should go to revival and try to get the
tunnel up. That is, if the config has auto=start we should initiate
immediately.
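
A minimal sketch of that decision, assuming the only input is the auto=
value of the connection (the function name and the returned labels are
made up for illustration, and the auto=add branch is my assumption):

```python
def after_ike_sa_failure(auto: str) -> str:
    """Hypothetical: what a connection does once its IKE SA has failed."""
    if auto == "start":
        # We are expected to bring the tunnel up: revive right away.
        return "initiate-now"
    if auto == "ondemand":
        # Fall back to a hold and wait for traffic to trigger initiation.
        return "hold-until-traffic"
    # Assumption: anything else (for example auto=add) waits for the peer.
    return "stay-down"
```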

> > Here are some scenarios and what happens today (vs yesterday).  In
> > each the question is what _should_ happen.
> > 
> > - c-1 initiates; c-2 put on pending queue  
> > -> retransmits call ipsecdoi_replace("c-1", try)  
> > -? but what happens to c-2, which remains on the pending queue?  
> 
> What is the problem here?
> 
> > - c-1 established; c-2 initiates (I'm assuming this uses
> > retransmits) -> retransmits call liveness_action("c-2") because the
> > IKE SA is established and "c-2" initiated the exchange -> see
> > above, what should happen to "c-1" and does it?  
> 
> liveness using c-1 for c-2 that is not yet established should not
> happen. c-2 should never do liveness before its IPsec SA is
> established. If create_child_sa for c-2 times out, that might be a
> reason to send a liveness probe, but really not needed as the
> create_child_sa is effectively a liveness probe too. And since we
> have a window of 1, we cannot send one anyway. If we reach a timeout
> for create_child_sa answer, the IKE SA should be considered dead, and
> thus c-1 will be torn down as well when we delete the IKE SA.

I agree. There must never be liveness checks before the Child SA is
established. Only an established Child SA can cause liveness checks to
start.

> > - c-1 established; c-2 established; c-1 or c-2 triggers liveness  
> > -> retransmits call liveness_action("c-1") because retransmits use
> > the IKE SA and the IKE SA is established and liveness action is
> > called with st->st_connection  
> 
> The connection "used" does not really matter. If the IKE SA
> sends/receives a liveness, then all connections (children) are
> considered live.
> 
> > - c-1 established; c-2 established; c-1 or c-2 rekey
> > either of  
> > -> retransmits call liveness_action("new c-[12]") because the new
> > child initiated the exchange and the IKE SA is established  
> > -> replace calls v2_event_sa_replace(st) - you need to read its
> > comments - I suspect it should queue up a delete exchange and then
> > let retransmits kill the IKE SA  
> 
> We should never try to send out a liveness probe if we are already
> waiting on an IKE reply. If a liveness probe is needed, then whatever
> request is in transit will act as that liveness proof. If the current
> request times out, that is also the equivalent to a failed liveness
> probe, and the IKE SA gets torn down, taking down its children.

Exactly. Unlike with IKEv1, when the IKE SA doesn't work we must take
all our Child SAs down anyway.
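
To make the rules discussed above concrete, here is a toy model, not
libreswan code. It assumes a window of 1, that liveness only runs once a
Child SA is established, and that any request timing out kills the IKE SA
together with all of its children. The class and method names are
hypothetical.

```python
class IkeSa:
    """Toy IKE SA: one request in flight, children die with the SA."""

    def __init__(self, children):
        self.children = set(children)  # established Child SAs, by name
        self.outstanding = None        # window of 1: at most one request
        self.alive = True

    def send(self, request):
        # Refuse to send while another request is still waiting for a reply.
        if self.outstanding is not None:
            return False
        self.outstanding = request
        return True

    def maybe_send_liveness(self):
        # No established Child SA means no liveness check at all.
        if not self.children:
            return False
        # A request already in transit acts as the probe, so send() refuses.
        return self.send("liveness")

    def on_reply(self):
        # Any reply proves the peer is live for every child.
        self.outstanding = None

    def on_timeout(self):
        # A timed-out request equals a failed liveness probe:
        # the IKE SA is dead and every child goes down with it.
        self.alive = False
        downed = sorted(self.children)
        self.children.clear()
        self.outstanding = None
        return downed
```

For example, with c-1 and c-2 established, a pending rekey of c-1 blocks a
separate liveness probe, and if that rekey times out both c-1 and c-2 are
reported as taken down.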

-- 
Tuomo Soini <tis at foobar.fi>
Foobar Linux services
+358 40 5240030
Foobar Oy <https://foobar.fi/>