[Swan-dev] ikev2-32-nat-rw-rekey is weird

Wed Dec 5 21:01:29 UTC 2018

I think I've restored ikev2-32-nat-rw-rekey's behaviour.

commit 48ab456071939535f9f915622162bbcc056fe2ea (origin/master, origin/HEAD, master)
Author: Andrew Cagney <cagney at gnu.org>
Date:   Mon Dec 3 11:01:34 2018 -0500

    ikev2: schedule "replace" as explicit "rekey" (new event), "replace", or "expire" events

    The schedule-replace code, depending on context, will schedule an
    explicit "rekey", "replace", or "expire".

    The "rekey" handler starts a rekey of the SA (the IKE SA calls
    ikev2_rekey_ike_start(), the CHILD uses magic and a call to
    ipsecdoi_replace()).  A replace is then scheduled.

    The "replace" handler seeing a rekey in progress "cleans up" the mess:
    for the IKE SA it forces a full replace and forced "expire"; for the
    old CHILD SA, it is forced to "expire" (what happens to the new CHILD
    SA remains a mystery; can CHILD SA even skip directly from "rekey" to
    "expire"?).

    This should restore a quirk in ikev2-32-nat-rw-rekey where the rekey
    runs out of time.

    (Note that there is a deliberate bug where EVENT_SA_REKEY is logged as
    the old EVENT_SA_REPLACE.  It avoids churning the output.  Something
    to fix later).
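
To make the handler chain in the commit message easier to follow, here is a toy model of it. This is only a sketch, not the pluto code: the SA class, the handler names, the log strings, and the behaviour of a "replace" with no rekey in progress are all made up for illustration. The rekey in this model never completes, which stands in for the test's rekey running out of time.

```python
from dataclasses import dataclass, field
from enum import Enum


class Event(Enum):
    REKEY = "rekey"
    REPLACE = "replace"
    EXPIRE = "expire"


@dataclass
class SA:
    # Hypothetical stand-in for a state; not libreswan's struct state.
    serial: int
    kind: str                                    # "IKE" or "CHILD"
    rekey_in_progress: bool = False
    pending: list = field(default_factory=list)  # scheduled events, oldest first
    log: list = field(default_factory=list)


def handle_rekey(sa):
    # "rekey": start a rekey of the SA, then schedule a "replace".
    sa.rekey_in_progress = True
    sa.log.append(f"#{sa.serial} {sa.kind}: start rekey")
    sa.pending.append(Event.REPLACE)


def handle_replace(sa):
    # "replace": if a rekey is still in progress, clean up and force "expire".
    if sa.rekey_in_progress:
        if sa.kind == "IKE":
            sa.log.append(f"#{sa.serial} IKE: force full replace")
        sa.log.append(f"#{sa.serial} {sa.kind}: force expire")
        sa.pending.append(Event.EXPIRE)
    else:
        # Assumption: with no rekey pending this is an ordinary replace.
        sa.log.append(f"#{sa.serial} {sa.kind}: replace")


def handle_expire(sa):
    sa.log.append(f"#{sa.serial} {sa.kind}: expired")


HANDLERS = {
    Event.REKEY: handle_rekey,
    Event.REPLACE: handle_replace,
    Event.EXPIRE: handle_expire,
}


def run(sa, first_event):
    # Drain the event queue in order; handlers may schedule further events.
    sa.pending.append(first_event)
    while sa.pending:
        HANDLERS[sa.pending.pop(0)](sa)
    return sa.log


if __name__ == "__main__":
    for line in run(SA(1, "IKE"), Event.REKEY):
        print(line)
```

Running it for an IKE SA walks rekey, then forced full replace and forced expire, then expiry; for a CHILD SA the "force full replace" step is skipped.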

On Mon, 26 Nov 2018 at 12:08, Andrew Cagney <andrew.cagney at gmail.com> wrote:
>
> On Mon, 26 Nov 2018 at 11:16, Antony Antony <antony at phenome.org> wrote:
> >
> > an unestablished child state would become a new "connection" initiation (STATE_PARENT_I1) when the parent deletes. That is how #4 is created
>
> Unfortunately what was happening depended on luck:
>
> - the #1 REPLACE event would create a re-key state #3 and hash that to
> a random slot
>
> - the #1 EXPIRE event would then call delete_my_family(IKE SA, FALSE) which:
> -- deleted all children of the IKE SA, but only if they are hashed to
> the same slot as the expired IKE SA
> -- since the #3 re-key state was hashed to a random slot (which may or may
> not match the IKE SA's slot), whether it survived this depended on luck
>
> Assuming #3 survived, the code would then call delete_state() which:
>
> > delete_state
> >  flush_pending_children
> >   flush_pending_child
> >         #queue up new IKE_INIT exchange.
>
> because it was searching the entire state table, and not just the IKE
> SA's hash slot, would stumble across the rekey state #3 and cause it
> to trigger a replace
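
The "luck" described above can be reproduced with a toy state table. This is a sketch under assumptions, not the real state hash code: the table size, the dict layout, and the scenario() helper are invented, and it assumes the established CHILD SA #2 sits in the same slot as IKE SA #1.

```python
NUM_SLOTS = 4  # hypothetical table size


def scenario(rekey_slot, ike_slot=0):
    """Return (deleted_by_family_delete, found_by_whole_table_search)."""
    table = [[] for _ in range(NUM_SLOTS)]
    table[ike_slot].append({"serial": 1, "parent": None})  # IKE SA #1
    table[ike_slot].append({"serial": 2, "parent": 1})     # CHILD SA #2
    table[rekey_slot].append({"serial": 3, "parent": 1})   # re-key state #3

    # Family delete: only looks at the IKE SA's own hash slot.
    deleted = [s["serial"] for s in table[ike_slot] if s["parent"] == 1]
    table[ike_slot] = [s for s in table[ike_slot] if s["parent"] != 1]

    # Pending-children flush: searches the entire table, so it finds
    # any child of #1 that the slot-local delete missed.
    rescheduled = [s["serial"] for slot in table for s in slot
                   if s["parent"] == 1]
    return deleted, rescheduled


if __name__ == "__main__":
    print(scenario(rekey_slot=0))  # #3 shares the IKE SA's slot
    print(scenario(rekey_slot=2))  # #3 landed in a different slot
```

When #3 shares the slot it is deleted along with #2 and nothing is left to reschedule; when it lands elsewhere it survives the family delete and the whole-table search picks it up.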
>
> While the quick fix seems to be to not delete the re-key state #3 it
> seems weird.
>
> - other than the re-key state, could there ever be any other state
> lurking in the state table?
>
> - since the old IKE SA needs replacing, then why not just replace it?
>
> > And #4 deletes when retransmit expires, say 60sec default.
> > I think keyingtries is supposed to keep it going, create #5 and so on.
> >
> > -antony
> >
> >
> > On Mon, Nov 26, 2018 at 10:26:25AM -0500, Andrew Cagney wrote:
> > > The old code was doing roughly:
> > >
> > >   #1 established as IKE SA
> > >   #2 established as CHILD SA
> > >
> > > and then
> > >
> > >  | handling event EVENT_SA_REPLACE for parent state #1
> > >  | #3 schedule initiate IKE Rekey SA none to replace IKE# 1
> > >   - can't as network is down but keeps retrying
> > >  | inserting event EVENT_SA_EXPIRE, timeout in 13.000 seconds for #1
> > >  - i.e., switch #1 from REPLACE to EXPIRE
> > >
> > > and then
> > >
> > >   | #1: ISAKMP SA expired (LATEST!)
> > >   - deletes all known children (i.e. #2, but not #3 - that's become a zombie)
> > >   | #1: reschedule pending child #3 STATE_V2_REKEY_IKE_I of connection
> > > "road-east-x509-ipv4"[1] 192.1.2.23 - the parent is going away
> > >   | inserting event EVENT_SA_REPLACE, timeout in 0.000 seconds for #3
> > >   - i.e., flips #3's event from retransmit to replace
> > >   - deletes itself (#3)
> > >
> > > and this wakes up zombie #3 causing it to:
> > >
> > >   #3: handling event EVENT_SA_REPLACE for child state
> > >   - creates #4 to do full re-negotiation
> >
> >
> >
> > >   - deletes itself
> > >
> > > Since the new code deletes #3 (re-key state) while deleting #1
> > > (original IKE SA) there is no #3 zombie state to bring back from the
> > > dead.  Hence the connection dies.
> > >
> > > My guess is that what should happen is: the #1 EXPIRE event should do
> > > the replace itself (clearly it wasn't, as it instead wakes up the
> > > zombie state #3, causing it to REPLACE).  Any thoughts?