[Swan-dev] re-ordered events causing whack hang

Andrew Cagney andrew.cagney at gmail.com
Tue Nov 19 15:05:48 UTC 2019


Here's the relevant code:

        /*
         * XXX: Danger!
         *
         * Because the code above has blatted MD->ST with the child
         * state (CST) and this function's caller is going to try to
         * complete the V2 state transition on MD->ST (i.e., CST) and
         * using the state-transition MD->SVM the IKE SA (PST) will
         * never get to complete its state transition.
         *
         * Get around this by forcing the state transition here.
         *
         * But what should happen?  A guess is to just leave MD->ST
         * alone.  The CHILD SA doesn't really exist until after the
         * IKE SA has processed and approved of the response to this
         * IKE_AUTH request.
         */

        pexpect(md->svm->timeout_event == EVENT_RETRANSMIT); /* for CST */
        delete_event(pst);
        event_schedule(EVENT_SA_REPLACE,
                       deltatime(PLUTO_HALFOPEN_SA_LIFE), pst);
        change_state(pst, STATE_PARENT_I2);

so we already knew that this was dangerous.  I'm tweaking the code to
use max(r_timeout*2, PLUTO_HALFOPEN_SA_LIFE) so that the retransmit
code has time to do its stuff and the REPLACE only kicks in when
something's gone south (but I have a feeling that it will break some
tests that rely on careful timing, mea culpa).
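
As a rough illustration of that tweak (not the actual patch), the delay
calculation might look like the sketch below.  It works in plain seconds
rather than pluto's deltatime type, and both HALFOPEN_SA_LIFE_SECS and
replace_delay_secs() are hypothetical names made up for this example; the
60 second value is taken from the "timeout in 60 seconds" log lines quoted
below.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for PLUTO_HALFOPEN_SA_LIFE, in seconds.
 * Assumed to be 60 because the quoted log shows EVENT_SA_REPLACE
 * being scheduled with a 60 second timeout.
 */
#define HALFOPEN_SA_LIFE_SECS 60

/*
 * Pick the delay for the IKE SA's REPLACE event: the larger of twice
 * the retransmit timeout and the half-open lifetime, so the retransmit
 * code gets to give up first.
 */
int64_t replace_delay_secs(int64_t r_timeout_secs)
{
	int64_t doubled = r_timeout_secs * 2;
	return doubled > HALFOPEN_SA_LIFE_SECS ? doubled : HALFOPEN_SA_LIFE_SECS;
}
```

With the 60 second retransmit timeout seen in the logs this gives 120
seconds for the REPLACE, so the two events no longer land on the same
deadline and their delivery order stops depending on how ties are broken.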

On Mon, 18 Nov 2019 at 18:08, Andrew Cagney <andrew.cagney at gmail.com> wrote:
>
> Both show the following:
>
> prepare IKE_SA_INIT request
> | inserting event EVENT_CRYPTO_TIMEOUT, timeout in 60 seconds for #1
>
> send it
> | inserting event EVENT_RETRANSMIT, timeout in 60 seconds for #1
>
> prepare IKE_AUTH request
> | inserting event EVENT_CRYPTO_TIMEOUT, timeout in 60 seconds for #1
>
> start sending it
> | state #1 requesting EVENT_CRYPTO_TIMEOUT to be deleted
> | inserting event EVENT_SA_REPLACE, timeout in 60 seconds for #1
> emit message and send then ...
> | inserting event EVENT_RETRANSMIT, timeout in 60 seconds for #2
>
> so two events - REPLACE and RETRANSMIT - scheduled for 60 seconds
>
> For the working case:
> https://testing.libreswan.org/v3.28-178-g8c50f71b3-master/ikev2-ppk-static-03-no-insist-fail/OUTPUT/west.pluto.log.gz
> the RETRANSMIT is delivered first (why is a bit of a mystery ...)
>
> | handling event EVENT_RETRANSMIT for 192.1.2.23
> "westnet-eastnet-ipv4-psk-ppk" #2 attempt 2 of 0
> | and parent for 192.1.2.23 "westnet-eastnet-ipv4-psk-ppk" #1 keying
> attempt 1 of 0; retransmit 1
> | retransmits: current time 79.183207; retransmit count 0 exceeds
> limit? NO; deltatime 60 exceeds limit? YES; monotime 59.96772 exceeds
> limit? NO
> "westnet-eastnet-ipv4-psk-ppk" #2: STATE_PARENT_I2: 60 second timeout
> exceeded after 0 retransmits.  Possible authentication failure: no
> acceptable response to our first encrypted message
> "westnet-eastnet-ipv4-psk-ppk" #2: starting keying attempt 2 of an
> unlimited number, but releasing whack
>
> but then things break:
> https://testing.libreswan.org/v3.28-181-g0b61816bd-master/ikev2-ppk-static-03-no-insist-fail/OUTPUT/west.pluto.log.gz
> because the REPLACE event is delivered first ....
>
> | handling event EVENT_SA_REPLACE for parent state #1
> ..  steal the existing IKE's whack and pending structures ...
> | picked newest_isakmp_sa #0 for #1
> | replacing stale IKE SA
> | dup_any(fd at 25) -> fd at 15 (in ipsecdoi_replace() at ipsec_doi.c:307)
> ...
> "westnet-eastnet-ipv4-psk-ppk" #3: initiating v2 parent SA to replace #1
> ... which means that #2 has no hope of releasing whack ...
> "westnet-eastnet-ipv4-psk-ppk" #2: starting keying attempt 2 of an
> unlimited number, but releasing whack
>
> So:
>
> - given the REPLACE and RETRANSMIT reversed their order, perhaps
> libevent changed - 2.1.8-stable (2010800) - apparently not
> - or this patch
> https://github.com/libreswan/libreswan/commit/a0e9f7cd01ecd9073b974ee12a5e0a5f29003dc0
> somehow caused the events to be delivered in the correct order
>
> I'm not sure which is more scary ...
>
> but regardless:
>
> - why schedule a "REPLACE" event when we've got a re-transmit that
> will "REPLACE" anyway?
> - and if there is a REPLACE should it release whack?
>
> andrew

