[Swan-dev] re-ordered events causing whack hang

Andrew Cagney andrew.cagney at gmail.com
Mon Nov 18 23:08:37 UTC 2019


Both logs (the working run and the broken run, linked below) show the
following sequence:

prepare IKE_SA_INIT request
| inserting event EVENT_CRYPTO_TIMEOUT, timeout in 60 seconds for #1

send it
| inserting event EVENT_RETRANSMIT, timeout in 60 seconds for #1

prepare IKE_AUTH request
| inserting event EVENT_CRYPTO_TIMEOUT, timeout in 60 seconds for #1

start sending it
| state #1 requesting EVENT_CRYPTO_TIMEOUT to be deleted
| inserting event EVENT_SA_REPLACE, timeout in 60 seconds for #1
emit the message and send it, then ...
| inserting event EVENT_RETRANSMIT, timeout in 60 seconds for #2

So two events, EVENT_SA_REPLACE for #1 and EVENT_RETRANSMIT for #2, are
both scheduled to fire in 60 seconds.
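To make the ordering question concrete, here is a toy timer queue. It is only a sketch: the TimerQueue class, the delivery_order() helper and the lifo_ties switch are made up for illustration and are not how libevent or pluto actually store timers. It assumes timers are keyed on absolute deadline, with a tie-break that only matters when two deadlines are identical.

```python
import heapq
import itertools


class TimerQueue:
    """Toy min-heap timer queue (hypothetical; NOT libevent's implementation)."""

    def __init__(self, lifo_ties=False):
        self._heap = []
        self._seq = itertools.count()
        self._lifo_ties = lifo_ties

    def schedule(self, now, delay, name):
        seq = next(self._seq)
        # The tie-break only matters when two deadlines are exactly equal.
        tie = -seq if self._lifo_ties else seq
        heapq.heappush(self._heap, (now + delay, tie, name))

    def drain(self):
        """Return event names in the order they would be delivered."""
        order = []
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            order.append(name)
        return order


def delivery_order(gap, lifo_ties=False):
    """REPLACE is inserted at t=0, RETRANSMIT 'gap' seconds later; both use 60s."""
    q = TimerQueue(lifo_ties)
    q.schedule(0.0, 60.0, "EVENT_SA_REPLACE #1")
    q.schedule(gap, 60.0, "EVENT_RETRANSMIT #2")
    return q.drain()


if __name__ == "__main__":
    print(delivery_order(0.0))
    print(delivery_order(0.0, lifo_ties=True))
    print(delivery_order(0.001, lifo_ties=True))
```

In this model REPLACE always goes first once the two deadlines differ at all, since it was inserted first. RETRANSMIT can only win when the deadlines collide exactly and ties are broken in its favour, so the sketch does not explain the working case, it only shows how little it takes to flip the order.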

For the working case:
https://testing.libreswan.org/v3.28-178-g8c50f71b3-master/ikev2-ppk-static-03-no-insist-fail/OUTPUT/west.pluto.log.gz
the RETRANSMIT is delivered first (why is a bit of a mystery ...)

| handling event EVENT_RETRANSMIT for 192.1.2.23 "westnet-eastnet-ipv4-psk-ppk" #2 attempt 2 of 0
| and parent for 192.1.2.23 "westnet-eastnet-ipv4-psk-ppk" #1 keying attempt 1 of 0; retransmit 1
| retransmits: current time 79.183207; retransmit count 0 exceeds limit? NO; deltatime 60 exceeds limit? YES; monotime 59.96772 exceeds limit? NO
"westnet-eastnet-ipv4-psk-ppk" #2: STATE_PARENT_I2: 60 second timeout exceeded after 0 retransmits.  Possible authentication failure: no acceptable response to our first encrypted message
"westnet-eastnet-ipv4-psk-ppk" #2: starting keying attempt 2 of an unlimited number, but releasing whack

but then things break:
https://testing.libreswan.org/v3.28-181-g0b61816bd-master/ikev2-ppk-static-03-no-insist-fail/OUTPUT/west.pluto.log.gz
because the REPLACE event is delivered first ....

| handling event EVENT_SA_REPLACE for parent state #1
... steal the existing IKE SA's whack and pending structures ...
| picked newest_isakmp_sa #0 for #1
| replacing stale IKE SA
| dup_any(fd at 25) -> fd at 15 (in ipsecdoi_replace() at ipsec_doi.c:307)
...
"westnet-eastnet-ipv4-psk-ppk" #3: initiating v2 parent SA to replace #1
... which means that #2 has no hope of releasing whack ...
"westnet-eastnet-ipv4-psk-ppk" #2: starting keying attempt 2 of an unlimited number, but releasing whack
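The difference between the two runs can be sketched as a small model of who holds the whack connection. This is a simplification under stated assumptions, not pluto's code: the Whack class and whack_hangs() are hypothetical, the model assumes whack stays blocked while any state still holds a handle, that the RETRANSMIT timeout on #2 drops the handles held by #2 and its parent #1, and that REPLACE moves #1's handle to the new state #3.

```python
class Whack:
    """Toy stand-in for the whack connection (hypothetical model).

    Assumption: whack stays blocked while any state still holds a handle.
    """

    def __init__(self):
        self.holders = set()

    def hold(self, serial):
        self.holders.add(serial)

    def release(self, serial):
        self.holders.discard(serial)


def whack_hangs(event_order):
    """Return True if whack is still held after the events are delivered."""
    whack = Whack()
    whack.hold(1)  # IKE SA #1
    whack.hold(2)  # child #2
    for event in event_order:
        if event == "RETRANSMIT":
            # Assumption: #2 times out and drops its own and its parent's handle.
            whack.release(2)
            whack.release(1)
        elif event == "REPLACE":
            # Assumption: the replacement #3 takes over #1's handle,
            # but only if #1 still has one to take.
            if 1 in whack.holders:
                whack.release(1)
                whack.hold(3)
    return bool(whack.holders)


if __name__ == "__main__":
    print("RETRANSMIT first:", whack_hangs(["RETRANSMIT", "REPLACE"]))
    print("REPLACE first:   ", whack_hangs(["REPLACE", "RETRANSMIT"]))
```

Under those assumptions the outcome depends only on which of the two 60-second events is delivered first: once #3 has taken the handle, nothing #2 does on timeout can free it.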

So:

- given that REPLACE and RETRANSMIT reversed their order, perhaps
libevent changed? It is 2.1.8-stable (2010800), so apparently not
- or this patch
https://github.com/libreswan/libreswan/commit/a0e9f7cd01ecd9073b974ee12a5e0a5f29003dc0
somehow caused the events to be delivered in the correct order

I'm not sure which is more scary ...

but regardless:

- why schedule a REPLACE event when we've already got a RETRANSMIT
timeout that will replace the SA anyway?
- and if there is a REPLACE, should it release whack?

andrew