[Swan-dev] TS_UNACCEPTABLE should not cause retransmit timeout and IKE+other children deletion

Mon Jul 5 19:03:40 UTC 2021

Was the test run with the following commit?

    ikev2: only return STF_{OK,FAIL,FATAL} from CREATE_CHILD_SA processor

    Not STF_FAIL+v2N.  Fixes a sec_label core dump when the initiator gets
    rejected.

    Internally use v2_notification_t, mapping that onto the above (the
    code will eventually need to send the notification to the responder as
    part of a separate informational exchange.  See 2.21. Error Handling).

The symptoms below look similar.
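
For readers who do not have the commit at hand, the mapping it describes can be sketched roughly as follows. This is only an illustration, not the pluto code: the enums are cut-down stand-ins, the notification values are the RFC 7296 numbers, v2N_NOTHING_WRONG == 0 is an assumption, and treating only INVALID_SYNTAX as fatal is an assumption made to keep the example short. The helper name stf_from_child_notification() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for pluto's notification type (not the real definition). */
typedef enum {
	v2N_NOTHING_WRONG = 0,		/* assumed "no error" sentinel */
	v2N_INVALID_SYNTAX = 7,		/* RFC 7296 value */
	v2N_NO_PROPOSAL_CHOSEN = 14,	/* RFC 7296 value */
	v2N_TS_UNACCEPTABLE = 38,	/* RFC 7296 value */
} v2_notification_t;

/* Only the three results the commit message says the processor may return. */
typedef enum { STF_OK, STF_FAIL, STF_FATAL } stf_status;

/* Assumption for this sketch: only INVALID_SYNTAX kills the IKE SA. */
static bool v2_notification_fatal(v2_notification_t n)
{
	return n == v2N_INVALID_SYNTAX;
}

/*
 * Hypothetical helper: collapse the internal notification into one of
 * STF_{OK,FAIL,FATAL}, so the caller never sees STF_FAIL+v2N.
 */
static stf_status stf_from_child_notification(v2_notification_t n)
{
	/* sanity check: "nothing wrong" must never count as fatal */
	assert(!v2_notification_fatal(v2N_NOTHING_WRONG));

	if (v2_notification_fatal(n)) {
		return STF_FATAL;	/* the IKE SA cannot continue */
	}
	if (n != v2N_NOTHING_WRONG) {
		return STF_FAIL;	/* only this Child SA negotiation failed */
	}
	return STF_OK;
}
```

Under these assumptions TS_UNACCEPTABLE maps to STF_FAIL, which is the case where the IKE SA and its other children ought to survive.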

On Sun, 4 Jul 2021 at 22:13, Paul Wouters <paul at nohats.ca> wrote:

>
> I added two new testcases:
>
> ikev2-rw-multiple-subnets-4-mismatch        (mismatch in CREATE_CHILD_SA)
> ikev2-rw-multiple-subnets-5-mismatch-frst   (mismatch in IKE_AUTH)
>
> I ran the 4-mismatch test with mismatched subnets. That is, the initiator
> has more subnets than the responder. The first subnet matches, so IKE_AUTH
> completes. Then CREATE_CHILD_SAs are used for the further pending
> subnets.
>

With "pluto: When Child state fails, don't tear down IKE SA" reverted I see:

+003 "road/0x2" #3: dropping unexpected CREATE_CHILD_SA message containing
TS_UNACCEPTABLE notification; message payloads: SK; encrypted payloads: N;
missing payloads: SA,Ni,TSi,TSr
+036 "road/0x2" #3: encountered fatal error in state STATE_V2_NEW_CHILD_I1

which is a missing state transition.  Likely fallout from the above.

> I observe:
>
> "road/0x4" #1: initiating IKEv2 connection
> "road/0x3": queuing pending IPsec SA negotiating with 192.1.2.23 IKE SA #1
> "road/0x4"
> "road/0x2": queuing pending IPsec SA negotiating with 192.1.2.23 IKE SA #1
> "road/0x4"
> "road/0x1": queuing pending IPsec SA negotiating with 192.1.2.23 IKE SA #1
> "road/0x4"
> "road/0x4" #1: sent IKE_SA_INIT request
> "road/0x4" #1: switching CHILD #2 to pending connection "road/0x1"
> "road/0x4" #1: sent IKE_AUTH request {auth=IKEv2 cipher=AES_GCM_16_256
> integ=n/a prf=HMAC_SHA2_512 group=MODP2048}
> "road/0x4" #1: authenticated using RSA with SHA2_512 and peer certificate
> 'C=CA, ST=Ontario, L=Toronto, O=Libreswan, OU=Test Department, CN=
> east.testing.libreswan.org, E=user-east at testing.libreswan.org' issued by
> CA 'C=CA, ST=Ontario, L=Toronto, O=Libreswan, OU=Test Department,
> CN=Libreswan test CA for mainca, E=testing at libreswan.org'
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: retransmission; will wait 0.5
> seconds for response
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: retransmission; will wait 1
> seconds for response
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: retransmission; will wait 2
> seconds for response
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: retransmission; will wait 4
> seconds for response
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: retransmission; will wait 8
> seconds for response
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: retransmission; will wait 16
> seconds for response
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: retransmission; will wait 32
> seconds for response
> "road/0x4" #1: STATE_V2_ESTABLISHED_IKE_SA: 60 second timeout exceeded
> after 7 retransmits.  No response (or no acceptable response) to our IKEv2
> message
> "road/0x4" #1: liveness action - putting connection into hold
> "road/0x4" #1: deleting state (STATE_V2_ESTABLISHED_IKE_SA) aged 64.10909s
> and sending notification
> "road/0x4" #1: deleting IKE SA but connection is supposed to remain up;
> schedule EVENT_REVIVE_CONNS
>
>
And for this I see:

+003 "road/0x1" #2: IKE_AUTH response contained the error notification
TS_UNACCEPTABLE
+002 "road/0x4" #1: deleting other state #2 connection
(STATE_V2_IKE_AUTH_CHILD_I0) "road/0x1" and NOT sending notification

which is from this code:

	v2_notification_t n = process_v2_IKE_AUTH_response_child_sa_payloads(ike, md);
	if (v2_notification_fatal(n)) {
		/* already logged */
		/*
		 * XXX: should be sending the fatal notification using
		 * a new exchange.
		 */
		return STF_FATAL;
	} else if (n != v2N_NOTHING_WRONG) {
		/* already logged */
		/*
		 * XXX: should be sending the child failure
		 * notification using an additional exchange and then
		 * leaving the IKE SA up.
		 *
		 * Instead just wipe out the IKE family :-(
		 */
		return STF_V2_DELETE_EXCHANGE_INITIATOR_IKE_SA;
	}

As the comments point out, this is a hack.  Would the fix mean queueing up a
new exchange to send a delete notify with the SPI received from the
responder?
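
To make the question concrete, here is a minimal sketch of that idea: on a non-fatal child failure, queue a delete for just that child and leave the IKE SA alone. Everything in it is hypothetical (struct ike_sa here is a toy, and queue_child_delete() and fail_child_keep_ike() are made-up names); which SPI belongs in the delete is exactly the open question, so the sketch simply takes it as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUED 8

/* Hypothetical: one pending INFORMATIONAL exchange carrying a delete. */
struct queued_delete {
	uint32_t child_spi;
};

/* Toy IKE SA: an "up" flag plus a small queue of pending exchanges. */
struct ike_sa {
	bool established;
	size_t nr_queued;
	struct queued_delete queue[MAX_QUEUED];
};

/* Queue a delete for one Child SA; returns false when the queue is full. */
static bool queue_child_delete(struct ike_sa *ike, uint32_t child_spi)
{
	assert(ike != NULL);
	if (ike->nr_queued >= MAX_QUEUED) {
		return false;
	}
	ike->queue[ike->nr_queued++].child_spi = child_spi;
	return true;
}

/*
 * Non-fatal child failure: schedule the delete and deliberately leave
 * ike->established untouched, so the other children stay up.
 */
static bool fail_child_keep_ike(struct ike_sa *ike, uint32_t child_spi)
{
	return queue_child_delete(ike, child_spi);
}
```

The point of the shape is only that the failure path touches the queue and nothing else; the real change would also have to release the failed child state and drive the queued exchange.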