[Swan-dev] TS_UNACCEPTABLE should not cause retransmit timeout and IKE+other children deletion

Mon Jul 5 20:20:16 UTC 2021

On Mon, 5 Jul 2021 at 15:23, Paul Wouters <paul at nohats.ca> wrote:

> On Jul 5, 2021, at 15:03, Andrew Cagney <andrew.cagney at gmail.com> wrote:
>
>
> 
> Was the test run with?
>
>
> As you did without my patch…
>

But with:  ikev2: only return STF_{OK,FAIL,FATAL} from CREATE_CHILD_SA
processor

>
> With "pluto: When Child state fails, don't tear down IKE SA" reverted I
> see:
>
> +003 "road/0x2" #3: dropping unexpected CREATE_CHILD_SA message containing
> TS_UNACCEPTABLE notification; message payloads: SK; encrypted payloads: N;
> missing payloads: SA,Ni,TSi,TSr
> +036 "road/0x2" #3: encountered fatal error in state STATE_V2_NEW_CHILD_I1
>
> which is a missing state transition.  Likely fallout from the above.
>
>
> In your terms, this should be “fail”, not as it shows “fatal”.
>

That isn't the problem here.

Because of a bug, the code encountered a protected packet with no matching
state transition.  The response, which I think is reasonable, was to kill
the IKE family.

>
> And for this I see:
>
>
> +003 "road/0x1" #2: IKE_AUTH response contained the error notification
> TS_UNACCEPTABLE
> +002 "road/0x4" #1: deleting other state #2 connection
> (STATE_V2_IKE_AUTH_CHILD_I0) "road/0x1" and NOT sending notification
>
> which is from this code:
>
> v2_notification_t n = process_v2_IKE_AUTH_response_child_sa_payloads(ike,
> md);
> if (v2_notification_fatal(n)) {
> /* already logged */
> /*
> * XXX: should be sending the fatal notification using
> * a new exchange.
> */
> return STF_FATAL;
> } else if (n != v2N_NOTHING_WRONG) {
> /* already logged */
> /*
> * XXX: should be sending the child failure
> * notification using an additional exchange and then
> * leaving the IKE SA up.
> *
> * Instead just wipe out the IKE family :-(
> */
> return STF_V2_DELETE_EXCHANGE_INITIATOR_IKE_SA;
> }
>
> Like the comments point out, this is a hack.
>
>
> It’s really bad. I’d rather that this subnet never tried again and keeps
> the house intact.
>

No argument.  It's there to keep existing behaviour.

It is common for ends to have mismatched subnets. Right now this means
> global outage. Can we just return STF_FAIL?
>
>
Sure, try it.  It's the first step.

I'm just not sure what delete_state(child) will do.  Will it put the
connection back onto the pending queue?

Alternatively modify the code in
process_v2_IKE_AUTH_response_child_sa_payloads() following the comment:

* Was there a child error notification?  The RFC says this
* list isn't definitive.
*
* XXX: can this code assume that the response contains only
* one notify and that is for the child?  Given notifies are
* used to communicate compression I've my doubt.

to zap the state and return v2N_NOTHING_WRONG.

  The fix means queueing up a new exchange to send a delete notify with the
> SPI received from the responder?
>
>
> There is no need to send a notify when you received TS_UNACCEPTABLE.
>
> Yes queueing a pending task with delay would be needed but i am happy to
> do that after the 4.5 release.
>
> Paul
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.libreswan.org/pipermail/swan-dev/attachments/20210705/a644e72d/attachment.html>