<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 5 Jul 2021 at 13:22, Paul Wouters <<a href="mailto:paul@nohats.ca" target="_blank">paul@nohats.ca</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 5 Jul 2021, Andrew Cagney wrote:<br>

<br>

>       But that is changing the meaning of FAIL vs FATAL.<br>

> <br>

> <br>

> (sarcasm) It's hard to change the meaning of something that was, well, meaningless.<br>

> <br>

> This is what v3.23 had:<br>

> <br>

> STF_INTERNAL_ERROR:<br>

> It just signals to the state machine to do nothing (as I discovered last week, it also signals that the state on the stack state may have been deleted)!<br>

> -> I think it should go; internal errors are either recoverable (just kill IKE family), or passert<br>

<br>

This is supposed to core dump. We hit something that never is supposed<br>

to happen.<br></blockquote><div><br></div><div>The most common case is realising that the send buffer is full.  Probably not worth a core dump; but yes I'm sure there's other code where it should abort.<br></div><div><br></div><div>I suspect the code deleting the state then returning STF_INTERNAL_ERROR is something of an outlier (or IKEv1).<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> STF_DROP:<br>

> delete current state; but "be vewy vewy quiet"<br>

> -> this is gone<br>

> <br>

> STF_FATAL:<br>

> delete current state<br>

> (for a child it would delete sending a notify, then see the ike sa was childless and delete that sending a second notify; except when there were multiple<br>

> children)<br>

> -> it now completes message exchange and deletes the IKE family; it is after all fatal<br>

<br>

That is wrong. Because a fatal child should not delete the IKE SA or<br>

other exitsing Child SA states, just because of the children is<br>

STF_FATAL. Eg this is now executed when two endpoints misconfigure<br>

their subnets. It makes it impossible for an admin to add a new subnet<br>

and then tell the other end to add it, since it now tears down the<br>

house.<br>

<br></blockquote><div><br></div><div>Should not, vs cannot.</div><div>While I agree it shouldn't happen, it can.<br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> STF_FAIL:<br>

> do pretty much nothing<br>

> -> which is useless, the exchange failed so things are going nowhere<br>

> -> it now completes message exchange and deletes current state<br>

<br>

In IKEv1, I think this can still be valid. In IKEv2, things are more<br>

clear so it is either OK or FATAL.<br>

<br></blockquote><div><br></div><div>I should have been clearer.  This is what the v2.32 *IKEv2* code did.</div><div><br></div><div>IKEv1 was already separated (but admitidly, the ikev2 code still looked like a copy/paste of IKEv1).<br></div><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> STF_FAIL+v2N_....:<br>

> in the responder send v2N and then deleted a state (during IKE_SA_INIT that was IKE, else it was the child, but see alsofor the initiator this did nothing<br>

> -> this code path is gone<br>

<br>

even in IKEv1?<br></blockquote><div><br></div><div>I don't know what IKEv1 does.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> STF_IGNORE:<br>

> it's been pretty consistent; pretend the message never happened<br>

> <br>

> So STF_DROP, STF_FATAL and STF_FAIL+v2N could all potentially delete a state; and STF_FAIL on its own did nothing.<br>

<br>

well, "nothing" meant it could retransmit the request, eg a Quick Mode<br>

Request that got denied before. Eg retrying. It works because of the<br>

unique msgids, you could just keep retransmitting such packets.<br></blockquote><div><br></div><div>Not IKEv2.</div></div><div class="gmail_quote"><br></div><div class="gmail_quote">Since the response is protected, there's no mistaking the answer: one thumb up, one thumb down, two thumbs down.  Retransmitting the request won't change this(1).  There is no middle ground here.<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">But that didn't stop broken IKEv2 code from trying, and hiding it behind a release whack.  To spell it out:</div><div class="gmail_quote">- it would ignore the response; but release whack<br></div><div class="gmail_quote">- retransmit</div><div class="gmail_quote">- timeout and only then take the action it should have taken at the very start; childless SAs caused this to be fixed<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">The only way to retry the SA by initiating a new exchange.<br></div><div class="gmail_quote"><div><br></div><div>(1) Ok, technically a really crappy memory starved responder might, when given a retransmit, try to re-perform the same operation (oh, wait, we've got IKEv1 code doing that).<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

>       FAIL used to mean "there is a non-fatal error, please keep trying this state"<br>

>       FATAL meant "there is no more hope, kill the state"<br>

<br>

>       Note that this applies to the _states_ and not the connection. That is,<br>

>       starting a new keying attempt after a timeout and/or revival are not<br>

>       related to these state changes.<br>

> <br>

> <br>

> If a state transition fails then it fails.  There is no "please keep trying this state".<br>

<br>

That depends on the retransmit logic/arhitecture. Also, in IKEv1 you<br>

could have Child SA states without having an IKE state. It was a<br>

different world :)<br>

<br>

> The only remaining question is how much damage should the failure inflict?  Delete the state or delete the IKE family (and yes any exchange can trigger a<br>

> delete family, a critical payload, bad payload lengths, ...).<br>

<br>

I still don't agree with this statement. A new CREATE_CHILD_SA that<br>

fails should not take down existing Child SA's or an IKE SA. Even an<br>

IKE REKEY failure should not take down the existing IKE SA.<br></blockquote><div><br></div><div>Again.  Should not vs cannot.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> However, what I have found and fixed is code trying to ignore failure (failure is not not an option :-).  For instance, code would ignore unexpected<br>

> notifications in an IKE_AUTH response, and instead let things retransmit.  This is futile.  The response was protected, so no number of retransmits will<br>

> change its content.<br>

<br>

Agreed, but IKEv1 might not be that clear.<br></blockquote><div><br></div><div>This is IKEv2.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

>       In that sense, a CREATE_CHILD_SA that received an answer that leads to<br>

>       an unfixable isse, like TS_UNACCEPTABLE, is a FATAL state change for<br>

>       the child sa. FAIL would mean we might try to retransmit, which is<br>

>       what is wrongly happening here. FAIL would also happen if we got an<br>

>       unauthenticated reply we didn't like, so we would still wait for a<br>

>       possible unauthenticated reply we like (to prevent attackers from<br>

>       locking out legitimate packets).<br>

> <br>

> If a packet fails protection, it's STF_IGNOREd.  The exception is with IKE_SA_INIT bit it too is free to STF_IGNORE a message.<br>

>  <br>

><br>

>       The way IKEv2 works with serial MSGID, we really don't have STF_FAIL in<br>

>       that sense. We only have STF_OK or STF_FATAL. (well also STF_SUSPEND etc)<br>

>       The only STF_FAIL would be the if we get no reply, and we need to<br>

>       retransmit. I can't think of an STF_FAIL that would operate on the child<br>

>       SA state.<br>

> <br>

> <br>

> We're somewhat on the same page.  Any message can carry a notification that implies that the IKE family is dead.  For instance, INVALID_SYNTAX.  So the<br>

> exchange outcome is either Ok, or fatal.<br>

<br>

See above. I don't agree that INVALID_SYNTAX should be interpreted to<br>

tear down the entire house. Or at least we should be really careful<br>

with sending INVALID_SYNTAX. I find these two bits of the RFC rather<br>

conflicting:<br></blockquote><div><br></div><div>The RFCs pretty clear on what to do with INVALID_SYNTAX, and for the code path we care about - crap payloads - when to send it.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

<br>

    If a peer parsing a request notices that it is badly formatted (after<br>

    it has passed the message authentication code checks and window<br>

    checks) and it returns an INVALID_SYNTAX notification, then this<br>

    error notification is considered fatal in both peers, meaning that<br>

    the IKE SA is deleted without needing an explicit Delete payload.<br>

<br>

vs<br>

<br>

  INVALID_SYNTAX                            7<br>

        Indicates the IKE message that was received was invalid because<br>

        some type, length, or value was out of range or because the<br>

        request was rejected for policy reasons.<br>

<br>

Specifically the "for policy reasons".<br>

<br>

But even so, INVALID_SYNTAX is rare and not the problem we are dealing<br>

with now, which is things like TS_UNACCEPTABLE. That is the case now<br>

where we tear down the house and we should not.<br></blockquote><div><br></div><div>... presumably there's a code path that should have returned STF_FAIL.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> However with a CREATE_CHILD_SA exchange, there's three outcomes:<br>

> - exchange succeeds; child succeeds<br>

> - exchange succeeds; child fails (ike survives)<br>

> - exchange fails, implying that the ike family dies<br>

> hence the need for three status<br>

> <br>

> Taking a step back, and ignoring our current code, which would you consider more serious: FAIL or FATAL?<br>

> Personally, I'd treat something fatal as more serious and assume it's wiping things out.<br>

<br>

As you said, I think we are on the same page, but FATAL in a child<br>

should not cause the IKE SA or other Child SAs to die (maybe the<br>

exception to this is indeed INVALID_SYNTAX - I don't care too much<br>

since we don't really see that case happening in real life, and I'm<br>

mostly concerned now with fixing the mismatching subnet cases that<br>

throw TS_UNACCEPTABLE)<br></blockquote><div><br></div><div>Again, three possible outcomes requiring three clear status codes:<br></div><div>> - exchange succeeds; child succeeds STF_OK<br>

> - exchange succeeds; child fails (ike survives) STF_FAIL<br>

> - exchange fails, implying that the ike family dies STF_FATAL<br></div><div>we need a way to differentiate between them, and return accordingly.</div></div><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> The IKE_AUTH exchange is different.   The message is passed to the IKE SA, and then as a nested operation, it tries to create the child.  If the attempt<br>

> to create the child proves fatal, the IKE SA can proceed accordingly.<br>

<br>

That is not the case yet, which is why I added _two_ test cases with<br>

mismatched subnet. One of those mismatches the first subnet, thus<br>

causing a failure in IKE_AUTH, which currently also/still tears<br>

down the IKE SA (indirectly through STF_FATAL, even if your recent<br>

childless code made the IKE SA survive the initial failure)<br>

<br>

> If CREATE_CHILD_SA followed this model (I call it nested states), then we could only need two codes.<br>

<br>

Works for me, but we have to be careful not to break IKEv1 assumptions<br>

and running code.<br></blockquote><br><div>None of this code is in the IKEv1 code path.  Hasn't been for years.  However, it isn't trivial.<br></div><div> <br></div></div></div>