[Swan-dev] test cases to look into before release

Sat Jan 26 15:58:20 UTC 2019

(I saw your followup, but this e-mail has more context)

On Fri, 25 Jan 2019 at 21:33, Paul Wouters <paul at nohats.ca> wrote:
>
> On Fri, 25 Jan 2019, Andrew Cagney wrote:
>
> >> This is not incorrect?
> >
> > Everything isn't correct ...
> >
> >> East accept the "incorrect" connection from west, because its IDs match
> >> its expected IDs. It then authenticates as "east" to west" which is
> >> misconfigured on purpose to expect "road" and it fails the connection.
> >>
> >> Now, the one thing that is wrong is that we should not delete #4 without
> >> sending a notify - we are supposed to send a DELETE notify with
> >> AUTHENTICATION_FAILED payload.
> >
> > Right, this is a long standing bug.
> >
> > (as an aside the above should be blaming state #3, and not #4, for all
> > the auth problems)
>
> Yes, and on top of it, it should just delete state #4. It has no more
> chance of ever becoming a valid IPsec SA.
>
> >> But the test case does change output a bit, and worse is that it is
> >> doing retransmits and keeps the whack longer than our test system waits.
> >> I added retransmit-timeout=10s to the "incorrect" conn, so it releases
> >> the whack sooner.
> >
> > It also announces that it is releasing whack, but doesn't :-(
>
> Are you sure? I saw you added --impair delete-on-retransmit but that
> only works for IKEv1 or for IKEv2 initiator but only if initiator has
> a reason to retransmit (and receiving IKE_AUTH reply is not a reason to
> send another identical IKE_AUTH request). This is why I added
> retransmit-timeout=10s to some ikev2 tests so they would fail before
> the test harnass times out.

Right for both IKEv1 and IKEv2, delete-on-retransmit disables the
ipsecdoi_replace(st, try) call and forces:

        * XXX There should not have been a child sa unless this was a timeout of
        * our CREATE_CHILD_SA request. But our code has moved from
parent to child
       delete_state(st)

where delete_state() does the exact opposite of what its name suggests
and spawns a replacement
(delete_state() is a terrible function name, I think we should rename
it into morph_state_into_zombie_army())

If the IKE SA was doing the re-transmits we'd sidestep this issue (but
likely create others).

> > The log suggests there are now two states trying to establish:
> > #5 replacing #2
> > #4 trying another keying attempt
> > which I suspect is worse.
>
> well, the child must die when we receive a bad IKE_AUTH so that cannot
> happen :0
>
> >> Do you think there is a code change that is needed? Because I'm not sure
> >> what would be needed.
> >
> > An alternative to setting the IKE SA to REPLACE, would be to make the
> > IKE SA responsible for the re-transmit.  In theory that means deleting
> > the line:
> >    md->st = cst; /* switch to child */
> > but who knows what will really happen.
>
> We shouldn't switch to the child state. Its doomed at this point.

It's happens in the initiator (in inR1outR2) while processing the
IKE_SA_INIT response and preparing the IKE_AUTH request.  It turns out
that the switch causes two things:
- the retransmit timer gets assigned to the CHILD and not the IKE SA
- the outgoing IKE_AUTH msgid gets assigned to the CHILD and not the IKE SA
It is then the latter that causes the IKE_AUTH response to be routed
to the child.