[Swan-dev] [LONG] calling orphan_holdpass() in delete_state() ?

Mon Jul 1 18:38:55 UTC 2019

On Fri, 28 Jun 2019, Paul Wouters wrote:

[ following up on myself, but please chime in. Looking for advise here ]

> So, now we know that orphan_holdpass() for a state actually modifies the
> bare_shunts table, lets have a look again. It makes little sense to call
> it in delete_state() because the state could be deleted because a new
> state is started when a new keyingtries= is started. It really only
> makes sense to have this call done un release_connection(), when we
> have given up (eg keyingtries=1).

While it makes logical sense this code would be in delete_connection()
or release_connection(), there is a problem. If the tunnel goes down
because it was idle and/or reached the end of its life, it should just
vanish (along with the IKE and IPsec state) without installing a bare
(failure) shunt. But if it was for an incomplete IKE/IPsec state, then
the negotiationshunt or failureshunt should be installed (depending on
if we hit the keyingtries maximum or not. And all of this information
is only available in the state. By the time we are releasing or deleting
the connection, all the states with all the information have already been
deleted.

So it looks like the current location in delete_state() will work
better. It uses:

         /* If we are failed OE initiator, make shunt bare */
         if (IS_IKE_SA(st) && c->newest_isakmp_sa == st->st_serialno
            && (c->policy & POLICY_OPPORTUNISTIC) &&
             (st->st_state->kind == STATE_PARENT_I1 ||
              st->st_state->kind == STATE_PARENT_I2)) {
                 ipsec_spi_t failure_shunt = shunt_policy_spi(c, FALSE /* failure_shunt */);
                 ipsec_spi_t nego_shunt = shunt_policy_spi(c, TRUE /* negotiation shunt */);

                 DBG(DBG_OPPO, DBG_log(
                         "OE: delete_state orphaning hold with failureshunt %s (negotiation shunt would have been %s)",
                         enum_short_name(&spi_names, failure_shunt),
                         enum_short_name(&spi_names, nego_shunt)));

                 if (!orphan_holdpass(c, &c->spd, c->spd.this.protocol, failure_shunt)) {
                         loglog(RC_LOG_SERIOUS, "orphan_holdpass() failure ignored");
                 }
         }

> The call in delete_state() is causing the old state to remove the dir
> out policy of an established tunnel of a newer state on the same
> connection. This can be seen in test case 
> certoe-16-rhbz1724200-halfopen-shunt
>
> Earlier today, I commited a fix for the bad call in delete_state():
>
> commit 24bffe6702e8a3a9cfc230c9bacc43fef9da204b
> Author: Paul Wouters <pwouters at redhat.com>
> Date:   Fri Jun 28 12:27:13 2019 -0400
>
>    IKEv2: OE connection timing out could accidentally overwrite tunnel 
> policy
>
> @@ -896,7 +896,8 @@ void delete_state(struct state *st)
>                 linux_audit_conn(st, LAK_PARENT_DESTROY);
>
>        /* If we are failed OE initiator, make shunt bare */
> -       if (IS_IKE_SA(st) && (c->policy & POLICY_OPPORTUNISTIC) &&
> +       if (IS_IKE_SA(st) && c->newest_isakmp_sa == st->st_serialno
> +          && (c->policy & POLICY_OPPORTUNISTIC) &&
>
> While this seems to fix some cases, it does not fix all of them, because
> I think fundamentally, a shunt belongs to a connection and not a state.

The case to try next would be a partial responder state replaced by an
initiator state, although I'm not yet convinced this can happen....

It does seem to fix all non-authnull cases. The authnull case is
different. If an initiating state is partial and a respondering state
comes in and completes, we hit "eroute already in use" because authnull
cannot replace instances of itself since there is no authentication. I
don't have a good idea how to handle this case. Possibly just deleting
the older partial state is the best we can do?

Paul