[Swan-dev] [LONG] calling orphan_holdpass() in delete_state() ?

Paul Wouters paul at nohats.ca
Sat Jun 29 03:43:04 UTC 2019

I've been looking at why orphan_holdpass() is called in delete_state()

Since this seems more of a property of the connection than of a state.

The commit that moved it was mine:

commit d9512bb27c8526995182c4cf6981c9fad690d291
Author: Paul Wouters <pwouters at redhat.com>
Date:   Fri Oct 9 23:44:40 2015 -0400

     pluto: OE move orphan_holdpass() call into delete_state()

     We were missing cases where failures did not lead to a proper shunt.
     Instead of duplicating the code further, move it into delete_state()
     and if the state is a failed OE initiator state, call orphan_holdpass()
     to install the bare shunt.

It used to be in the retransmit_v2_msg() code (just before
delete_state() is called) but also in timer_event_cb() when it hit
the case for "un-established partial ISAKMP SA timeout"

It seems this did not take into account that a retrying connection will
delete a state.

I was confused about the purpose of the two lists we maintain:

1) orphaned_holds

This turns out to be acquires from the kernel we find in /proc. Happens
if the kernel couldn't give us all ACQUIRE messages, eg due to peak
load. This is KLIPS only and is not used for XFRM/NETKEY.

2) bare_shunts

A received acquire is turned into a connection-less (aka bare) shunt.
Then it is matched with a connection. If found, the connection's
negotiationshunt= is checked and if that is PASS, we install it because
the default is HOLD (but is it really, since if we properly eat up
the larvel XFRM state, it would be PASS).

Upon an OE IKE failure, we install the faileshunt (if different from
default). Then the OE group instance's instance can be deleted. That
is why this is a "bare shunt". This is different from regular
auto=ondemand connections. When those fail, the connection remains
and the shunt is not bare but belongs to the connection.

The other confusing thing is the kernel_ops->remove_orphaned_holds(),
which for NETKEY/XFRM is NULL. So if we run orphan_holdpass(), what
process would clean that up? Does it need cleaning up ? It turns out,
orphan_holdpass() does not actually modify 1) but it modifies 2). Just
a badly chosen name.

This is not to be confused with kernel_ops->scan_shunts() which is not
NULL and calls expire_bare_shunts(). This is used both for KLIPS and
XFRM/NETKEY to delete old failed OE shunts, so in a future they can be
tried again. It would remove the failureshunt (pass or drop) and allows
a new acquire to happen when it hits the installed trap policy.

>From the old freeswan docs:

 	When Pluto is notified by KLIPS of a packet that has been TRAPped,
 	there is no connection with which to associate the HOLD.  It is
 	temporarily held in the "bare_shunt table".  If Opportunism is
 	attempted but DNS doesn't provide Security Gateway information, Pluto
 	will replace the HOLD with a PASS shunt.  Since this PASS isn't
 	associated with a connection, it too will reside in the bare_shunt
 	table.  If the HOLD can be associated with a connection, it will be
 	removed from the bare_shunt table and represented in the connection.

 	There are two contexts for which shunt eroutes are installed by Pluto
 	for a particular connection.  The first context is with the prospect
 	of dealing with packets before any negotiation has been attempted.  I
 	call this context "prospective".  Currently is a TRAP shunt, used to
 	catch packets for initiate opportunistic negotiation.  In the future,
 	it might also be used to implement preordained PASS, DROP, or REJECT

 	The second context is after a failed negotiation.  I call this context
 	"failure".  At this point a different kind of shunt eroute is
 	appropriate.  Depending on policy, it could be PASS, DROP, or REJECT,
 	but it is unlikely to be TRAP.  The shunt eroute should have a
 	lifetime (this isn't yet implemented).  When the lifetime expires, the
 	failure shunt eroute should be replaced by the prospective shunt

I believe all of this text talks about the bare shunt table. Since
either there is no connection associated yet, or there is a connection
associated but it was an OE group instance' instance, and the connection
failed and we will delete it, and for the drop/pass to remain, it needs
to go into the bare_shunts table (and gets cleaned up after a while,
default 15 minutes but can be changed these days in config setup via the
shuntlifetime= value)

So then what is the orphaned_holds table? Freeswan claims:

 	/* The orphaned_holds table records %holds for which we
 	 * scan_proc_shunts found no representation of in any connection.
 	 * The corresponding ACQUIRE message might have been lost.

This is where pluto finds kernel state via /proc about unknown holds,
something that does not apply to XFRM/NETKEY. I have no idea what XFRM
does when there are too many acquires for it to send. Surely we don't
know where to find a potential backlog. Which is why we also do not
have a kernel_ops->remove_orphaned_holds() for XFRM.

So this leads me to think that orphan_holdpass() should not be called
at all for XFRM/netlink. Or at least, its name is misleading if it is
modifying the bare_shunts list and not the orphan_holds list. It was
introduced by me (likely with help of Hugh) in commit:

commit f03dd0f740c23b5aa20a62ff46788e24b15f541b
Author: Paul Wouters <paul at nohats.ca>
Date:   Wed Apr 29 22:16:21 2015 -0400

     WIP: proper OE pass routes without connection/state

It indeed does not modify the orphaned_holds list at all, and modifies
the bare_shunts list. So I guess it should be renamed for clarification.

Again, not to be confused with expire_bare_shunts() which we do run to
cleanup failed OE shunts so they can be retried in the future.

So, now we know that orphan_holdpass() for a state actually modifies the
bare_shunts table, lets have a look again. It makes little sense to call
it in delete_state() because the state could be deleted because a new
state is started when a new keyingtries= is started. It really only
makes sense to have this call done un release_connection(), when we
have given up (eg keyingtries=1).

The call in delete_state() is causing the old state to remove the dir
out policy of an established tunnel of a newer state on the same
connection. This can be seen in test case certoe-16-rhbz1724200-halfopen-shunt

Earlier today, I commited a fix for the bad call in delete_state():

commit 24bffe6702e8a3a9cfc230c9bacc43fef9da204b
Author: Paul Wouters <pwouters at redhat.com>
Date:   Fri Jun 28 12:27:13 2019 -0400

     IKEv2: OE connection timing out could accidentally overwrite tunnel policy

@@ -896,7 +896,8 @@ void delete_state(struct state *st)
                 linux_audit_conn(st, LAK_PARENT_DESTROY);

         /* If we are failed OE initiator, make shunt bare */
-       if (IS_IKE_SA(st) && (c->policy & POLICY_OPPORTUNISTIC) &&
+       if (IS_IKE_SA(st) && c->newest_isakmp_sa == st->st_serialno
+          && (c->policy & POLICY_OPPORTUNISTIC) &&

While this seems to fix some cases, it does not fix all of them, because
I think fundamentally, a shunt belongs to a connection and not a state.
So only a connection deletion should move the shunt from the connection
to the bare_shunts table. By removing the entire call in delete_state(),
the problem in test case certoe-16-rhbz1724200-halfopen-shunt does go
away properly for me. But it likely breaks other things because it was
needed before to install the proper negotiation/failure shunts. So I'm
still looking further into this before commiting the removal.

While testing, I found a related bug that happens when using AUTH_NULL:

     west OE initiates conn private[1] to east but east is not running pluto.
     west keeps trying (keyingtries=%forver)
     east is started, and triggers its own OE initiatialize to west
     west starts a new private instance, private[2]
     west succeeds in IKE negotiation but when trying to install the ipsec sa
          of private[2], finds the eroute is already in use by private[1]

This happens because we do not let AUTH_NULL connections replace each
other, because there is no authentication.


More information about the Swan-dev mailing list