[Swan-dev] DPD code mystery

Paul Wouters paul at nohats.ca
Mon Oct 9 03:20:39 UTC 2017


On Sun, 8 Oct 2017, D. Hugh Redelmeier wrote:

> Can someone who understands DPD have a look at this?
>
> in complete_v1_state_transition:
>
> 		/*
> 		 * make sure that a DPD event gets created for a new phase 1
> 		 * SA.
> 		 */

It's weird that this is put on a phase1 to begin with. DPD's are
supposed to only be on phase2's, and should only fire if there is
no incoming traffic received on that IPsec SA for a while.

So the comment that we should create a DPD for a "new phase 1" is wrong.

> 		if (IS_ISAKMP_SA_ESTABLISHED(st->st_state)) {
> 			if (deltasecs(st->st_connection->dpd_delay) > 0 &&
> 			    deltasecs(st->st_connection->dpd_timeout) > 0) {

So this should not be done as far as I can tell. It's wrong. It would
also cause DPD probes to go even before we have a phase 2.

> 				/* don't ignore failure */
> 				/* ??? in fact, we do ignore this:
> 				 * result is NEVER used
> 				 * (clang 3.4 noticed this)
> 				 */
> 				stf_status s = dpd_init(st);

There is more weirdness here. The function is passed state st, which at
this point we KNOW is a phase1 because of IS_ISAKMP_SA_ESTABLISHED()

This function then starts out with:


stf_status dpd_init(struct state *st)
        p1st = find_state_ikev1(st->st_icookie, st->st_rcookie, 0);

         if (p1st == NULL) {
                 loglog(RC_LOG_SERIOUS, "could not find phase 1 state for DPD");

So not only does it do an expensive lookup to find a state that was
passed in already, it seems to actually be designed to run a DPD for
phase2's, which make more sense.

Note the only known non-STF_OK this function gives, is if it did not
find the phase1 state of the passed in state, so if we pass in a phase1,
at least we know for sure that never happens (but that's more an
artifact of the function being called at the wrong time for the wrong
reason).

I'm wondering if what was meant wasn't:

                if (IS_IPSEC_SA_ESTABLISHED(st->st_state)) {

But I don't think it is needed because in ikev1_quick.c we see
the actual dpd_init() calls on the phase2 states, and those
do check the return code. So it looks all of this code should just
be removed.

checking further, if the DPD event hits, there is code handling
DPD for phase1 and phase2, oddly enough:

        if (IS_PHASE1(st->st_state) || IS_PHASE15(st->st_state ))
                 p1_dpd_outI1(st);
         else
                 p2_dpd_outI1(st);

The phase1 version just runs dpd_outI()

The phase2 version runs find_phase1_state() to find the phase1
(hugely expensive since if should use st_clonedfrom) and if
it does not find a phase1, creates one, and otherwise calls dpd_outI()
on the phase1 state, although the second argument there is the phase2
state. The initial work for checking/sending DPD happens on this
second argument, the phase 2. Then there is a check to see if this
was a "phase1 DPD" (st != p1st). If so, it won't event_schedule()
And then it sends a DPD packet and schedules a timer on the phase1.

So the DPD event created for the phase1 before we have a phase2, will
only ever fire once, and then no longer be scheduled. Only the phase2
installed ones survive this function call. Another reason to just
remove that code.

So this code is surely a bit odd. I think it is safe to delete the
code you find to launch DPD events if there is only a phase1. But
since DPD events are triggered by phase2 idleness but are scheduled
on the phase1, I'm not so sure if we can cut out more things. That
needs some extra thoughts.

Note, I guess someone thought changing result within the switch would
maybe cause it to end up in another switch case ?

So, I would say we should start with this patch:

diff --git a/programs/pluto/ikev1.c b/programs/pluto/ikev1.c
index fbf13fcff..f37cda315 100644
--- a/programs/pluto/ikev1.c
+++ b/programs/pluto/ikev1.c
@@ -2493,27 +2493,6 @@ void complete_v1_state_transition(struct msg_digest **mdp, stf_status result)
  			       sadetails);
  		}

-		/*
-		 * make sure that a DPD event gets created for a new phase 1
-		 * SA.
-		 */
-		if (IS_ISAKMP_SA_ESTABLISHED(st->st_state)) {
-			if (deltasecs(st->st_connection->dpd_delay) > 0 &&
-			    deltasecs(st->st_connection->dpd_timeout) > 0) {
-				/* don't ignore failure */
-				/* ??? in fact, we do ignore this:
-				 * result is NEVER used
-				 * (clang 3.4 noticed this)
-				 */
-				stf_status s = dpd_init(st);
-
-				pexpect(s != STF_FAIL);
-				if (s == STF_FAIL)
-					result = STF_FAIL; /* ??? fall through !?! */
-				/* ??? result not subsequently used */
-			}
-		}
-
  		/* Special case for XAUTH server */
  		if (st->st_connection->spd.this.xauth_server) {
  			if (st->st_oakley.doing_xauth &&



And at a later time, figure out how to get rid of the p1_dpd_outI1() / p2_dpd_outI1()
split and fix it up properly so phase2 triggers DPD events on the
phase1 (or cause to create a new phase1, as current code does)

Paul


More information about the Swan-dev mailing list