[Swan-dev] retry/re-transmit controls

Wed Nov 15 19:29:11 UTC 2017

I've been trying to get my head around pluto's retry / re-transmit logic.
I was expecting something like:

  start:
    delay = r_interval
    cap = monoadd(now(), r_timeout)
    schedule(delay, retransmit)

  retransmit:
    if now() >= cap
      return done
    // MAXIMUM_RETRANSMITS_PER_EXCHANGE?
    delay = delay + delay // double
    // can be more than cap but that is ok
    schedule(delay, retransmit)

but, as you can guess, that isn't what I found.  I'm going to push a change
to somewhat abstract/simplify the re-transmit logic and greatly increase
logging; here are my notes:

- because of a post-increment, the delay (r_interval * 2^^nr_retransmits)
grows:

      r_interval, r_interval, r_interval*2, r_interval*4, ...

  I think this is a bug; it should be:

      r_interval, r_interval*2, r_interval*4, ...

- because the start time isn't saved, the code uses something like:

    if delay >= r_timeout

  to decide if r_timeout was exceeded; I'm guessing it was a good enough
approximation

- pluto can also auto-reply when it receives a "duplicate"; that is
sometimes capped:

    - IKEv2 normally unlimited; and plays no part in the retransmit
code; but ...
    - IKEv2 invalid KE; limited to MAXIMUM_INVALID_KE_RETRANS 3, because it
is also re-transmitting
    - IKEv1 limited to MAXIMUM_v1_ACCEPTED_DUPLICATES 2 which seems very low

  I suspect the IKEv1 case should be unlimited (like IKEv2) when
re-transmits are not happening.  For instance when in MAIN_R1.

- re-transmits can be impaired; but instead of dealing with this in 'start'
viz:

  start:
    cap = monoadd(now(), r_timeout)
    if (impaired)
      libreswan_log("IMPAIR: ...");
      schedule(r_timeout, retransmit)
    else ...

  it deals with it in the first re-transmit event at time r_interval, and
only sends the log to whack(?!?) if the timer expires; I suspect this is
because it was easier.
  Changing it to the above makes it more deterministic and usable as a way
to really suppress re-transmits.

- since duplicate replies are counted as re-transmits they feed into the
re-transmit delay calculation - r_interval * 2^^nr_retransmits - the effect
is twofold:

    - future re-transmits are more spaced out

    - the total time is shortened (because of how the timeout test is
performed) and can (I suspect) result in waiting for less than r_timeout?

  puzzling; I suspect the second effect is unintended; and can be fixed by
computing timeout properly.
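
The schedules discussed in these notes can be compared with a small model.
This is a minimal Python sketch, not pluto's code: the function names, the
extra_counts parameter (duplicate replies assumed to have been counted into
nr_retransmits before the first re-transmit fires), and the example values
r_interval=1, r_timeout=60 are all assumptions made for illustration, and the
"delay >= r_timeout" test is only the approximation described in the notes.

```python
def expected_schedule(r_interval, r_timeout):
    """Delays for the 'expected' pseudocode: double each time and stop
    once the elapsed time has reached the saved deadline (cap)."""
    delays = []
    elapsed = 0
    delay = r_interval
    while elapsed < r_timeout:      # 'if now() >= cap: return done'
        delays.append(delay)        # schedule(delay, retransmit)
        elapsed += delay            # time at which the event fires
        delay += delay              # double
    return delays


def observed_schedule(r_interval, r_timeout, extra_counts=0):
    """Model of the behaviour described in the notes (an assumption,
    not a transcription of the real code).

    - the delay is r_interval * 2**nr_retransmits, computed before the
      counter is incremented, so r_interval appears twice;
    - the timeout check compares the delay, not the elapsed time;
    - extra_counts is a hypothetical number of duplicate replies that
      were already counted as re-transmits.
    """
    delays = [r_interval]           # initial schedule from 'start'
    nr_retransmits = extra_counts
    while True:
        delay = r_interval * 2 ** nr_retransmits
        nr_retransmits += 1         # post-increment: old value was used
        if delay >= r_timeout:      # approximate timeout test
            break
        delays.append(delay)
    return delays


if __name__ == "__main__":
    for label, delays in [
        ("expected", expected_schedule(1, 60)),
        ("observed", observed_schedule(1, 60)),
        ("observed, 4 duplicates", observed_schedule(1, 60, extra_counts=4)),
    ]:
        print(label, delays, "total", sum(delays))
```

With these example values the expected schedule is 1, 2, 4, 8, 16, 32, the
modelled current behaviour is 1, 1, 2, 4, 8, 16, 32, and with four counted
duplicates the modelled total wait drops to 49, which is below r_timeout.
Saving the start time and testing against it would remove that dependence on
the counter.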

Andrew