<div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div>I've been trying to get my head around pluto's retry / re-transmit logic.  I was expecting something like:<br><br>  start:<br>    delay = r_interval<br></div>    cap = monoadd(now(), r_timeout)<br></div></div></div></div>    schedule(delay, retransmit)<br><br></div>  retransmit:<br></div>    if now() >= cap<br></div>      return done</div><div>    // MAXIMUM_RETRANSMITS_PER_EXCHANGE?</div><div>    delay = delay + delay // double</div><div>    // can be more than cap but that is ok<br></div><div>    schedule(delay, retransmit)</div><br>but, as you can guess, that isn't what I found.  I'm going to push a change to somewhat abstract/simplify the re-transmit logic and greatly increase logging; here are my notes:<br></div><div><br></div><div>- because of a post-increment, the delay (r_interval * 2^^nr_retransmits) grows:</div><div><br></div><div>      r_interval, r_interval, r_interval*2, r_interval*4, ...</div><div><br></div><div>  I think this is a bug; it should be:<br></div><div><br></div><div><div>      r_interval, r_interval*2, r_interval*4, ...</div></div><div><br></div><div><br></div><div>- because the start time isn't saved, the code uses something like:</div><div><br></div><div>    if delay >= r_timeout</div><div><br></div><div>  to decide if r_timeout was exceeded; I'm guessing it was a good enough approximation<br></div><div><div><br></div><div><br></div><div>- pluto can also auto-reply when it receives a "duplicate"; and that is sometimes capped:</div><div><br></div><div>    - IKEv2 normally unlimited; and plays no part in the retransmit code; but ...<br></div><div>    - IKEv2 invalid KE; limited to MAXIMUM_INVALID_KE_RETRANS 3, because it is also re-transmitting<br></div><div>    - IKEv1 limited to MAXIMUM_v1_ACCEPTED_DUPLICATES 2 which seems very low</div><div><br></div><div>  I suspect the IKEv1 case should be unlimited (like IKEv2) when re-transmits are not happening.
For instance, when in MAIN_R1.<br></div><div><br></div><div><br></div></div><div>- re-transmits can be impaired; but instead of dealing with this in 'start', viz:<br></div><div><br></div><div>  start:<br>    cap = monoadd(now(), r_timeout)</div><div>    if (impaired)</div><div>      libreswan_log("IMPAIR: ...");<br></div><div>      schedule(r_timeout, retransmit)</div><div>    else ...<br></div><div><br></div><div>  it deals with it in the first re-transmit event at time r_interval, and only sends the log to whack(?!?) if the timer expires; I suspect this is because it was easier.<br></div></div><div>  Changing it to the above makes it more deterministic and usable as a way to really suppress re-transmits.</div><div><br></div><div><br><div>- since duplicate replies are counted as re-transmits they feed into the re-transmit delay calculation - r_interval * 2^^nr_retransmits - the effect is twofold:</div><div><br></div><div>    - future re-transmits are more spaced out</div><div><br></div><div>    - the total time is shortened (because of how the timeout test is performed) and can (I suspect) result in waiting for less than r_timeout?</div><div><br></div><div>  puzzling; I suspect the second effect is unintended; and can be fixed by computing the timeout properly.<br></div><div><br></div><div>Andrew</div><div><br></div></div></div>
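<div>A rough sketch of the delay sequences described in the notes above, in case it helps to see the numbers. This is a toy model in Python, not pluto's code: the function names, the loop structure, and the assumption that duplicates arrive before the first timer event are all hypothetical; only r_interval, r_timeout, nr_retransmits and the "delay >= r_timeout" test come from the notes.</div>

```python
def observed_delays(r_interval, r_timeout, duplicates=0):
    """Toy model of the behaviour described in the notes (an assumption,
    not pluto's source): the exponent lags by one because of the
    post-increment, and the stop test compares the next delay with
    r_timeout instead of comparing elapsed time with a saved deadline."""
    delays = [r_interval]            # first wait, scheduled in 'start'
    # Assumption: each duplicate reply bumps the same counter, and all
    # of them arrive before the first timer event fires.
    nr_retransmits = duplicates
    while True:
        delay = r_interval * 2 ** nr_retransmits   # counter used first ...
        nr_retransmits += 1                        # ... then incremented
        if delay >= r_timeout:                     # approximate timeout test
            break
        delays.append(delay)
    return delays


def expected_delays(r_interval, r_timeout):
    """Toy model of the 'start' / 'retransmit' pseudocode: double the
    delay each time and stop once the saved deadline has passed."""
    delays = []
    elapsed = 0                      # stands in for now() - start time
    delay = r_interval
    while elapsed < r_timeout:       # i.e. now() < cap
        delays.append(delay)
        elapsed += delay             # timer fires after 'delay'
        delay += delay               # double
    return delays


if __name__ == "__main__":
    print(observed_delays(1, 60))                # repeats r_interval once
    print(expected_delays(1, 60))                # plain doubling
    print(sum(observed_delays(1, 60, duplicates=3)))  # total wait with duplicates
```

<div>Under these assumptions, with r_interval = 1 and r_timeout = 60, three early duplicates are enough to make the total wait drop below r_timeout, which matches the suspicion in the last note. Real timings depend on when duplicates actually arrive, which this model does not try to capture.</div>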