[Swan-dev] retransmit-interval and retransmit-timeout

Paul Wouters paul at nohats.ca
Sun Jan 25 21:04:32 EET 2015


On Sat, 24 Jan 2015, D. Hugh Redelmeier wrote:

> Either these features are experimental or essentially forever.

Yes, but we do not know yet. Already I'm strongly leaning towards
non-experimental.

> As much as we can, we should hide useless complexity from the user.
> Hardcoding things that are reasonable and robust is not a mistake.

But we have very different use cases:

- inter-cloud instance communication with peers at < 10ms away
- VMs on the same iron at < 1ms away
- Hosts across the Atlantic at 80ms away.
- mobile phones with chipsets that take 250ms to wake up

> | We did not think we had all the answers, and therefore wanted a little
> | flexibility in the new system. Right now we have it at 500ms, but we
> | really do plan to bring that down a lot before a release.
>
> I would love to see the experimental hypotheses written up.  And the
> testing protocol.

We know that we are going to cause delays installing crypto for
opportunistic connections. These applications are already shaving
off as much latency wherever they can; again, look at the browser
people. We MUST play that game because our consumers are waiting on
us on top of their own delays.

Our sweet spot is "as aggressive as possible without being abusive".

> | And it was 20 seconds even. In fact, some iphones would abort within 20
> | seconds so any single packet loss would end up in failure before
> | retransmit.
>
> One second, which is the minimum RTO for TCP would suggest to me that
> 1 second ought to be a minimum RTO for IKE.  Or at least a good
> starting point.

Look at SPDY and Google and Microsoft. No one is honouring those ancient
RFC values anymore. Those times are over. Waiting 1 second to recover
from packet loss is just not realistic if we are to be an accepted cost
on top of HTTP.

> | I've already found that some "hangs" I saw with pluto were in fact
> | packet loss on my DSL link. I now see retransmits on my client
> | while I see no duplicate packets on my server. This code has already
> | proven that I was suffering from packet loss without knowing.
>
> What would have helped you discover this?

I don't know. Looking back, it should have been obvious. It could have
been measured. It was in the log files.

> | Because we currently do not believe we have the answer to all the
> | timings. And to have an emergency switch to make things lower if it
> | turns out to cause really big issues.
>
> How could it cause big issues?  (That question is not rhetorical.)

(Note: I meant slower, not lower.)

If we become very aggressive it could change packet ordering and other
things. Look at the recent discussion of the XAUTH/iphone-racoon bug
where two IKE packets got reordered and the client got confused and gave
up altogether. When we reduce our timings we might run into more bugs in
other vendors' IKE stacks. For those cases, it would be good if we could
tell people to go back to a slower mode.

> | Because depending on your initial interval, 3x retransmit can be either
> | 30ms or 80 seconds. So waiting 3x is not a useful measure to users on
> | how long they might want to wait.
> |
> | > Summary: I'd like to see a stronger case for this extra interface
> | > complexity
> |
> | I hope to above clarifies it.
>
> It has the problem that the two settings are not synchronized.  To me,
> it makes sense to synchronize the "give up" with exactly when you were
> about to do a retransmission.  If the previous retransmit was just
> before you give up, that's silly.

I thought the current implementation checked this at "retransmit" time.
I didn't think it was an absolute scheduled event in the future. But I
could be wrong. Antony?

> 500ms seems really fast unless you think we have stupidly lossy
> networks.

You have more patience than most :P Part of why browsers don't want
DNSSEC is the additional one or two 25ms round trips.

> Doubling seems like a reasonable approach to scale-searching, but I
> don't think that's what we're doing.

That is what the libevent branch is doing. If the retransmit-interval is
50ms, we do 50, 100, 200, 400.

If retransmit-timeout is 5s, we continue with 800 and 1600. The next
doubling (3200) would bring the total wait to 6350ms, which is > 5s, so
we abort the exchange and retry (until keyingtries= is reached).
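
To make the schedule concrete, here is a minimal C sketch of that
doubling logic as I understand the libevent branch (the variable names
are mine; this is not the actual pluto code):

  /* Exponential backoff: retransmit at interval, 2*interval, ...
   * until the next doubling would exceed retransmit-timeout. */
  #include <stdio.h>

  int main(void)
  {
      unsigned long interval_ms = 50;       /* retransmit-interval */
      unsigned long timeout_ms = 5 * 1000;  /* retransmit-timeout */
      unsigned long waited_ms = 0;

      while (waited_ms + interval_ms <= timeout_ms) {
          waited_ms += interval_ms;
          printf("retransmit after %lums (total %lums)\n",
                 interval_ms, waited_ms);
          interval_ms *= 2;
      }
      printf("next retransmit would total %lums > %lums: give up\n",
             waited_ms + interval_ms, timeout_ms);
      return 0;
  }

This prints retransmits at 50, 100, 200, 400, 800 and 1600ms and then
gives up because the next doubling would total 6350ms. Note that in
this sketch the give-up check happens exactly when the next retransmit
would be scheduled, which is also how I understood the synchronization
question above.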

>  On the other hand, blasting
> every 500ms feels like the wrong scale to me.  ("Feelings" aren't good
> engineering.  Experiments are a good idea.)

Which is why we want it to be an option for now. We don't think 500ms is
"blasting", but if we are wrong then we have a value people can change.

Right now, the fact that we had these values hardcoded is why everyone
is still on 20s and having broken connections!
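
For illustration, with the new options a tuned conn could look
something like this in ipsec.conf (take the exact syntax and units as
a sketch; they may differ from what we ship):

  conn lowlatency
          # hypothetical values for a nearby peer: first retransmit
          # after 50ms, doubling each time, give up (and retry per
          # keyingtries=) after 5 seconds
          retransmit-interval=50
          retransmit-timeout=5s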

> I think most retransmissions are due to interop problems (bad
> configs).  Retransmission policy doesn't matter much there.  Give-up
> policy does.

If we get authenticated, and can trust the misconfiguration answers,
there are no retransmissions happening. Every IKE packet gets a response.
Sure, if this is IKE_INIT or IKE_AUTH then we cannot trust it, and we
might linger and retransmit a few more times. So we might cause 10
packets where in the past we caused 1 packet. I don't think that is
abusive.

> Some are due to dead peers (including network partitions).
> Retransmission is a way of finding out when the peer is back.  Often
> it is better to start negotiation again rather than retrying the
> current message.

Not sure how that relates to our current discussion? Increased
retransmits for DPD could actually result in fewer false positives
during congestion, and reduce tearing down and rebuilding an IKE
session, ultimately saving packets (and downtime).

> We could measure previous RTTs from the same peer and set RTO based on
> that.  That doesn't work for the first message, the most critical.  So
> it probably isn't worth the bother.  Besides RTT can be affected by
> amount of crypto work in the particular message.

I'm sure we can extend the current fairly simplistic retry mechanisms
and take lessons from BIND or other software in this matter. But that's
really not in our scope now. It would make a good Google Summer of Code
project though!
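
Purely as a sketch of the direction such a project could take,
borrowing TCP's smoothed-RTT estimator from RFC 6298 (this is invented
illustration, not existing pluto code):

  /* RFC 6298-style RTO estimation: RTO = SRTT + 4*RTTVAR, with
   * EWMA smoothing (alpha = 1/8, beta = 1/4). Build with -lm. */
  #include <stdio.h>
  #include <math.h>

  struct rto_state {
      double srtt, rttvar, rto;  /* seconds */
      int have_sample;
  };

  static void rto_update(struct rto_state *s, double rtt)
  {
      if (!s->have_sample) {
          s->srtt = rtt;
          s->rttvar = rtt / 2;
          s->have_sample = 1;
      } else {
          s->rttvar = 0.75 * s->rttvar + 0.25 * fabs(rtt - s->srtt);
          s->srtt = 0.875 * s->srtt + 0.125 * rtt;
      }
      s->rto = s->srtt + 4 * s->rttvar;
      if (s->rto < 0.05)    /* pick our own floor, not TCP's 1s */
          s->rto = 0.05;
  }

  int main(void)
  {
      struct rto_state s = { 0 };
      double samples[] = { 0.080, 0.085, 0.120, 0.082 };

      for (int i = 0; i < 4; i++) {
          rto_update(&s, samples[i]);
          printf("rtt %3.0fms -> rto %5.1fms\n",
                 samples[i] * 1000, s.rto * 1000);
      }
      return 0;
  }

As the quote above notes, this does nothing for the first message, so
a configured retransmit-interval would still matter most.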

> So you lived for a long time with a disease and you want something to
> cover it up :-)  Better to expose and fix it.

If you talk about getting my DSL fixed, this country's duopoly leaves me
with a choice between two evils. Exposing Bell and Rogers won't help
me :(

> 20 seconds is probably unreasonable.  It's quite a leap to .5 seconds
> (almost two orders of magnitude).  It would have been trivial to
> experiment with 1 second.

1 second and even 0.5 seconds is too long! Way too long!

Especially when we need to "fail clear", we cannot be holding up traffic
for two seconds.

Paul

