[Swan-dev] how do you spell NAT Traversal options?

Thu Apr 17 09:47:12 EEST 2014

| From: Paul Wouters <paul at nohats.ca>

| On Wed, 16 Apr 2014, D. Hugh Redelmeier wrote:
| 
| > I'm working at making the user-visible options more consistent.
| 
| Can you tell me what you mean by user-visible?
| 
| Do you mean pluto arguments as well as configuration options?

Perhaps everything that is documented as an interface.

| > There should only be one spelling of something.
| >
| > I'm going to replace all underscores _ in names with minus -.  (Old
| > names will quietly work until we decide to pull the plug.)
| 
| So you're planning on adding a dozen new keywords?

More like seven or eight.  That's better than switching to all-underscore,
which would need far more new keywords.  Besides, underscores aren't
conventional in GNU/Linux option names.

That appears to be the only way to simplify:
- add better stuff
- mark old stuff as obsolete with a warning that it will go away
- announce change
- wait for users to have a chance to adjust [how long? A year?]
- delete old stuff
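
The lifecycle above could live in a single lookup step of the keyword
parser.  Here is a minimal Python sketch.  It is not libreswan's actual
keywords.c: the keyword table, the "warn"/"removed" phases and the
function name are all hypothetical.  Underscore spellings map
mechanically onto the hyphenated names.  Spellings that can't be derived
that way (like natikeport) go in an explicit table.

```python
# Hypothetical canonical (hyphenated) keyword names.
KEYWORDS = {"leak-detective", "nat-ikeport", "nat-keepalive"}

# Hypothetical renames that can't be derived by swapping "_" for "-".
# phase "warn": still accepted, with a warning; "removed": rejected.
OBSOLETE = {"natikeport": ("nat-ikeport", "warn")}


def resolve_keyword(name, keywords=KEYWORDS, obsolete=OBSOLETE,
                    warn_underscores=False):
    """Map a config keyword to its canonical spelling.

    Returns (canonical_name, warning_or_None).  Raises ValueError for an
    old name whose phase is "removed" and KeyError for unknown names.
    """
    if name in keywords:
        return name, None
    if name in obsolete:
        new, phase = obsolete[name]
        if phase == "removed":
            raise ValueError(f"{name}= has been removed; use {new}=")
        return new, f"{name}= is obsolete and will go away; use {new}="
    hyphenated = name.replace("_", "-")
    if hyphenated in keywords:
        # Old underscore spelling: quiet at first, nagging later.
        warning = None
        if warn_underscores:
            warning = f"{name}= is obsolete and will go away; use {hyphenated}="
        return hyphenated, warning
    raise KeyError(f"unknown keyword: {name}")
```

Setting warn_underscores to True is the "mark as obsolete" step.
Changing a table entry's phase to "removed" is the "delete" step, and it
still tells the user what to use instead.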

Can you change "leak_detective" to "leak-detective" now, before there
are any users?  I can do that if you want, but I think that you are
actually user 0.

| Standalone, I think "nat-t" is more intuitive than "natt", but if you
| combine it, e.g. nat-t-ikeport, it becomes confusing; natt-ikeport is
| more useful.  Pronouncing "natt" over the phone is tricky, though, and
| "nat" might be better for that.

Clearly combining is key.  So I think that the reasonable choices are
natt or nat.  We just have to pick one and stick with it.

"natt" seems to be a slightly more transparent name than "nat"
because it emphasizes that we're not doing NAT, we're traversing it.
Or how about NATS for "NAT Survival" :-)

Besides, when folks read "nat", they probably think they know what
we're talking about, and that might actually be wrong.  On the other
hand, if it is right, then we should use "nat" to gain the
advantage of familiarity.

What do other ipsec implementations call this stuff?

| > libipsecconf/keywords.c uses "nat_ikeport" and everything else uses
| > "natikeport".  I imagine that this cannot work (but I don't really
| > understand the plumbing). It seems to me that "nat-ikeport" is a
| > better name.  But maybe the feature should just go away.
| 
| Please do not remove it. One of the "weak" points of IKE/IPsec is that
| it is very easy to block: you just block UDP 500 and protocols
| 50/51. With ESPinUDP we also have UDP port 4500. It is possible right
| now to set both the IKE port and the NATT IKE port to different
| numbers, avoiding censorship/filtering. While this is not a negotiated
| option, if both ends agree they can use it for circumvention.
| 
| But we should indeed add a test case to check that moving both ports
| works as expected. I'll do that.
| 
| > The documentation suggests that if you use this, you might confuse the
| > kernel.  This should be explained.
| 
| I'm not sure what the documentation says, but the confusion relates to
| the "udp (4)500 holes" that might need to change to different ports.

OK, good explanation.  Sounds useful.  Could you put it in the man page?

Whose job is it to adjust the firewall?  That affects how this should
be documented.
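
To make the firewall question concrete: moving the ports moves the
holes.  This is a hypothetical Python sketch.  The parameter names only
stand in for the IKE port and NAT-T IKE port settings discussed above
and are not real libreswan options.  AH (protocol 51) is left out.

```python
def firewall_holes(ikeport=500, natt_ikeport=4500, allow_plain_esp=True):
    """List what a firewall between the peers must let through.

    ikeport / natt_ikeport are illustrative names for the two IKE ports
    discussed above (defaults: UDP 500 and UDP 4500).
    """
    holes = [
        f"udp/{ikeport}",       # IKE until NAT-T floats to the other port
        f"udp/{natt_ikeport}",  # IKE and ESP-in-UDP once NAT-T is in use
    ]
    if allow_plain_esp:
        holes.append("ip-proto/50")  # bare ESP when no NAT is detected
    return holes
```

With both ports moved, a filter keyed on 500/4500 no longer matches.
However, whoever maintains the local firewall then has to open the new
ports instead.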

| > ==> is natikeport being used?  Why?  Can I delete it?
| 
| nat_ikeport= seems to match the other configuration options. If
| everything goes to using "-", we should pick nat-ikeport=

Or natt-ikeport.

| > ==> Which spelling would you choose?  Why?

Still up for grabs.  I'd be interested in any other opinions.

| > keep_alive SEEMS to be the way to specify a global "delay for NAT-T
| > keep-alive packets".  A HORRIBLE name.  Can we rename it?  I don't
| > even understand the description in ipsec.conf(5).  Is it how long we
| > will wait for an incoming keep-alive, or how long we'll wait before
| > sending one?  Or something else?
| 
| The name comes from the RFCs:
| 	https://tools.ietf.org/html/rfc3948#section-2.3
| 	https://tools.ietf.org/html/rfc3947#section-3.2

I didn't immediately see a specification there for the name for the
rate at which keepalive packets must be sent.

I have no trouble with the feature or the packet being called
keepalive.  It is calling the inverse frequency "keep_alive" that
seems very wrong.

| Keepalives are only sent by the endpoint behind NAT. The option is the
| maximum idle time for an IKE+IPsec SA combination (both go to UDP 4500
| of the remote peer). If that idle time is reached, we send a keepalive
| packet to prevent the NAT router from timing out its NAT mapping.
|
| (earlier versions did not measure idleness and depended on just the
|  timer to always send a keepalive - still the case for KLIPS as we
|  don't have the right calls to ask if the IPsec SA is idle in KLIPS)

So: KLIPS and Pluto implement these keep-alive packets (I'm inferring
this from the existence of options for both).  Why?  Surely either one
would be good enough.  Two is a recipe for confusion, I would guess.
Certainly I'm confused.

If the problem is that a different kernel implementation cannot do it,
surely that should be a problem for Libreswan software to figure out
from a uniform specification by the user.
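
Based on Paul's description, the send decision could look roughly like
the sketch below.  It rests on assumptions: the function and parameter
names are made up, times are plain seconds, and can_measure_idle=False
models the KLIPS fallback where only a timer is available.  The payload
constant follows RFC 3948 section 2.2.

```python
# RFC 3948 section 2.2: a NAT-keepalive is a UDP packet sent to the
# peer's NAT-T port, carrying a single octet with the value 0xFF.
NAT_KEEPALIVE_PAYLOAD = b"\xff"


def keepalive_due(behind_nat, now, last_sent, last_traffic, max_idle,
                  can_measure_idle=True):
    """Decide whether to send a NAT-keepalive now.

    behind_nat:       only the endpoint behind the NAT sends keepalives.
    last_sent:        when we last sent a keepalive.
    last_traffic:     when the IKE+IPsec SA pair last carried a packet.
    max_idle:         the per-conn maximum idle time, in seconds.
    can_measure_idle: False models KLIPS, where idleness can't be
                      queried, so we fall back to a plain timer.
    """
    if not behind_nat:
        return False
    if can_measure_idle:
        # Either real traffic or our own keepalive resets the idle clock.
        return now - max(last_traffic, last_sent) >= max_idle
    return now - last_sent >= max_idle
```

In this sketch, with idleness available a busy SA never sends
keepalives.  With only a timer, one goes out every max_idle seconds
regardless of traffic.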

| Note this used to be a global option (now ignored) and is now a per-conn
| option.

Yes, but I think that the inverse-frequency "keep_alive" is global.

It seems a little funny to make "keep_alive" (the inverse frequency)
global while making "nat_keepalive" per-conn.  ("keep_alive" is just
as much about nat as "nat_keepalive".)  ("keep_alive" is a
uselessly/annoyingly different spelling from "keepalive".)

The side behind the NAT is the one that does keep-alive, right?  I
imagine that it is a usually-global property that it is traversing a
NAT with a certain timeout.  One can imagine systems with different
NAT boxes for different IPsec SAs, but that seems improbable.  If it
happens, then it might be the case that the timeouts are different
too.

That's an unnecessary limitation of the current parser.  I think that
that could easily be fixed.

There is some advantage in keeping distinct names for things with
different meanings!
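
A "global, with per-conn override" rule needs no special parser
support beyond layered lookup.  This minimal sketch uses Python's
collections.ChainMap.  The key names and default values are
hypothetical.

```python
from collections import ChainMap

# Illustrative built-in defaults; keys and values are hypothetical.
DEFAULTS = {"nat-keepalive": True, "nat-keepalive-interval": 20}


def conn_settings(conn, setup, defaults=DEFAULTS):
    """Per-conn values shadow `config setup` values, which shadow defaults."""
    return ChainMap(conn, setup, defaults)
```

A conn behind an unusual NAT box would set its own value.  Every other
conn inherits the one from config setup.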

| So I guess such an option could be called nat-idle-send-keepalives= or
| something? Or perhaps use nat-idle-max-time= and
| nat-idle-send-keepalives= ?

Something that is concise, suggests time and natt, at a minimum.  If
it weren't the inverse, I'd suggest natt-keepalive-frequency.

nat-timeout?  It tells Libreswan how patient the NAT system is.

Idea (kludge?): a dual purpose conn option: we enable
heartbeat^H^H^H^H^H^H^H^H^H keepalive if, and only if, nat-timeout
is set to some non-zero time.  Then we don't need a second option to
specify if we want keepalive.

The only trouble is that it forces the user to know what the timeout
is.  But does Libreswan have any reasonable way of knowing this
without the user telling us?
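
Here is a Python sketch of the dual-purpose nat-timeout idea.
nat-timeout= is only the proposal above, not an existing option.  The
margin parameter (how far below the NAT's patience we stay) is an
assumption of this sketch.

```python
def keepalive_from_nat_timeout(nat_timeout, margin=0.5):
    """Interpret a hypothetical nat-timeout= setting, in seconds.

    Unset or 0 disables keepalives; any positive value enables them.
    `margin` is an assumed safety factor: we stay idle for at most
    margin * nat_timeout seconds, so a keepalive goes out well before
    the NAT box forgets the mapping.

    Returns (enabled, max_idle_seconds).
    """
    if nat_timeout is None or nat_timeout == 0:
        return False, None
    if nat_timeout < 0:
        raise ValueError("nat-timeout must not be negative")
    if not 0 < margin <= 1:
        raise ValueError("margin must be in (0, 1]")
    return True, nat_timeout * margin
```

One setting answers both questions ("do we send keepalives?" and "how
often?").  However, the user still has to know the NAT's timeout.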

| Feel free to remove the obsolete options' documentation, though it would
| be nice to start a wiki page with a list of the obsoleted options and a
| pointer to the option it is replaced with.

Sure. The manpages now do this (mostly).  I just removed descriptions
of things the options no longer do, and removed references to options
that have themselves been removed.

| Which reminds me that we need
| to generate our man pages in html as well under libreswan.org/manual/

Yes.  And, of course, put them "live" on libreswan.org.

| > programs/pluto/whack.c usage:
| > --no-nat_keepalive
| 
| When we added a new global option that would change the default
| behaviour, we wanted _not_ specifying the option to leave the default
| behaviour unchanged. So sometimes that means adding the option as a
| negative, e.g. --no-nat_keepalive. The option is not --nat_keepalive
| (to enable it) because the default was "enabled".

I understand.  But I hope that there is a better approach (based on
particular cases).  And it's really good to try to keep the options in
different places as similar as possible.
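
For comparison, a parser can offer both spellings of a boolean while
keeping the old default.  The sketch below uses Python's
argparse.BooleanOptionalAction (Python 3.9+), which registers
--nat-keepalive and --no-nat-keepalive together.  It only illustrates
the pattern and is not whack's real (C) option parsing.  The program
name is made up.

```python
import argparse


def build_parser():
    """Toy parser showing a symmetric on/off pair for one boolean option."""
    parser = argparse.ArgumentParser(prog="whack-sketch")
    # BooleanOptionalAction registers both --nat-keepalive and
    # --no-nat-keepalive.  default=True means that giving neither flag
    # keeps the old behaviour, which was the reason for --no-nat_keepalive.
    parser.add_argument("--nat-keepalive",
                        action=argparse.BooleanOptionalAction,
                        default=True)
    return parser
```

The user sees one option name in both polarities, and the default still
does not depend on which polarity happened to be added first.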

====

Libreswan is too complicated on many levels.

One level: our configuration languages are more complicated than they
need to be.  This matters to less-experienced users.  We experienced
ones don't even notice this -- we're used to it.

Simplicity is hard work.  As software ages, it gets layers of cruft
added.  I'm trying to fight that.

I admit to a certain amount of blindness.  For example, I think lset_t
is an easy-to-understand and clean abstraction.  I've heard that it is
easy only for the author.