[Swan] strncpy doesn't do what many people think that it does

Fri Feb 15 18:34:36 EET 2013

| From: Wes Hardaker <opensource at hardakers.net>

|     #   define netsnmp_assert(x)  assert( x )
|     #   define netsnmp_assert_or_return(x, y)  assert( x )
|     #   define netsnmp_assert_or_msgreturn(x, y, z)  assert( x )

| Thus, it does nothing in production mode or asserts when in developer
| mode.
| 
| Also note that the other two functions should be the ones actually
| used (but the non-developer mode version is not shown here).  They do
| something better: assert when in developer mode for debugging, and
| return a value when not allowing up-code error recovery to do the right
| thing.  IMHO, this gives you the best of both worlds.
| 
| (and our release script has a grep for asserts to ensure we don't ever
| release code with a real assert in it)

Those choices don't make sense to me, although the goal does.

assert is clear, but you've already said that it is not what you want.

assert_or_return: why not just warn AND return (or more accurately:
apologize and return)?  If return is going to work, why crash?  (You
might want a separate abort-on-apology setting.)

How is assert_or_msgreturn conceptually different from
assert_or_return?  I would guess: the invocation supplies a more apt
message than just the predicate.  If so, it ought to be a variant of
apologize_and_return.
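
To make the suggestion concrete, here is a minimal sketch of what
apologize_and_return might look like.  Every name in it (apologize,
apologize_and_return, apologize_msg_and_return, abort_on_apology,
apology_count, average) is hypothetical; none of this is Net-SNMP or
Pluto code.  It always logs and returns, and crashes only if the
separate abort-on-apology setting is turned on.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical knob: nonzero makes every apology fatal (developer mode). */
static int abort_on_apology = 0;

/* Hypothetical counter so callers (and tests) can observe apologies. */
static unsigned apology_count = 0;

/* Log the failed expectation; abort only if the knob is set. */
static void apologize(const char *msg, const char *file, int line)
{
    apology_count++;
    fprintf(stderr, "%s:%d: sorry, unexpected condition: %s\n",
            file, line, msg);
    if (abort_on_apology)
        abort();
}

/* Warn AND return: the message is just the text of the predicate. */
#define apologize_and_return(pred, retval)                \
    do {                                                  \
        if (!(pred)) {                                    \
            apologize(#pred, __FILE__, __LINE__);         \
            return (retval);                              \
        }                                                 \
    } while (0)

/* Variant where the invocation supplies a more apt message. */
#define apologize_msg_and_return(pred, retval, msg)       \
    do {                                                  \
        if (!(pred)) {                                    \
            apologize((msg), __FILE__, __LINE__);         \
            return (retval);                              \
        }                                                 \
    } while (0)

/* Example internal helper: callers are supposed to pass n > 0. */
static int average(const int *v, int n)
{
    long sum = 0;
    int i;

    apologize_and_return(v != NULL, 0);
    apologize_msg_and_return(n > 0, 0, "average() called with no elements");
    for (i = 0; i < n; i++)
        sum += v[i];
    return (int)(sum / n);
}
```

With abort_on_apology left at 0, a bad call to average() logs one
line and returns 0; setting it to 1 gives the developer-mode crash
without changing any call site.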

Aside: roughly speaking, OR is often harder to reason about and test
than AND since case analysis contributes to combinatorial explosion.

When I code asserts, I intend them to catch "can't happen" situations.
They are certainly not used to test for user errors.  The difficulty
with handling asserts in a more nuanced way is that they are catching
"unknown unknowns" (Rumsfeld got mocked for that concept, but it makes
sense).  How could you know what to do with them?  They are a form of
documentation and debugging and, yes, hardening.

I don't intend them to be lazy error handling for predictable
or understood situations.  I don't mean for them to check at trust 
boundaries.
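
A small sketch of the distinction, under assumed names (parse_header
and the two-byte length prefix are made up for illustration, not
taken from Pluto): bad input arriving across a trust boundary gets an
ordinary error return, while the assert covers only the caller's
contract, which can be violated only by a bug in our own code.

```c
#include <assert.h>
#include <stddef.h>

enum parse_result { PARSE_OK, PARSE_TOO_SHORT, PARSE_BAD_LENGTH };

/*
 * Hypothetical parser for a buffer that starts with a 2-byte
 * big-endian payload length.  The bytes come from outside, so
 * malformed input is an expected case and is reported, not asserted.
 */
static enum parse_result parse_header(const unsigned char *buf, size_t len,
                                      size_t *payload_len)
{
    size_t claimed;

    /* "Can't happen": only our own code could pass NULL here. */
    assert(buf != NULL && payload_len != NULL);

    /* Predictable input errors: handle them explicitly. */
    if (len < 2)
        return PARSE_TOO_SHORT;
    claimed = ((size_t)buf[0] << 8) | buf[1];
    if (claimed > len - 2)
        return PARSE_BAD_LENGTH;

    *payload_len = claimed;
    return PARSE_OK;
}
```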

I could see a refinement in the way Pluto uses asserts.  There could
be a *guess* at the damage that the failure suggests.  Then perhaps
only part of the system needs to be reset.  For example: a failure
might indicate that the connection we're working on is botched (in a
way not contemplated by the code) and we could try to kill/shed/ignore
this connection.  But that is pretty shaky ground.  The natural tool
for implementing this kind of thing is setjmp/longjmp; its bad
reputation is at least partially deserved.
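
For what it's worth, a rough sketch of that idea, not anything Pluto
does: all the names here (struct connection, conn_assert,
handle_event, process_packet) are invented for illustration.  A
failed assertion inside the per-connection work jumps back to the
event handler, which marks just that connection dead.  It also shows
the shaky part: whatever state process_packet had half-updated stays
half-updated.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-connection state. */
struct connection {
    int id;
    int alive;
    int packets;
};

static jmp_buf recovery_point;
static int recovery_armed = 0;

/* On failure: unwind to the handler if one is armed, else abort. */
static void conn_assert_fail(const char *pred, const char *file, int line)
{
    fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, pred);
    if (recovery_armed)
        longjmp(recovery_point, 1);
    abort();
}

#define conn_assert(pred) \
    ((pred) ? (void)0 : conn_assert_fail(#pred, __FILE__, __LINE__))

/* Stand-in for real work; a negative value "can't happen". */
static void process_packet(struct connection *c, int value)
{
    conn_assert(value >= 0);
    c->packets++;
}

/* Returns 1 if the event was handled, 0 if the connection was shed. */
static int handle_event(struct connection *c, int value)
{
    recovery_armed = 1;
    if (setjmp(recovery_point) != 0) {
        /* Arrived here via longjmp: guess that only c is damaged. */
        recovery_armed = 0;
        c->alive = 0;
        return 0;
    }
    process_packet(c, value);
    recovery_armed = 0;
    return 1;
}
```

The guess that only this connection is damaged is exactly that, a
guess; nothing in the sketch verifies it.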

In my experience with Pluto years ago, users hated it when asserts
would kill their daemons.  So much so that each bug was an emergency
for them and us.  We got them fixed quickly.

FreeS/WAN was changed to include a wrapper script that would restart
Pluto when it failed.  This meant a lot fewer panicked users.  It also
seems to have led to fewer bug reports.  I suspect that Pluto bugs
lingered longer.

In the years since I worked on the code, there have been almost no
crashers found in that code (I'm not talking about code subsequently
added).  It's not that I make few mistakes; it's that the code detects
them and exposes them more quickly.

Quick question: do you like architectures that SEGFAULT on
dereferencing NULL, or ones that silently access memory at address 0?
I made hardware and OS mods to a computer I owned to move it to the
SEGFAULT class: that's how strong my preference is.  Assertions are
like that.