[Swan-dev] why do we bother checking out_raw() et.al.'s result?

Sun Sep 23 16:38:20 UTC 2018

On Sat, 22 Sep 2018 at 14:19, D. Hugh Redelmeier <hugh at mimosa.com> wrote:
>
> | From: Andrew Cagney <andrew.cagney at gmail.com>
>
> | > There are no good exception mechnisms.
> |
> | There is, its called abort().  Part of libreswan's start up is dealing
> | with any mess from a previous abort.
>
> Abort kills all operations.  I have no trouble advocating abort/assert
> when we go into unknown territory.  When an invariant turns out not to
> be true.

Such as the buffer overflowed?

> Packet output problems are not uncontemplated.  They are a kind of
> resource exhaustion.  (Although the last time I hit one was actually
> due to a bug in the calling code (that I had just writtten); an
> assertion failure would have been slightly more convenient to debug.)
>
> The packet output problems are currently handled in a way that stops a
> particular negotiation.  Nothing else is affected.

I'm really not sure what you mean by 'stops a particular negotiation'
means.  For instance if an IKEv2 AUTH response comes back with a fail,
we migh stop the negotiation, re-start the negotiation, or do some
funky negotation (a notify exchange to delete things).

> | > In security code, it is good to stop when things are going wrong.  Not
> | > get further into the weeds.
> |
> | The real enemy of security is unnecessary complexity.
>
> Yes.

> | So how is this:
> |
> |         return ikev1_out_generic(np, &isakmp_keyex_desc, outs, &z) &&
> |             out_zero(g->len, &z, "fake g^x") &&
> |             (close_output_pbs(&z), TRUE);
> |
> | which goes hand-in-hand with code trying to propagate 'fatal' up the
> | stack is somehow less weedy than:
> |
> |        ikev1_out_generic(np, &isakmp_keyex_desc, outs, &z);
> |        out_zero(g->len, &z, "fake g^x");
> |        close_output_pbs(&z);
> |
> | After all, if it really is fatal, abort().
>
> It's not fatal.
>
> For example, an adversary might be able to drive us into such an
> overflow.  The way things are now, only the negotiation he's cast his
> evil spell upon will fail.  No general DOS.

right, #2 below

> | I made these points:
> |
> | - calling an out function after a fail will core dump
> | hopefully that isn't true - that code path needs to be robust
> | regardless (I'm sure if I look I can find more code like the above
> | where a status is ignored)
>
> Probably.  Or an error can be lost (not buffer full errors, but
> others).

Such as, can you be specific?  If we can't enumerate potential
failures when emitting a payload then I suspect we've a problem.

Perhaps out_struct() can screw up?  Regardless of this thread, we
should ensure that the first error is sticky.  Something that can be
tested (even if all a re-call does is trigger an abort()?).

A case where care does need to be taken is in the packet-encrypt code.
It should probably check that the un-encrypted packet built ok before
trying to encrypt it.  Since IKEv[12] each only have one encryption
code paths, checking this should be easy,  Oh! wait ...

> | - pluto must never knowingly send out a corrupt packet (reflection
> | attack and all that)
> | (but I want to :-) so drop the packet as it goes out the door
>
> During state transition functions, we modify the internal state of our
> system (XFRMs, for example).  Quiting an STF as soon as an error is
> discovered prevents such changes, that either matters or it doesn't.
> I don't wish to audit the code to determine which.

I don't want to be the bearer of bad news, but for IKEv2 I _know_ we
screw this up.   The STF_* model fails.

For instance, during an IKEv2 AUTH the initiator may find itself
wanting to: accept the packet, transition the IKE state to
"established", delete the child (it shouldn't exist), and starting the
process of cleanly deleting the parent (which means sending a notify).
STF_* can't describe all that.  Nor do I think it should.

> | - the state machine will get confused because a packet was sent when it wasn't
> | First, this is no different to a MITM attack (where packets get
> | corrupted/dropped without our knowledge) - we get to deal.
>
> Normally: "deal" means retransmit the packet.  But, oops, we don't
> have one.

But we do, we just don't send it.

> | Second, if the SA should be killed, then inject a KILL-SA event rather
> | than trying to handle it in-line.
>
> I don't know what this means.  It doesn't sound conservative.

See above for an AUTH exchange.  If the initiator rejects auth it
needs to transition to initiate a delete, most straight forward way of
doing that is injecting a delete parent event to trigger the process.