[Swan-dev] Leaks when killing states during crypto; time to drop WIRE_*?

Tue Dec 5 17:29:36 UTC 2017

| From: Andrew Cagney <andrew.cagney at gmail.com>

| First, here's my rewritten version of history:

Seems pretty good.

To get pointers (and many other things) right, one needs an iron
discipline.  This is best done by a set of simple-to-follow rules.

The WIRE_* stuff was just such a simple set of rules, and a fairly
reasonable one when there was a virtual wire.

Now, shared variables make a lot of sense.  Threads with shared
variables should be cheaper than processes with pipes.  But many
thread disciplines are really awkward.

Currently we have a hybrid solution.  We use pipes as an inter-thread
communications method.  This is probably silly.

- what are the actors?  The main thread and worker threads, but with
  roles that might be changeable.

- for each variable, who (what actor) owns it?
  - who allocates it; who frees it
  - who is allowed to write it?
  - who is allowed to read it?
  - what happens when an actor fails?

- how is work distributed?  How are results gathered?
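To make the "who owns it" questions concrete, here is a minimal sketch.  Everything in it is hypothetical (`enum actor`, `struct owned_buf`, `owned_alloc`, `owned_write`, `owned_transfer`, `owned_free` are not pluto code).  Each shared object carries a tag naming its single current owner, and only that owner may write, transfer, or free it.  The tag is plain bookkeeping: in a threaded program the transfer itself still has to travel through some synchronized handoff (a queue, say), not through an unsynchronized field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical actors; pluto's real thread roles may differ. */
enum actor { ACTOR_MAIN, ACTOR_WORKER };

/* A shared object tagged with the one actor that currently owns it.
 * Rule: only the owner may write, transfer or free it. */
struct owned_buf {
	enum actor owner;
	size_t len;
	unsigned char data[64];
};

/* Whoever allocates becomes the first owner. */
struct owned_buf *owned_alloc(enum actor who)
{
	struct owned_buf *b = calloc(1, sizeof(*b));
	if (b != NULL)
		b->owner = who;
	return b;
}

/* Refuses (returns false, changes nothing) unless 'who' is the owner. */
bool owned_write(struct owned_buf *b, enum actor who,
		 const void *src, size_t len)
{
	assert(b != NULL);
	if (b->owner != who || len > sizeof(b->data))
		return false;
	memcpy(b->data, src, len);
	b->len = len;
	return true;
}

/* Hand the object over; afterwards the old owner must not touch it. */
bool owned_transfer(struct owned_buf *b, enum actor from, enum actor to)
{
	assert(b != NULL);
	if (b->owner != from)
		return false;
	b->owner = to;
	return true;
}

/* Only the current owner may free. */
bool owned_free(struct owned_buf *b, enum actor who)
{
	assert(b != NULL);
	if (b->owner != who)
		return false;
	free(b);
	return true;
}
```

The point is that "who frees it?" always has exactly one answer, which can be read straight off the object.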

| The problem was, when pluto is overloaded it will kill states
| mid-crypto and there is no code to clean up these pointers (if the
| code is there it isn't obvious)

In the framework I just put forward, the problem here is that an actor
appears to have lost competence and someone has to:

- make sure that the actor has stopped doing anything (observable).
  That may not be easy in the face of asynchrony.

- inherit all the actor's obligations

This means that those obligations must be represented in a non-opaque
way, one that must be shared or transferred between threads.  Yuck.
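For illustration only, this is roughly what a non-opaque representation of an actor's obligations might look like, and why it feels yucky.  All names (`struct obligation`, `struct actor_rec`, `take_on`, `inherit_obligations`, `discharge_all`) are hypothetical.  Each obligation is a resource plus the function that releases it.  A survivor may inherit the list only after the failed actor has been observed to stop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: one thing an actor has promised to do, typically
 * "release this resource".  Deliberately non-opaque so that any
 * actor can read it and carry it out. */
struct obligation {
	void *resource;
	void (*discharge)(void *resource);
	struct obligation *next;
};

struct actor_rec {
	const char *name;
	int stopped;              /* set once the actor is known to be idle */
	struct obligation *owes;  /* singly linked list */
};

/* Record a new obligation for actor 'a'. */
void take_on(struct actor_rec *a, void *res, void (*fn)(void *))
{
	struct obligation *o = malloc(sizeof(*o));
	assert(o != NULL);
	o->resource = res;
	o->discharge = fn;
	o->next = a->owes;
	a->owes = o;
}

/* Move every obligation of a stopped actor onto its heir.  Returns the
 * number moved, or -1 if the actor has not been observed to stop. */
int inherit_obligations(struct actor_rec *dead, struct actor_rec *heir)
{
	if (!dead->stopped)
		return -1;
	int n = 0;
	while (dead->owes != NULL) {
		struct obligation *o = dead->owes;
		dead->owes = o->next;
		o->next = heir->owes;
		heir->owes = o;
		n++;
	}
	return n;
}

/* Carry out and forget all obligations; returns how many ran. */
int discharge_all(struct actor_rec *a)
{
	int n = 0;
	while (a->owes != NULL) {
		struct obligation *o = a->owes;
		a->owes = o->next;
		o->discharge(o->resource);
		free(o);
		n++;
	}
	return n;
}
```

The hard part this sketch glosses over is setting `stopped` truthfully when the actor is another thread.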

| So here's my solution:
| 
| Accept that pointers are being passed and make it work:
| 
| - try to apply the dogma that state and workers share no pointers
| (currently MD violates this) so there is no question as to who is
| responsible for releasing stuff

I suspect that the MD need only be owned by one actor at a time.

Clearly the main thread needs the MD most of the time.  But probably
not during "suspension" of a state.  That's when the worker could have
ownership.  I'm guessing that the only worker access needed is for
encryption/authentication of the packet itself.
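A minimal sketch of that single-owner handoff, using simplified stand-in types (`struct md_stub`, `struct state_stub`, `struct crypto_job`) rather than pluto's real `struct msg_digest` and `struct state`.  The helper names are hypothetical too.  While the state is suspended, only the job can reach the MD; on completion the main thread takes it back.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; pluto's real structures are far larger. */
struct md_stub { unsigned char packet[32]; size_t len; };
struct state_stub { struct md_stub *md; int suspended; };
struct crypto_job { struct md_stub *md; };

/* Main thread: suspend the state and give the MD to the job.
 * Afterwards the state holds no pointer to it, so exactly one
 * actor can reach the MD at any time. */
void suspend_with_md(struct state_stub *st, struct crypto_job *job)
{
	assert(st->md != NULL && !st->suspended);
	job->md = st->md;
	st->md = NULL;
	st->suspended = 1;
}

/* Worker: touches only the packet bytes (a placeholder standing in
 * for encryption/authentication). */
void worker_touch_packet(struct crypto_job *job)
{
	for (size_t i = 0; i < job->md->len; i++)
		job->md->packet[i] ^= 0x5a;
}

/* Main thread, on completion: take the MD back. */
void resume_with_md(struct state_stub *st, struct crypto_job *job)
{
	assert(st->suspended && st->md == NULL);
	st->md = job->md;
	job->md = NULL;
	st->suspended = 0;
}
```

Because the pointer moves rather than being copied, there is never a moment when both sides think they may free it.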

| - handle cleaning up after an abort with a separate callback, and run
| this from the main thread

The original design of the continuation mechanism was that failure and
success take the same path.  This may seem surprising, but it really cut
down on combinatorial explosion.

It would be good if abort (death or assassination) used the same path.
Then it is always the continuation that manages resources.
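A minimal sketch of "one path for success, failure and abort".  The names `enum outcome`, `struct request`, `example_cont` and `finish` are hypothetical, not pluto's continuation API.  The continuation is called exactly once, whatever happened, and it alone releases what it owns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical outcome codes: every way a request can end. */
enum outcome { OUTCOME_OK, OUTCOME_FAILED, OUTCOME_ABORTED };

struct request {
	unsigned char *scratch;   /* resource owned by the continuation */
	void (*cont)(struct request *req, enum outcome how);
	int completed;            /* number of continuation calls, for checking */
};

/* The one continuation: it always releases what it owns, whatever the
 * outcome; only the success case goes on to use the result. */
void example_cont(struct request *req, enum outcome how)
{
	if (how == OUTCOME_OK) {
		/* ... use req->scratch to advance the state ... */
	}
	free(req->scratch);
	req->scratch = NULL;
	req->completed++;
}

/* A worker finishing (successfully or not) and the main thread killing
 * the state both funnel through this same call. */
void finish(struct request *req, enum outcome how)
{
	assert(req->completed == 0);   /* must run exactly once */
	req->cont(req, how);
}
```

With abort folded in, there is no separate cleanup callback to forget or leave untested.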

When I added the continuation mechanism to pluto almost 20 years ago,
the idea of continuations was not mainstream.  It has been hard for
programmers to get their heads around.  But in 2017 it ought to be in
every programmer's toolkit.  Of course C doesn't help at all.

We really need to cultivate simplicity.  One major part of that is
cutting down code paths, especially untested ones.  Error handling is
generally the least exercised.

The continuation mechanism could be replaced.  The obvious way would
be to add substates (really: finer-grained states) to the state
mechanism.  Currently state transitions are triggered by incoming
packets.  Except for the anomaly of the initiator's first packet,
every State Transition Function is invoked for an input packet aimed
at that state.  Timeouts and whack commands can have effects on states
too but don't invoke STFs.  We could add triggering for worker
completion.  Everything currently held in any continuation would have
to be stuck in struct state or reachable from it.  I'm guessing that
the result might be easier to understand but it would be a lot of
work.
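A rough sketch of the substate idea, assuming a table-driven dispatcher.  The states, events and functions below (`enum sstate`, `enum event`, `struct fsm_state`, `dispatch` and the `stf_*` functions) are all hypothetical and much simpler than pluto's real state machine.  Waiting for crypto becomes its own state, worker completion becomes an event that invokes an STF, and what a continuation would have carried lives in the state itself.  A completion that arrives after the state was killed simply finds no STF and is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical finer-grained states and triggering events. */
enum sstate { S_IDLE, S_WAIT_CRYPTO, S_DONE, S_DEAD };
enum event  { EV_PACKET, EV_WORKER_DONE, EV_KILL };

struct fsm_state {
	enum sstate kind;
	int pending_result;   /* what a continuation would have carried */
	int have_result;
};

typedef void (*stf_fn)(struct fsm_state *st, int arg);

static void stf_start_crypto(struct fsm_state *st, int arg)
{
	(void)arg;
	st->kind = S_WAIT_CRYPTO;   /* work queued; nothing held off-state */
}

static void stf_crypto_done(struct fsm_state *st, int result)
{
	st->pending_result = result;
	st->have_result = 1;
	st->kind = S_DONE;
}

static void stf_kill(struct fsm_state *st, int arg)
{
	(void)arg;
	st->kind = S_DEAD;   /* everything to clean up is reachable from st */
}

/* Which STF handles which event in which state (NULL = ignore). */
static const stf_fn table[4][3] = {
	[S_IDLE]        = { [EV_PACKET] = stf_start_crypto, [EV_KILL] = stf_kill },
	[S_WAIT_CRYPTO] = { [EV_WORKER_DONE] = stf_crypto_done, [EV_KILL] = stf_kill },
	[S_DONE]        = { [EV_KILL] = stf_kill },
	[S_DEAD]        = { NULL },
};

/* Returns 1 if an STF ran, 0 if the event was ignored. */
int dispatch(struct fsm_state *st, enum event ev, int arg)
{
	assert(st->kind <= S_DEAD && ev <= EV_KILL);
	stf_fn f = table[st->kind][ev];
	if (f == NULL)
		return 0;
	f(st, arg);
	return 1;
}
```

In a real implementation the ignored late completion would still need its result released, but at least that case is visible in one table rather than spread across continuations.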