[Swan-dev] Leaks when killing states during crypto; time to drop WIRE_*?
andrew.cagney at gmail.com
Thu Dec 7 04:01:36 UTC 2017
On 6 December 2017 at 18:12, D. Hugh Redelmeier <hugh at mimosa.com> wrote:
> | From: Andrew Cagney <andrew.cagney at gmail.com>
> | > Clearly the main thread needs the MD most of the time. But probably
> | > not during "suspension" of a state. That's when the worker could have
> | > ownership. I'm guessing that the only worker-access needed is for
> | > encryption/authentication of the packet itself.
> | Yes, the state transition code uses MD (as a parameter). It doesn't
> | need to be stuffing it in st_suspended_md or conversely having state
> | stuffed into md.st.
> Historically, md has a pointer to the current struct state. I know
> you've changed this, but I'm not sure why that's a win.
> The idea was that md would carry all that was needed by a State
> Transition Function.
That's what I suspected - borrow object for A to also do B. Better to
create object for B.
> It is true that the rest of the msg_digest was the sliced and diced
> packet and that the state object is different.
Right. So why mix it.
> The state pointer in md was kind of cheap: it did not "own" the state,
> so that freeing the md did not require freeing the state.
Someone new to this code should be allowed the reasonable expectation
that a function operating on state, takes state as the [first]
parameter. Hiding it in MD is confusing; not cheap.
> On the other hand, if the state were freed (killed), the md needed to
> be deleted because nothing else "owned" the md.
Or ST isn't there at all and the need for this goes away.
As an exercise, pick a random crypto continuation function, and then
change its function signature to:
typedef void crypto_req_cont_func(struct state *st, struct msg_digest *md,
struct pluto_crypto_req *);
and then, assuming ST!=NULL and correct, and assuming the caller
sets/resets the context correctly, see how much accumulated guff
and/or sillyness such as:
passert(ke->pcrc_serialno == st->st_serialno); /* transitional */
passert(st != NULL); /* too late! */
passert(st->st_suspended_md == ke->pcrc_md);
can be removed
> If a state transition were suspended, the md (if any) needs to be
> preserved and remembered: for the restarted the state transition and
> for not leaking (kind of related). The md is preserved by nulling out a
> pointer to it in the framework so that the framework won't free it at
> the end of the (incomplete) state transition. It is remembered by
> being placed in st_suspended_md.
(I wish our memory manager referenced counted)
It is always in .pcrc_md - this is because it gets passed into new_pcrc()
I thought it was also always redundantly in .st_suspended_md; but then
I came across the comment:
* If there was a packet received while we were calculating, then
* process it now.
* Otherwise, the result awaits the packet.
> So it a state object is killed, it is easy to find any md attached to
> it by suspension. The maintenance of the md's pointer to the state
> falls easily out of that.
Or delete it and delete the need to maintain this.
> You sped up serial number to state mapping. No
> longer a linear search. That mechanism wasn't intended to be used
> much. But: are the uses that caused performance issues really
> reasonable? I don't know.
In the profile I saw it was "hot", yet, to your point, it shouldn't
have even appeared in the profile. There was clearly a lot of states.
It also means that saving a state as so_serial_t and then later trying
to turn it into a state is cheap; and much simpler, I think, than
trying to chase down stray state pointers.
> Scanning all existing states is linear. There seem to be a fair
> number of casess where that is used. Can that be sped up by
> datastructuctures that let us only look at the relevant states?
If it isn't in the path used when creating a connection it doesn't matter.
For instance, deleting a connection and all states is expensive; but not common.
> Performance measurement showed that sanitizing the log messages took a
> non-trivial amount of time. You ditched the sanitization. It
> probably wasn't necessary, but it was part of defense in depth.
> Alternatively, we could have found out why it was so slow. My guess:
> something to do with locales, something that should have been easy to
> cut out of the processing. If sanitizing the log messages took
> noticeable time, we were doing too much logging -- something expensive
> in itself.
- needless locking
- redundant memory moves
> | However, one thing that I do think should go is that extra code path
> | handling STF_INLINE. When there are no threads, use the event loop to
> | offload the work.
> I don't remember that clearly, but I think that's exactly what
> STF_INLINE is actually doing.
No. It calls the code inline i.e., directly; so another code path
(with different bugs - for instance cur-state isn't correctly
Anyway I think I've hacked things to the point where I can easily add
more work to the crypto pool.
PS: And I suspect I found another leak, the continuation object when STF_FAIL
More information about the Swan-dev