[Swan-dev] oprofile of busy libreswan server
paul at nohats.ca
Wed Aug 30 18:54:56 UTC 2017
On Wed, 30 Aug 2017, Andrew Cagney wrote:
> The thing I hate about simplistic profiling is its lack of context, and how this leads to micro-optimizing; context would let us see why the
> calls are occurring and likely eliminate them.
I tried using gprof but I couldn't get it to run properly. I didn't seem
to get any symbols on the ubuntu system this ran on, despite the symbols
> vfprintf() and friends:
> - presumably the vfprintf() calls are coming from libreswan_vloglog() and friends. These call fprintf() and/or a syslog(3) (both likely call
> vfprintf(); don't get confused by syslog(2)). While I've no direct evidence of this, sanitize_string() also showing up tends to support
> this theory.
> While some fprintf() could be replaced with fwrite(), I'm not sure of the benefit.
> As an aside, does it include:
> e77fd263fb7e13b8897441e8bd66af054ced8378 logging: change libreswan_log()/libreswan_loglog() into calls to new libreswan_vloglog()
> which would have helped reduce sprintf() calls. Prior to that, everything was doing sprintf() calls all the way down.
I don't think so. You pushed that commit post-v3.21 right? We could
apply it and test though.
> - sanitize_string():
> I still can't get my head around if/when it is needed, and I suspect I'm not alone - is there a diagram explaining what is going on? It
> feels like we've solved the wrong problem.
One of those calls is in cisco_stringify() which really only happens
a few times during IKEv1 XAUTH. It cannot be the cause of so much CPU
usage. The only other call is in fmt_log() which is used all over the
logging functions. That is probably the one causing the issues, and
I do wonder if it is needed. None of the libreswan_log() or loglog()
or DBG_log() should treat network input streams inputs as string texts.
However its use goes back at least to the initial git import of openswan
> - other print calls
> Since we're doing lots of logging they would naturally bubble up.
Right although this was running a busy server without plutodebug. But
still that leaves a lot of libreswan_log() calls. We should probably
weed out a bunch of redundant ones.
> - what is making such seemingly heavy use of SQLITE? Is it NSS?
That is the only user of sqlite in our code AFAIK.
> - do we know our NSS workload? For instance, the number of DH and/or other "heavy" operations we're doing. If we know that we can work out
> the expected throughput; not being close would suggest we're instead making NSS jump through hoops.
We had some statistics on that. But it was mostly in in 0-1000 usec (not msec)
More information about the Swan-dev