[Swan-dev] oprofile of busy libreswan server

Wed Aug 30 18:54:56 UTC 2017

On Wed, 30 Aug 2017, Andrew Cagney wrote:

> The thing I hate about simplistic profiling is its lack of context, and how this leads to micro-optimizing; context would let us see why the
> calls are occurring and likely eliminate them.

I tried using gprof but I couldn't get it to run properly. I didn't seem
to get any symbols on the ubuntu system this ran on, despite the symbols
being there.

> vfprintf() and friends:
> 
> - presumably the vfprintf() calls are coming from libreswan_vloglog() and friends.  These call fprintf() and/or a syslog(3) (both likely call
> vfprintf(); don't get confused by syslog(2)).   While I've no direct evidence of this, sanitize_string() also showing up tends to support
> this theory.
> 
> While some fprintf() could be replaced with fwrite(), I'm not sure of the benefit.

Right.

> As an aside, does it include:
>   e77fd263fb7e13b8897441e8bd66af054ced8378 logging: change libreswan_log()/libreswan_loglog() into calls to new libreswan_vloglog()
> which would have helped reduce sprintf() calls.  Prior to that, everything was doing sprintf() calls all the way down.

I don't think so. You pushed that commit post-v3.21 right? We could
apply it and test though.

> - sanitize_string():
> 
> I still can't get my head around if/when it is needed, and I suspect I'm not alone - is there a diagram explaining what is going on?  It
> feels like we've solved the wrong problem.

One of those calls is in cisco_stringify() which really only happens
a few times during IKEv1 XAUTH. It cannot be the cause of so much CPU
usage. The only other call is in fmt_log() which is used all over the
logging functions. That is probably the one causing the issues, and
I do wonder if it is needed. None of the libreswan_log() or loglog()
or DBG_log() should treat network input streams inputs as string texts.
However its use goes back at least to the initial git import of openswan
2.0.0.

> - other print calls
> 
> Since we're doing lots of logging they would naturally bubble up.

Right although this was running a busy server without plutodebug. But
still that leaves a lot of libreswan_log() calls. We should probably
weed out a bunch of redundant ones.

> NSS:
> 
> - what is making such seemingly heavy use of SQLITE?  Is it NSS?

That is the only user of sqlite in our code AFAIK.

> - do we know our NSS workload?  For instance, the number of DH and/or other "heavy" operations we're doing.  If we know that we can work out
> the expected throughput; not being close would suggest we're instead making NSS jump through hoops.

We had some statistics on that. But it was mostly in in 0-1000 usec (not msec)

Paul