[Swan-dev] some results of our look at the current logging state of libreswan

Sun Jan 6 00:23:21 EET 2013

Some devs spent most of the day looking at the logging status, and I
thought my brain dump and hugh's comments might be worth sharing at
large. The logging parts clearly need a cleanup and simplification..

Paul

---------- Forwarded message ----------
Date: Sat, 5 Jan 2013 14:13:28
From: D. Hugh Redelmeier <hugh at mimosa.com>
Cc: Antony Antony <aa at phenome.org>
To: Paul Wouters <paul at nohats.ca>
Subject: Re: braindump

| From: Paul Wouters <paul at nohats.ca>

| Spend 4 hours with DHR on the phone..... another three going through
| the tree.

| Then on top, Avesh and I made errors in adding logging stuff
| to non pluto parts due to existing bad examples.

It's sometimes called technical debt.  Bad stuff catalyses more bad
stuff.  Complexity snowballs.  But to be honest, logging in Pluto was
fairly intricate to start with

| We found a new logging function every hour! And they're all kind of
| the same.
|
| loglog() named functions (like loglog() or libreswan_loglog() that
| take an int seq_no are only for use by pluto/whack. That is, if you
| whack something pluto returns output prefixed with "ftp like" numbers
| for easy parsing. NO ONE but pluto should use it. Unfortunately, some
| of pluto got refactored into lib/libswan/ and part of that code got
| re-used elsewhere, leaking in these functions. Avesh especially used
| these. (for a gem, look at the function showhostkey_log, before you git
| pull)
|
| cur_debugging is also only a concept of loglevel in pluto, and leaked
| out.
|

| the logging mechanism of pluto, to pick between stderr and syslog, only
| really makes sense for pluto as a daemon. It has no place in other
| programs that are called. why would ipsec showhostkey ever syslog?
| Therefor, the loglog() does not belong with those other parts.
|
| There are loads of typedefs that make things worse:
|
| #define loglog  libreswan_loglog
|
| Also one for plog.

plog doesn't appear in 2003 pluto.

| pluto itself is including lswlog.h in various places, which was a copy
| of the stuff in its own log.h, which was factored out in lib/libswan
| but then leaked back into pluto! Its totally broken and loops logging
| definitions around in creative ways to fix/workaround it.
|
| tool_init_log() and related functions seem to be to setup a scaffolding
| emulating some of pluto's logging but simplier. Eg not using syslog.
| Again, a result of bad refactoring.
|
| the reason showhostkey is so complicated is that somehow, prompting for
| passwords got mixed up in this logging craze as it uses some of these
| functions.
|
| It seems libreswan_log used to be called log() and so libreswan_loglog()
| is loglog().
|
| log() / libreswan_log() returned int, but loglog/libreswan_loglog
| returns void. Tracing this through kernel_ops and linux/net/linux/*
| code shared (but sometimes ifdef __KERNEL__'ed) gave no reason for
| the int - no one ever checks it. But it turned out at some point
| a pointer to a log function ends up being prinft() which returns int
| and is outside our control, so it got bled in blackwards.
|
| loglog() should be wiped from lib/libswan, and those functions if they
| need it will get it from linking against pluto. Except showhostkey :/
| That also explains the log_to_stderr and log_to_syslog variables in
| the library - but as dhr explained, these are one and the same, so
| setting them in pluto changes them in lib/libswan too, which is why
| the logging does work after we fixed the other functions to also run
| the log_to_file check as we did today.
|
| We wanted to rename the loglog functions into whack_log, but that
| already exists, and does something different!!

Here's the original intent (I'm looking at code from 2003):

whack_log: sends status updates from pluto to whack so that whack
could keep the user (the invoker of whack) informed of progress etc.
These messages were deemed not important enough to hit the global log.
Note: pluto could be dealing with multiple whack sessions at any time;
it has to keep track of which whack should be notified for particular
messages.

log: send a message to the global log (standard out or syslog
depending on -e flag).

loglog: log BOTH to whack and the global log.  This name is pathetic.
We should think up a succinct and clear word for global logging and
a distinct one for logging directed to whack.

Looking at Pluto's log.h (i.e. the public face of log.c), I am
reminded how intricate it was.

- logging was affected (contextualized) by cur_* variables
   + they had to be carefully managed
   + they affected logging levels
   + they affected message prefixes (including context in messages)
   + they selected which whack process would be informed
   + see section labeled "Context for logging" in log.h (in the 2003
     version

The whole context idea is pluto-specific and has no place in a
library.

All whack-related stuff is pluto specific and has no place in a
library.

BUT: the secrets parsing code was part of pluto and it did need to
report errors and those diagnostics needed to be shipped back to
whack.  And show-secrets now uses the secrets parsing code so that
code needs to be made a library.  The caller must give the secrets
parsing code a way of issuing diagnostics.  Simple hack: show-secrets
must define its own loglog() that simply discards the first argument and
then acts like log().

I like the secrets parsing code and think that it could be a model for
improved config-file processing.  That would require a
better-thought-out way of modularising error reporting.

| the reason for the XXX_desired variables was the assumption by DHR that
| while you might WANT to log to syslog, you are still setting up things
| so you cannot log YET to syslog.

I was wrong.

| So the desire to do syslog is log_to_syslog_desired, and once you are
| ready for that, you set log_to_syslog. So the _desired functions should
| be within main() and not global as they were

In 2003 there were only two *_desired variables, both local to main():
fork_desired and log_to_stderr_desired.  I'll only talk about
log_to_stderr_desired.

The need for log_to_stderr_desired is subtle and somewhat escapes me.
During parsing of flags, messages get logged to both stderr and
syslog.  After flag handling, if the -e flag had been found, logging
to syslog was stopped; if not, logging to stderr stopped.

If we got rid of log_to_stderr_desired, we could default to logging to
syslog, and switch to stderr if -e were found.  Simpler.  But that
would mean that diagnostics for things before the -e was found would
go to syslog and not stderr, possibly confusing the user.  This seems
not very important.  But perhaps worth the slight cost of the
log_to_stderr_desired mechanism.

| now I'm totally dead and need sleep.
|
| Dhr: Thanks for the long talk!