[Swan-dev] f28 and testing's (f22) abysmal results

Tue Sep 25 14:25:45 UTC 2018

Here's a simplistic breakdown of why
http://testing.libreswan.org/results/testing/ is having bad results:

electric fence +160:
from previous analysis on f22 we know that electric fence makes crypto
slower, triggering timeouts; with f28 it seems to be a lot worse

f28 on testing(f22) +30:
my local machine, which runs f27, consistently had fewer failures than
testing; there seems to be more output noise, such as lost pings.  I
suspect a second cause is that the f22 host is not tuned for an f28 guest

f28 vs f22 +30:
annoying differences between the two OSs that we can now ignore

1 vs 2 boot threads +5:
I recently increased the number of boot threads from one to two;
increasing the host's workload like this increases failures slightly

As for performance, it is dismal (remember, these numbers include both
the good and the wip tests):

f22 with 1 boot thread: 16hrs
f22 with 2 boot threads: 9hrs
f28 with 2 boot threads: 16hrs
f28 with 2 boot threads and electric fence: 17hrs

(Hugh, on IRC, indicated that his f28 host, which presumably only ran
the good tests, was ~1hr slower.  So shouldn't good+wip also be only
~1hr slower?)
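For a quick back-of-the-envelope comparison, the run times listed above can be turned into differences and ratios.  The numbers are exactly the ones reported; only the tuple keys and the compare() helper are made up here for illustration:

```python
# Wall-clock hours for a full good+wip run, as listed above.
# Key: (guest OS, boot threads, electric fence enabled?)
RUN_HOURS = {
    ("f22", 1, False): 16,
    ("f22", 2, False): 9,
    ("f28", 2, False): 16,
    ("f28", 2, True): 17,
}


def compare(before, after):
    """Return (extra hours, slowdown ratio) going from one setup to another."""
    b, a = RUN_HOURS[before], RUN_HOURS[after]
    return a - b, a / b


extra, ratio = compare(("f22", 2, False), ("f28", 2, False))
print(f"f22 -> f28: +{extra}h ({ratio:.2f}x)")  # f22 -> f28: +7h (1.78x)

extra, ratio = compare(("f28", 2, False), ("f28", 2, True))
print(f"electric fence: +{extra}h ({ratio:.2f}x)")
```

On testing, the switch to f28 costs about 7 hours, while electric fence adds only about one more.  That is why Hugh's ~1hr figure looks so different.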

So what can be done?  Several changes to the framework (I assume we
don't want to disable electric fence) are:

- on the theory that the HOST's KVM is too old, upgrade testing to
something more recent; I think that's been looked at

- strip stuff from the boot (I should grep the test logs to confirm
that this is where the slowdown is)

- (speculation) reduce boot verbosity - the test runner has to wade
through all the boot messages

And several changes to the tests:

- add something like --impair delete-on-fail, so that in tests where
the responder is expected to reply with an error notification, the
test can be aborted quickly.  Currently the tests are allowed to time
out.

- continue to replace sleeps and 4-pings with things like one-ping or
wait-until-alive and wait-for-whack-trafficstatus.  Unfortunately this
is slow and tedious.

- revisit how timeouts are handled in the testsuite.
Up until now I've been using either a large timeout or --impair
suppress-retransmits to stop the retransmits where they aren't
relevant to the test.  With 160 more tests now having problems, an
alternative strategy might be needed:
  - sanitize the standard retransmit messages, and then, for tests
    where the retransmits matter, 'impair' the message so it isn't
    sanitized
  - sanitize the impaired retransmit message so that updating the
    expected output is easier
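To get a feel for what the proposed --impair delete-on-fail might save, here's a rough model.  The option doesn't exist yet, so this is not pluto's logic.  The model assumes the initiator retransmits with an interval that starts at 0.5s and doubles after each retransmit, giving up after 60s.  Both values are assumptions for illustration and haven't been checked against pluto's defaults.

```python
def retransmit_times(initial_interval: float, give_up_after: float) -> list[float]:
    """Times (seconds after the first send) of each retransmit, ASSUMING the
    interval doubles every time and no retransmit is sent past give_up_after."""
    times: list[float] = []
    elapsed, interval = 0.0, initial_interval
    while elapsed + interval < give_up_after:
        elapsed += interval
        times.append(elapsed)
        interval *= 2
    return times


def seconds_until_test_can_finish(error_reply_after: float, give_up_after: float,
                                  delete_on_fail: bool) -> float:
    """HYPOTHETICAL model: without delete-on-fail the exchange runs until the
    give-up timeout; with it, the state is deleted as soon as the
    responder's error notification arrives."""
    return error_reply_after if delete_on_fail else give_up_after


print(retransmit_times(0.5, 60.0))  # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(seconds_until_test_can_finish(0.1, 60.0, delete_on_fail=False))  # 60.0
print(seconds_until_test_can_finish(0.1, 60.0, delete_on_fail=True))   # 0.1
```

Under these assumptions, each such test spends about a minute retransmitting into a failure that was already known a fraction of a second after the first reply.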
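The idea behind replacing sleeps and 4-pings is to poll until a condition holds instead of always paying the worst-case delay.  Here's a generic sketch in Python.  The real one-ping and wait-until-alive scripts may well work differently.  wait_until() and answers_one_ping() are hypothetical names, and the address in the usage line is a placeholder.

```python
import subprocess
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.1) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True as soon as the condition holds, so a fast run doesn't pay
    for a fixed sleep; returns False if the deadline is reached first."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def answers_one_ping(address: str) -> bool:
    """Hypothetical predicate: True if a single ping (1s wait) gets a reply."""
    result = subprocess.run(["ping", "-c", "1", "-W", "1", address],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Usage would be something like `wait_until(lambda: answers_one_ping("192.0.2.1"), timeout=30)`.  A loaded host then only costs the extra time it actually needs, rather than forcing every test to budget for the slowest case.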
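The two-part sanitizer strategy above can be sketched in Python as a filter over log lines.  Routine retransmit lines are dropped.  A line carrying an impair marker is kept, with its timing masked so the expected output stays stable.  This is not the testsuite's own sanitizer code.  The retransmit wording is ASSUMED and should be checked against real output, and the "IMPAIR: " prefix is a HYPOTHETICAL marker.

```python
import re

# ASSUMED wording of the routine retransmit log line; verify before use.
STANDARD_RETRANSMIT = re.compile(
    r"retransmission; will wait [0-9.]+ seconds for response")
# HYPOTHETICAL prefix for a retransmit line that an 'impair' asked to keep.
IMPAIRED_RETRANSMIT = re.compile(r"^IMPAIR: retransmission; ")
# Timings vary between runs, so they are masked in lines that are kept.
SECONDS = re.compile(r"[0-9]+(\.[0-9]+)? seconds")


def sanitize(lines: list[str]) -> list[str]:
    out: list[str] = []
    for line in lines:
        # Check the impaired form first: it also matches the standard pattern.
        if IMPAIRED_RETRANSMIT.search(line):
            out.append(SECONDS.sub("N seconds", line))
        elif STANDARD_RETRANSMIT.search(line):
            continue  # routine retransmits are noise for most tests
        else:
            out.append(line)
    return out
```

Masking the timing in the kept line is what makes updating the expected output easier.  A change in retransmit timing no longer changes the output, so only the presence of the retransmit does.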

Andrew