[Swan-dev] testing: drifting reference logs

D. Hugh Redelmeier hugh at mimosa.com
Tue Dec 30 08:47:01 EET 2014


The test reference logs need to be kept up to date.  They need to be
in sync with the source code.  But they need to be correct: don't make
changes that you don't understand.

False failures are a significant waste of my time and I'm sure they
are a waste of others' time.  (False success is evil.)

When a false failure happens, I have to paw through the voluminous
output to figure out what is going on.  I'm unlikely to understand the
consequences of someone else's changes (heck, I have enough trouble
understanding the consequences of my own).

Did my change cause the failure?

If my change did cause problems, how can I disentangle the symptoms of
my change from the symptoms of other peoples' changes?

That being said, it is wrong to assume that any change is a good
change.  Only update the reference logs when every change is clearly as
you intended.

I know that it is easy to create an inadvertent but still correct
change in the logs.  When you discover you've done that, please fix
the reference logs.

| From: Antony Antony <antony at phenome.org>
| Subject: Re: [Swan-dev] testing: more results, where 169.x.x.x is removed
| 
| On Mon, Dec 29, 2014 at 01:56:44AM -0500, D. Hugh Redelmeier wrote:

| > | New commits:
| > | commit 11749924b5a37c3b0fd6284a7c98cf9caed98212
| > | Author: Antony Antony <antony at phenome.org>
| > | Date:   Sun Dec 28 14:54:42 2014 +0100
| > | 
| > |     testing: more results, where 169.x.x.x is removed
| > 
| > Thanks.
| > 
| > Should all occurrences of 169.254.x.x be removed from the reference 
| > output?
| 
| IMHO all tests that pass should be updated; the failed ones I am not 
| sure. For now I only update tests that pass for me. And no output for 
| wip tests.

Why the heck would you want to keep "junk differences" in the logs?

I'm not saying: adopt the log of a failed test as the reference log.
That would be stupid and horrible.  I'm saying: do surgery on the
reference logs that matches your new expectations.

If you know that all references to 169.254.x.x are now bogus, get rid
of them (I don't think that all are).  That does not cause tests that
should fail to suddenly succeed.  It gets rid of distracting and
confusing chaff.


| > This command
| > 	egrep -n '\<169\.254\.' testing/pluto/*/*.txt
| > finds over 300 remaining.

	egrep -n '\<169\.254\.' testing/pluto/*/OUTPUT/*.txt

finds 44 lines in 16 files.  So I presume not all are obsolete.

| If the test pass for me, then I update. My policy is don't update output 
| with grep

I usually update with sed -i.  This works well even though it seems
risky: we have git as a safety net.
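
For example (just a sketch, not a recipe: SOME-TEST is a placeholder,
the east.console.txt name only follows the east*.txt patterns above,
and deleting whole lines is only one kind of surgery), the edit plus
the review step might look like

	sed -i '/\<169\.254\./d' testing/pluto/SOME-TEST/east.console.txt
	git diff testing/pluto/SOME-TEST/

git is only a safety net if you actually read the diff before
committing.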

| .... for a failed test. 

I think that updating by editing is a very good technique.  It makes
sure that only intended changes get made.  If you adopt the output of
a run, you must be very careful that unintended changes don't accidentally
tag along with the intended change.

| If you are updating a failed test 
| there is a chance that the update may go wrong.

Of course it could go wrong.  That's why you have to test all your
changes, even changes to the reference logs.

With your approach, a failing test's reference logs get farther and
farther behind.  This makes it very difficult to tell which differences
are essential to the failure and which have accumulated from unrelated
changes.

| And also the git log of 
| failed test's can be confusing, these automagic updates may confuse you.

Not half as confusing as the spurious failures that I have to
repeatedly track down.

In what way is it confusing to have updates in the git?

My favourite way of committing reference log changes is as part of the
commit of the code that caused the changes.  That way the connection
is as clear as possible.
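
A minimal sketch of that workflow (the file names here are only
placeholders):

	git add programs/pluto/some_file.c
	git add testing/pluto/SOME-TEST/east.console.txt
	git commit -m 'pluto: change X; update reference logs to match'

Anyone later running "git log" or "git blame" on the reference log then
sees, in the same commit, the code change that made the new output
correct.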

| egrep -l  '\<169\.254\.' testing/pluto/*/east*.txt |wc -l 
| 44
| 
| I think there are ~44 tests and 300 lines.

That means that there are maybe 44 - 16 = 28 tests whose reference
logs mention 169.254.x.x even though their actual output no longer
does: spurious differences.  What a waste of my time.

| Some test cases are missing 
| from the file TESTLIST, e.g. x509-crl-05. 'make check' do not run those. 

That brings up another point.  Should they be deleted?  Fixed?  Marked
as WIP?

But if we keep them, we certainly should be updating them to reflect
systematic changes that we know about.
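
One way to spot such strays (a sketch only: it assumes TESTLIST lives
in testing/pluto/ and mentions each test by name, which may not match
the real layout exactly):

	for d in testing/pluto/*/ ; do
		t=$(basename "$d")
		grep -qw "$t" testing/pluto/TESTLIST || echo "$t: not in TESTLIST"
	done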

| So they are not updated. Some are in TESTLIST but only pass on some 
| machines:) And some need different Makefile.inc.local . I hope to fix 
| the last one soon with Docker tests.
| 
| Should we update the failed tests? 

YES!!

| From: Antony Antony <antony at phenome.org>
| Subject: Re: [Swan-dev] ikev2-06-6msg failure
| 
| > -002 "westnet-eastnet-ikev2" #1: Received anti-DDOS COOKIE -resending I1 with cookie payload
|
| It got removed in commit ca99295e. I think  that message was useful.

Thanks for the pointer.

| From: Paul Wouters <paul at nohats.ca>
| Subject: Re: [Swan-dev] ikev2-06-6msg failure
| 
| >> -002 "westnet-eastnet-ikev2" #1: Received anti-DDOS COOKIE -resending I1 with cookie payload
| 
| It was changed to debug only because of ddos concern

So please update the reference logs.

| Sent from my iPhone

I hate top-posting.  Get a better phone :-)

On the other hand, I like the quick response this phone enables.


| From: Paul Wouters <paul at nohats.ca>
| Subject: Re: [Swan-dev] "ipsec whack --shutdown" at end of test
| 
| Shutdown is needed in some tests! Don't extrapolate from the one test where I removed it.

You did it to multiple tests.  I did not say you removed them all or
that you said all were redundant.

I was unwilling to extrapolate; that's why I posted: hoping you would
fix the logs that you broke (if, in fact, that is what happened).

I went to some trouble to catalogue the cases.

Please do fix the reference logs.  But only if you are confident that
each change is a fix.

