[Swan-dev] testing, testing

Wed Jun 17 22:10:04 EEST 2015

On 16 June 2015 at 19:51, Antony Antony <antony at phenome.org> wrote:
> On Tue, Jun 16, 2015 at 04:38:59PM -0400, Andrew Cagney wrote:
>> I suspect that the algorithm is something like:
>
> Maybe a small difference: the directory is set outside the loop.
>
> runoutput = archive-dir/ + "%Y-%m-%d" + nodename +  "make showversion"
> for try in 1..5:
>    for test in tests:
>        if no runoutput or not archive passed
>            delete OUTPUT
>            run test
>            copy OUTPUT to runoutput/
>
> If this works as expected, a date change should not cause a re-run when running from a simple "make check".

Here's the first bit of snipped code:

    tried = 0
    output_dir = ''
    # if there is TESTLIST run in batch mode.
    (output_dir, testlist, ran_tests) = do_test_list(
        args, start, tried, output_dir)
    if testlist:
        while (tried <= args.retry):
            (output_dir, testlist, ran_tests) = do_test_list(
                args, start, tried, output_dir)
            tried = 1 + tried

Notice how "tried" is only updated after the second call to
do_test_list.  The latter function has:

    odir = output_dir
    if not tried:
        odir = setup_result_dir(args, True)
    ...
    return ..., odir, ...

So I suspect that the first two calls to do_test_list both compute
output_dir: "tried" is still 0 for both, so setup_result_dir runs twice.
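
A toy reproduction of that control flow, as a sketch only:
do_test_list and setup_result_dir below are stubs I made up to record
calls (the real functions take args and start and return more values),
and the "results-N" names are placeholders.

```python
# Toy stand-ins for the swantest functions quoted above. Only the
# control flow is copied from the snippet; everything else is a stub.

setup_calls = []  # one label per call to setup_result_dir


def setup_result_dir(label):
    """Stub: record that the result directory was computed again."""
    setup_calls.append(label)
    return "results-%d" % len(setup_calls)


def do_test_list(tried, output_dir, label):
    """Stub with the same 'if not tried' logic as the real function."""
    odir = output_dir
    if not tried:
        odir = setup_result_dir(label)
    testlist = ["basic-pluto-04"]  # pretend TESTLIST is non-empty
    return odir, testlist


def run(retry):
    tried = 0
    output_dir = ''
    output_dir, testlist = do_test_list(tried, output_dir, "initial call")
    if testlist:
        while tried <= retry:
            output_dir, testlist = do_test_list(
                tried, output_dir, "loop call, tried=%d" % tried)
            tried = 1 + tried
    return output_dir


final_dir = run(retry=5)
print(setup_calls)  # ['initial call', 'loop call, tried=0']
print(final_dir)    # results-2
```

With these stubs the directory is computed once before the loop and
once more on the first pass through it; later passes reuse the second
value.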

To be honest, though, I've found swantest too hairy to contemplate
changing/fixing.

> The loop is there to help with errors seen on Hugh's server, specifically KVM errors.
> Do you still see those?  It might be worth getting some stats on it.

I don't see any; but then I'm not running stock swantest.

I find that when a test fails, it tends to keep failing.  Running it 5
times doesn't change that.
For what looks like an intermittent failure, I just re-run it later
with the machine rebooted.

>> So failing tests are always attempted multiple times.  It is just
>> that, when the time changes, the archive directory also changes
>> causing previously successful tests to also be re-attempted.
>>
>> Passing:
>>   --retry 0
>> to swantest might help.
>
> In the past, Fedora 20 and KVM guests on Hugh's server showed a lot of reboot issues; the retry helped then, and it didn't harm me :) I don't know if it is still needed.

It seems things have improved.

>> I'm beginning to think that the best default behaviour might be to
>> only attempt tests with no OUTPUT directory. It means:
>
> Why not read the JSON? The JSON files have more info; they will note if a failure is due to a KVM error. For KVM and P9 errors, retrying proved to be useful.
>
> E.g. I see 18 instances of KVM errors over a year.
> find  /home/build/results/ -iname "RESULT" | xargs  grep KVMERROR |wc -l
> 18

> The last time I saw this KVMERROR was in Sept 2014. Maybe Fedora + swantest improvements fixed these issues.
>
> 2014-09-05-swantest.libreswan.fi-v3.10-52-g0a4ca86-hugh-2014aug/basic-pluto-04/OUTPUT/RESULT:{"node":"swantest.libreswan.fi","error":"KVMERROR not able to shutdown 1 guests: [north] abort","testname":"basic-pluto-04","epoch":1409929425.67,"result":"abort","time":"2014-09-05 18:03","runtime":65.25}
>
>
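
For reference, reading the RESULT files as JSON rather than grepping
them might look something like the sketch below.  It assumes each
RESULT file holds one JSON object per line, as in the sample quoted
above, and it only looks at the "error" field (the grep matches
KVMERROR anywhere in the file).  kvm_error_records is a name made up
for this example; the thread doesn't show what a P9 error looks like,
so that case is left out.

```python
import json
import os
import tempfile


def kvm_error_records(results_root):
    """Yield (path, record) for RESULT entries whose "error" has KVMERROR.

    Assumption: one JSON object per line in each RESULT file.
    """
    for dirpath, _dirnames, filenames in os.walk(results_root):
        for name in filenames:
            if name.upper() != "RESULT":  # same match as find -iname "RESULT"
                continue
            path = os.path.join(dirpath, name)
            with open(path) as result_file:
                for line in result_file:
                    try:
                        record = json.loads(line)
                    except ValueError:
                        continue  # blank or non-JSON line: skip it
                    if (isinstance(record, dict)
                            and "KVMERROR" in str(record.get("error", ""))):
                        yield path, record


# Demo on a throwaway tree holding (part of) the RESULT line quoted above.
sample = {"node": "swantest.libreswan.fi",
          "error": "KVMERROR not able to shutdown 1 guests: [north] abort",
          "testname": "basic-pluto-04",
          "result": "abort"}
with tempfile.TemporaryDirectory() as root:
    outdir = os.path.join(root, "basic-pluto-04", "OUTPUT")
    os.makedirs(outdir)
    with open(os.path.join(outdir, "RESULT"), "w") as f:
        f.write(json.dumps(sample) + "\n")
    hits = list(kvm_error_records(root))

print(len(hits), [rec["testname"] for _path, rec in hits])
# 1 ['basic-pluto-04']
```

A retry policy could then key off the "error" field of a test's own
RESULT rather than retrying everything that failed.
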
>> - tests are only attempted once (so intermittent failures are not hidden)
>> - re-running a test is easy - delete OUTPUT
>
> I feel that on a dedicated machine, where it continuously runs "make check", retries won't harm.

Unfortunately on a local desktop, where we're more worried about
turnaround, it makes things slower than they need be :-(

> One mystery is why Hugh's machines take so long, while the 3 servers I run seem to finish "make check" in 5-7 hours.

While my tests take 6 hours on an i5, I:
- run tests once
- skip WIP tests
This constrains things to the point that they'll run overnight.

> This page shows the data from Dec 2014. For Jun - Dec, my recollection is 5-6 hours.
> http://hal.phenome.nl:8081/results/
>
> -antony