[Swan-dev] testing, testing

Antony Antony antony at phenome.org
Wed Jun 17 02:51:08 EEST 2015


On Tue, Jun 16, 2015 at 04:38:59PM -0400, Andrew Cagney wrote:
> I suspect that the algorithm is something like:

Maybe a small difference: the directory is set outside the loop.

runoutput = archive-dir/ + "%Y-%m-%d" + nodename + "make showversion"
for try in 1..5:
   for test in tests:
       if no archived output in runoutput, or the archived result did not pass:
           delete OUTPUT
           run test
           copy OUTPUT to runoutput/

If this works as expected, a date change should not cause a re-run when running from a simple "make check".
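
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration, not the actual swantest code: the function names, the run_test callback, and the assumption that a passing RESULT file carries "result": "passed" are all hypothetical.

```python
import datetime
import json
import shutil
from pathlib import Path


def run_output_dir(archive_dir: Path, date: datetime.date,
                   nodename: str, version: str) -> Path:
    """Build the archive directory name once, outside the retry loop."""
    return archive_dir / f"{date:%Y-%m-%d}-{nodename}-{version}"


def archive_passed(archived_output: Path) -> bool:
    """Assumption: a passing test has "result": "passed" in its RESULT file."""
    result = archived_output / "RESULT"
    if not result.is_file():
        return False
    return json.loads(result.read_text()).get("result") == "passed"


def run_all(tests, test_root: Path, run_output: Path, run_test, tries: int = 5):
    """Run each test up to `tries` times, skipping tests already archived as passed.

    run_test(name, output_dir) is a hypothetical callback that runs one test
    and writes its files (including RESULT) into output_dir.
    Returns the list of test names in the order they were attempted.
    """
    attempts = []
    for _ in range(tries):
        for name in tests:
            archived = run_output / name / "OUTPUT"
            if archived.is_dir() and archive_passed(archived):
                continue  # already passed in this archive directory
            output = test_root / name / "OUTPUT"
            shutil.rmtree(output, ignore_errors=True)    # delete OUTPUT
            run_test(name, output)                       # run test
            shutil.rmtree(archived, ignore_errors=True)  # drop the stale copy
            shutil.copytree(output, archived)            # copy OUTPUT to runoutput/
            attempts.append(name)
    return attempts
```

Because run_output is computed once before the loops, a date change in the middle of a run does not move the archive directory in this sketch.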

The loop is there to work around errors seen on Hugh's server, specifically KVM errors.
Do you still see those? It might be worth getting some stats on it.

> So failing tests are always attempted multiple times.  It is just
> that, when the time changes, the archive directory also changes
> causing previously successful tests to also be re-attempted.
> 
> Passing:
>   --retry 0
> to swantest might help.

In the past, Fedora 20 and KVM guests on Hugh's server showed a lot of reboot issues. The retry helped then, and it didn't harm me :) I don't know if it is still needed.

> I'm beginning to think that the best default behaviour might be to
> only attempt tests with no OUTPUT directory. It means:

Why not read the JSON? The JSON files have more info; they will note if a failure is due to a KVM error. For KVM and P9 errors, retrying proved to be useful.

For example, I see 18 instances of KVM errors over a year:
find  /home/build/results/ -iname "RESULT" | xargs  grep KVMERROR |wc -l
18

The last time I saw this KVMERROR was in Sept 2014. Maybe Fedora + swantest improvements fixed these issues.

2014-09-05-swantest.libreswan.fi-v3.10-52-g0a4ca86-hugh-2014aug/basic-pluto-04/OUTPUT/RESULT:{"node":"swantest.libreswan.fi","error":"KVMERROR not able to shutdown 1 guests: [north] abort","testname":"basic-pluto-04","epoch":1409929425.67,"result":"abort","time":"2014-09-05 18:03","runtime":65.25}
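
As a rough sketch of how a retry decision could be driven by the RESULT file, the snippet below parses the record shown above and retries only when the "error" field starts with KVMERROR. The function name should_retry is hypothetical. How a P9 error appears in the JSON is not shown here, so the sketch deliberately does not handle that case.

```python
import json

# The RESULT record quoted above, verbatim.
SAMPLE_RESULT = (
    '{"node":"swantest.libreswan.fi",'
    '"error":"KVMERROR not able to shutdown 1 guests: [north] abort",'
    '"testname":"basic-pluto-04","epoch":1409929425.67,'
    '"result":"abort","time":"2014-09-05 18:03","runtime":65.25}'
)


def should_retry(result_text: str) -> bool:
    """Return True only when the RESULT record reports a KVM error.

    Hypothetical helper: ordinary test failures (no "error" field, or an
    error that is not a KVMERROR) are not retried, so intermittent test
    failures stay visible.
    """
    record = json.loads(result_text)
    return str(record.get("error", "")).startswith("KVMERROR")


if __name__ == "__main__":
    print(should_retry(SAMPLE_RESULT))
```

With a check like this, a test would be re-run only for infrastructure problems and attempted once otherwise.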


> - tests are only attempted once (so intermittent failures are not hidden)
> - re-running a test is easy - delete OUTPUT

I feel that on a dedicated machine, where it continuously runs "make check", retries won't harm.

One mystery is why Hugh's machines take so long, while the 3 servers I run seem to finish "make check" in 5-7 hours.

This page shows the data from Dec 2014. For Jun - Dec, my recollection is 5-6 hours.
http://hal.phenome.nl:8081/results/

-antony

