[Swan-dev] testing hangs for me

D. Hugh Redelmeier hugh at mimosa.com
Thu Jan 31 06:47:29 UTC 2019


| From: Andrew Cagney <andrew.cagney at gmail.com>

| The next one is try an older qemu.  But please don't ask me the magic
| needed to do that.

I haven't tried that, nor have I even figured out how.

| I also may have reproduced this on a laptop - it's got a qemu process
| chewing through all the CPU.

I don't think all my cores are consumed in the hangs I've seen.  In the 
current hang, no process has used more than 10 minutes of CPU.  The hang 
has lasted over 6 hours.

I hoped that the problem was caused by test failures induced by my
bugs.  So I concentrated on fixing my bugs.

I ran the tests again, with at least one bug fixed, and the tests hung
after about 11 completed.

Here is the last bit of the console:
<<<<<<<<<<<<<<<<
b.runner basic-pluto-01-nokey 5:58.02: updated test results:
b.runner basic-pluto-01-nokey 5:58.02:   status/good/failed/previous=failed: 1
b.runner basic-pluto-01-nokey 5:58.02:   status/good/passed/previous=passed: 10
b.runner basic-pluto-01-nokey 5:58.02:   status/wip/ignored/status!=good: 4
b.runner basic-pluto-01-nokey 5:58.02:   total: 15
b.runner basic-pluto-01-nokey 5:58.02:   total/failed: 1
b.runner basic-pluto-01-nokey 5:58.02:   total/failed/good: 1
b.runner basic-pluto-01-nokey 5:58.02:   total/failed/good/output-different: 1
b.runner basic-pluto-01-nokey 5:58.02:   total/ignored: 4
b.runner basic-pluto-01-nokey 5:58.02:   total/passed: 10
b.runner basic-pluto-01-nokey 5:58.02:   total/passed/good: 10
b.runner basic-pluto-01-nokey 5:58.02: stop processing test basic-pluto-01-nokey (test 12 of 747) after 130.7 seconds
b.runner addconn-03 5:58.02: start processing test addconn-03 (test 17 of 747) at 2019-01-30 18:49:48.424044
b.runner addconn-03 5:58.03: ****** addconn-03 (test 17 of 747) started (previously passed) ....
b.runner addconn-03 5:58.03: moving contents of 'testing/pluto/addconn-03/OUTPUT' to 'BACKUP/2019-01-30-184350/addconn-03'
b.runner addconn-03 5:58.03: start testing addconn-03 (test 17 of 747) at 2019-01-30 18:49:48.531138
b.runner addconn-03 5:58.03/0.00: start booting domains at 2019-01-30 18:49:48.531220
b.runner addconn-03 5:58.03/0.00: 3 shutdown/reboot jobs ahead of us in the queue
>>>>>>>>>>>>>>>>

"ps laxgwf" showed:
<<<<<<<<<<<<<<<<
0  1105 16042 15295  20   0 528928 30844 -      Sl+  pts/3      0:15  |                       \_ python3 /home/build/libreswan/testing/utils/kvmrunner.py --prefix a. --prefix b. --workers 2 --publish-hash 4493009e3723cec6674fdf8b3491487dda06884d --publish-results ./RESULTS/v3.27-735-g4493009e3-master --publish-status ./RESULTS/status.json --test-status good testing/pluto
4     0 20026 16042  20   0      0     0 -      Zs   ?          0:00  |                           \_ [sudo] <defunct>
4     0 20038 16042  20   0      0     0 -      Zs   ?          0:00  |                           \_ [sudo] <defunct>
>>>>>>>>>>>>>>>>

"ls -ltr testing/pluto/*/OUTPUT/* | tail -4"

shows that one pair of test machines is still running, even though
the test runner doesn't think so.

The basic-pluto-01-nokey test passed, BTW.

<<<<<<<<<<<<<<<<
-rw-rw-r--. 1 build build       203 Jan 30 18:49 testing/pluto/basic-pluto-01-nokey/OUTPUT/RESULT
-rw-rw-r--. 1 build build       799 Jan 30 18:49 testing/pluto/addconn-03/OUTPUT/debug.log
-rwxrwxrwx. 1 build qemu     703313 Jan 31 01:12 testing/pluto/basic-pluto-01-nokey/OUTPUT/east.pluto.log
-rwxrwxrwx. 1 build qemu     766828 Jan 31 01:12 testing/pluto/basic-pluto-01-nokey/OUTPUT/west.pluto.log
>>>>>>>>>>>>>>>>

This time there were two defunct processes.
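
A defunct entry means the process has exited but its parent (here the
runner, pid 16042) has not yet waited for it.  As a rough sketch, the
same check can be scripted by scanning /proc instead of reading ps
output by eye.  The names parse_stat and zombie_children are
hypothetical helpers of mine, not part of the libreswan test tools, and
this assumes a Linux /proc layout.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' rather than on spaces.
    """
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    fields = right.split()
    return int(pid_text), comm, fields[0], int(fields[1])


def zombie_children(parent_pid, proc_root="/proc"):
    """List (pid, comm) for children of parent_pid that are in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process went away while we were scanning
        if ppid == parent_pid and state == "Z":
            zombies.append((pid, comm))
    return sorted(zombies)
```

Calling zombie_children(16042) on the build machine would be the
scripted equivalent of picking the Zs lines out of the ps listing.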

strace shows little.  A futex call is just sitting there:

<<<<<<<<<<<<<<<<
strace -p 16042
strace: Process 16042 attached
futex(0x5649470a68a0, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY^Cstrace: Process 16042 detached
>>>>>>>>>>>>>>>>

I fired up the GUI virtual machine manager and shut down all running
machines.  I hoped that would get the test runner going again.  No
such luck.

Is there any way of looking into the Python program's execution to see
what it thinks is going on?
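
One standard-library possibility, offered as a sketch: Python's
faulthandler module can dump the traceback of every thread when a
signal arrives.  This assumes the runner can be restarted with a small
addition near its entry point; install_stack_dump_handler is a
hypothetical helper name, not something kvmrunner.py is known to
contain.  The futex wait in the strace output would be consistent with
a thread blocked on a lock or queue, but that is only a guess.

```python
import faulthandler
import signal
import sys


def install_stack_dump_handler(stream=sys.stderr, signum=signal.SIGUSR1):
    """Dump the Python stack of every thread to `stream` on `signum`.

    Assumption: this is called once, early in the program, before the
    worker threads start.  `stream` must be a real file (it needs a
    file descriptor) and must stay open while the handler is installed.
    """
    faulthandler.register(signum, file=stream, all_threads=True, chain=False)
    return signum
```

With that installed, "kill -USR1 16042" (the runner's pid in the ps
output above) should make it write one traceback per thread to stderr.
For a process that is already hung, an external tool such as py-spy
("py-spy dump --pid 16042") may be able to do the same without a
restart, assuming it is installed.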

